Re: [HACKERS] Enabling Checksums

2013-04-08 Thread Simon Riggs
On 6 April 2013 08:40, Heikki Linnakangas  wrote:


> AFAICS that could be easily avoided by doing a simple PageGetLSN() like we
> used to, if checksums are not enabled. In XLogCheckBuffer:
>
>  /*
>>  * XXX We assume page LSN is first data on *every* page that can
>> be passed
>>  * to XLogInsert, whether it otherwise has the standard page
>> layout or
>>  * not. We don't need the buffer header lock for PageGetLSN
>> because we
>>  * have exclusive lock on the page and/or the relation.
>>  */
>> *lsn = BufferGetLSNAtomic(rdata->buffer);
>>
>
> Also, the second sentence in the above comment is completely bogus now.


Both points addressed on separate commits.
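For readers following along, the suggested fast path can be sketched with toy stand-in types (the real code uses PostgreSQL's Buffer/XLogRecPtr machinery in xlog.c and bufmgr.c, so treat this purely as an illustration):

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t XLogRecPtr;

/* Toy stand-in for a buffer page header: the LSN is the first field. */
typedef struct
{
    XLogRecPtr pd_lsn;
} ToyPage;

static bool checksums_enabled = false;   /* stand-in for DataChecksumsEnabled() */

/* Plain read: sufficient when the caller holds exclusive lock and no
 * concurrent hint-bit write can move the LSN under us (checksums off). */
static XLogRecPtr
toy_PageGetLSN(ToyPage *page)
{
    return page->pd_lsn;
}

/* With checksums on, hint-bit-only writes are WAL-logged and may move
 * the LSN concurrently, so the guarded variant is required; in this toy
 * it is the same read, standing in for BufferGetLSNAtomic(). */
static XLogRecPtr
toy_BufferGetLSNAtomic(ToyPage *page)
{
    return page->pd_lsn;
}

/* The proposed shape for XLogCheckBuffer: take the cheap path unless
 * checksums force the atomic variant. */
static XLogRecPtr
toy_GetLSNForWAL(ToyPage *page)
{
    return checksums_enabled ? toy_BufferGetLSNAtomic(page)
                             : toy_PageGetLSN(page);
}
```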

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-08 Thread Simon Riggs
On 6 April 2013 15:44, Andres Freund  wrote:


> > * In xlog_redo, it seemed slightly awkward to call XLogRecGetData twice.
> > Merely a matter of preference but I thought I would mention it.
>
> You're absolutely right, memcpy should have been passed 'data', not
> XLogRecGetData().


Applied, with this as the only code change.

Thanks everybody for good research and coding and fast testing.

We're in good shape now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch for removng unused targets

2013-04-08 Thread Etsuro Fujita
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]

> Alexander Korotkov  writes:
> > On Mon, Dec 3, 2012 at 8:31 PM, Tom Lane  wrote:
> >> But having said that, I'm wondering (without having read the patch)
> >> why you need anything more than the existing "resjunk" field.
> 
> > Actually, I don't know all the cases when the "resjunk" flag is set. Is it
> > reliable to decide that a target is used only for "ORDER BY" if it's "resjunk"
> > and neither a system column nor used in grouping? If so, or if there are other
> > cases which are easy to determine, then I'll remove the "resorderbyonly" flag.
> 
> resjunk means that the target is not supposed to be output by the query.
> Since it's there at all, it's presumably referenced by ORDER BY or GROUP
> BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.
> 
> What you would need to do is verify that the target is resjunk and not
> used in any clause besides ORDER BY.  I have not read your patch, but
> I rather imagine that what you've got now is that the parser checks this
> and sets the new flag for consumption far downstream.  Why not just make
> the same check in the planner?

I've created a patch using this approach.  Please find attached the patch.

> A more invasive, but possibly cleaner in the long run, approach is to
> strip all resjunk targets from the query's tlist at the start of
> planning and only put them back if needed.
> 
> BTW, when I looked at this a couple years ago, it seemed like the major
> problem was that the planner assumes that all plans for the query should
> emit the same tlist, and thus that tlist eval cost isn't a
> distinguishing factor.  Breaking that assumption seemed to require
> rather significant refactoring.  I never found the time to try to
> actually do it.

Such an approach would improve code readability, but I'm not sure it's worth the
work for this optimization, unless I'm missing something.

Thanks,

Best regards,
Etsuro Fujita


unused-targets-2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2013-04-08 Thread Alexander Korotkov
On Mon, Apr 8, 2013 at 9:28 AM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > [ trgm-regexp-0.15.patch.gz ]
>
> I spent the weekend hacking on this, making a number of bug fixes and a
> whole lot of cosmetic changes.  I think there are large parts of this
> that are in committable shape now, but I still find the actual graph
> transformation logic to be mostly unintelligible.  I think what's most
> obscure is the distinction between the arcs list and the keys list of
> each state in the expanded graph.  I get the impression that the
> general idea is for the arcs to represent exactly-known transitions
> while the keys represent imprecisely-known transitions ... but there
> seems to be at least some leakage between those categories.  Could
> you write down a specification for what's supposed to be happening
> there?
>

Here is my attempt to specify it.
First, some notions. I know they are already in the comments, but just to
put them together in one place.

Extended color - any color of the source CNFA, or one of two special values:
1) Unknown color - may represent any character, alphanumeric or not.
2) Blank color - may represent any non-alphanumeric character.
A prefix is the pair of extended colors of the last two characters read by
the CNFA.
A key is a pair of a CNFA state and a prefix. So a key is an extended state
carrying the additional information which can influence further trigrams.

So, if you are in some key and traverse some CNFA arc, you move into
another key. There are two possible cases (or, sometimes, both of them):
1) Your move from one key into another necessarily means reading some
trigram. Then you create a new arc labeled with that trigram.
2) You move into another key, but you don't necessarily read a useful
trigram. For example, the arc of the source CNFA is labeled by an
"unextractable" color. Then you add the new key into the "keys" array, and
outgoing arcs from this key will also be processed similarly to the source
key. Therefore the "keys" array is the set of keys which are reachable from
"stateKey" without reading a useful trigram.
We could get rid of the "keys" array and produce some "empty arcs" in the
second case: you can imagine that the "keys" array is the set of keys
reachable by "empty arcs" from "stateKey". States connected by "empty
arcs" could be merged at the next stage.
However, the reason for having a separate addKeys stage is optimization. In
the addKey function, more general keys absorb less general ones. In many
cases the resulting graph becomes much simpler because of this. For example,
if your regex is not anchored at the start, the engine puts self-referencing
arcs of all possible colors on the initial state. Straightforward processing
of this could produce an enormous output graph. I had a similar situation in
an early version of the patch, where keys didn't absorb each other.
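To make the notions above concrete, here is an illustrative C sketch. The field and function names are my own inventions for this explanation, not the actual structures in the trgm-regexp patch:

```c
#include <stdbool.h>

/* Two special extended-color values, alongside ordinary CNFA colors >= 0. */
#define COLOR_UNKNOWN (-1)   /* any character, alphanumeric or not */
#define COLOR_BLANK   (-2)   /* any non-alphanumeric character */

/* A "key": a CNFA state plus the extended colors of the last two
 * characters read (the prefix). */
typedef struct
{
    int cnfaState;   /* state number in the source CNFA */
    int prefix[2];   /* extended colors of the last two characters */
} TrgmKey;

/* One prefix color "covers" another when every character it can stand
 * for is also allowed by the more general color; in addKey, keys whose
 * prefixes cover another key's prefixes absorb that key, keeping the
 * expanded graph small. */
static bool
prefixCovers(int general, int specific)
{
    return general == COLOR_UNKNOWN
        || (general == COLOR_BLANK && specific == COLOR_BLANK)
        || general == specific;
}
```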

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] commit dfda6ebaec67 versus wal_keep_segments

2013-04-08 Thread Heikki Linnakangas

On 03.04.2013 22:50, Jeff Janes wrote:

On Wed, Apr 3, 2013 at 11:14 AM, Heikki Linnakangas
wrote:



On 03.04.2013 18:58, Jeff Janes wrote:


On Tue, Apr 2, 2013 at 10:08 PM, Jeff Janes   wrote:

  This commit introduced a problem with wal_keep_segments:


commit dfda6ebaec6763090fb78b458a979b558c50b39b



The problem seems to be that the underflow warned about is happening,
because the check to guard it was checking the wrong thing.  However, I
don't really understand KeepLogSeg.  It seems like segno, and hence
recptr,
don't actually serve any purpose.



Hmm, the check is actually correct, but the assignment in the else-branch
isn't. The idea of KeepLogSeg is to calculate recptr - wal_keep_segments,
and assign that to *logSegNo, but only if *logSegNo is not already less
than the calculated value. Does the attached look correct to you?



Let me describe what I think is going on.  My description is "On start,
recptr is the redo location of the just-completed checkpoint, and logSegNo
is the redo location segment of the checkpoint before that one.  We want to
keep the previous-checkpoint redo location, and we also want to keep
wal_keep_segments before the current-checkpoint redo location, so we take
whichever is earlier."

If my understanding is now correct, then I think your patch looks correct.
  (Also, applying it fixed the problem I was having.)


Ok, thanks, applied.


Why do we keep wal_keep_segments before the just-finished checkpoint,
rather than keeping that many before the previous checkpoint?  I seems like
it would be more intuitive (to the DBA) for that parameter to mean "keep
this many more segments than you otherwise would".  I'm not proposing we
change it, I'm just curious about why it is done that way.


It feels more intuitive to me the way it is. wal_keep_segments means 
"make sure there are always this many old WAL segments available in the 
server, regardless of any other settings". If you have a standby, it 
means that you don't need a new base backup as long as you don't fall 
behind the master by more than wal_keep_segments segments.


On 03.04.2013 21:33, Alvaro Herrera wrote:

"by by"


Fixed, thanks.

- Heikki




Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-08 Thread Brendan Jurd
On 8 April 2013 16:09, Tom Lane  wrote:
> Brendan Jurd  writes:
>> On the specific issue of CARDINALITY, I guess we need to decide
>> whether we are going to pretend that our array/matrix thing is
>> actually nested.  I first argued that we should not.   But it occurred
>> to me that if we do pretend, it would at least leave the door ajar if
>> we want to do something to make our arrays more nest-like in future,
>> without disrupting the behaviour of CARDINALITY.
>
> This seems to be exactly the same uncertainty that we couldn't resolve
> back in the 8.4 devel cycle, for exactly the same reasons.  I don't see
> that the discussion has moved forward any :-(
>

I had a poke around in the archives, and it seems to me that the major
argument that was advanced in favour of making cardinality() return
the total number of items was ... we don't have anything that does
that yet.  That's why I'm proposing we add array_num_items as well --
I do think there should be a function for this, I just don't think
cardinality fits the bill.

Cheers,
BJ




Re: [HACKERS] Unrecognized type error (postgres 9.1.4)

2013-04-08 Thread Tom Lane
Rodrigo Barboza  writes:
> UPDATE tm32 SET a = a + 1 WHERE a > $i;
> ERROR: unsupported type: 202886

I'm betting that's coming from scalargtsel, which doesn't know anything
about your type, but you've nominated it to be the selectivity function
for ">" anyway.

/*
 * Can't get here unless someone tries to use scalarltsel/scalargtsel on
 * an operator with one numeric and one non-numeric operand.
 */
elog(ERROR, "unsupported type: %u", typid);

regards, tom lane
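The dispatch that ends in that elog can be sketched roughly like this. This is a condensed toy of the idea, not the verbatim selfuncs.c source; the OID constants are the standard built-in ones:

```c
typedef unsigned int Oid;

/* Standard built-in type OIDs (a few of the numeric family). */
#define INT4OID    23
#define FLOAT8OID  701
#define NUMERICOID 1700

/* Condensed sketch: scalarltsel/scalargtsel convert both operands to a
 * common scale, but only for a hardwired family of types. Any other
 * OID -- such as a custom type like the 202886 above -- falls through
 * to the "unsupported type" error. */
static double
convert_to_scalar_sketch(double value, Oid typid, int *ok)
{
    *ok = 1;
    switch (typid)
    {
        case INT4OID:
        case FLOAT8OID:
        case NUMERICOID:
            return value;
        default:
            *ok = 0;   /* backend: elog(ERROR, "unsupported type: %u", typid) */
            return 0;
    }
}
```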




Re: [HACKERS] Unrecognized type error (postgres 9.1.4)

2013-04-08 Thread Rodrigo Barboza
On Mon, Apr 8, 2013 at 11:25 AM, Tom Lane  wrote:

> Rodrigo Barboza  writes:
> > UPDATE tm32 SET a = a + 1 WHERE a > $i;
> > ERROR: unsupported type: 202886
>
> I'm betting that's coming from scalargtsel, which doesn't know anything
> about your type, but you've nominated it to be the selectivity function
> for ">" anyway.
>
> /*
>  * Can't get here unless someone tries to use
> scalarltsel/scalargtsel on
>  * an operator with one numeric and one non-numeric operand.
>  */
> elog(ERROR, "unsupported type: %u", typid);
>
> regards, tom lane
>


Yes, I found it in the code, but I followed the example from the Postgres
documentation that uses this function.
And why does it work sometimes, but not other times?


Re: [HACKERS] [COMMITTERS] pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi

2013-04-08 Thread Tom Lane
Dimitri Fontaine  writes:
> Tom Lane  writes:
>> If there is anybody still using Postgres on machines without wcstombs() or
>> towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need
>> to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail
>> incorrectly. It seems likely that there are no such installations, though.

> Those conditions seem just complex enough to require a test script that
> will check that for you. What if we created a new binary responsible for
> auto checking all those release-note items that are possible to machine
> check, then issue a WARNING containing the URL to the release notes you
> should be reading, and a SQL script (ala pg_upgrade) to run after
> upgrade?

How exactly would you know whether the previous installation was built
without HAVE_WCSTOMBS/HAVE_TOWLOWER?  That's not exposed anywhere
reliable.  And it's not out of the question that somebody upgrading to
a newer PG version might upgrade his OS too, so I would not think that
checking what configure says now is trustworthy.

In general, though, this suggestion seems about two orders of magnitude
more work than the particular case justifies.  We might want to think
about something like it for future releases, but it's not likely to
happen for 9.3.

regards, tom lane




[HACKERS] Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)

2013-04-08 Thread Greg Stark
On Thu, Apr 4, 2013 at 4:10 PM, Merlin Moncure  wrote:

> The only reasonable answer for this (a provably used, non-security,
> non-standards violating, non-gross functionality breakage case) is
> *zero*.  Our historically cavalier attitude towards compatibility
> breakage has been an immense disservice to our users and encourages
> very bad upgrade habits and is, IMNSHO, embarrassing.
>

The flip side of this is that the ability to make improvements freely is
one of the biggest strengths of free software over commercial software.
Oracle and Microsoft invest *tons* of money in maintaining huge libraries
of backward-compatibility code, and that drastically limits their ability
to keep making improvements.


-- 
greg


Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-04-08 Thread Dimitri Fontaine
Joe Conway  writes:
> Shouldn't that CREATE RULE be implicitly part of the CREATE EXTENSION?

Yes. It's a bug, been reported before, it's on my todo list. I have
arranged some time to care about it while in beta, I won't be able to
have at it before then…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Unrecognized type error (postgres 9.1.4)

2013-04-08 Thread Rodrigo Barboza
On Mon, Apr 8, 2013 at 11:27 AM, Rodrigo Barboza wrote:

>
>
> On Mon, Apr 8, 2013 at 11:25 AM, Tom Lane  wrote:
>
>> Rodrigo Barboza  writes:
>> > UPDATE tm32 SET a = a + 1 WHERE a > $i;
>> > ERROR: unsupported type: 202886
>>
>> I'm betting that's coming from scalargtsel, which doesn't know anything
>> about your type, but you've nominated it to be the selectivity function
>> for ">" anyway.
>>
>> /*
>>  * Can't get here unless someone tries to use
>> scalarltsel/scalargtsel on
>>  * an operator with one numeric and one non-numeric operand.
>>  */
>> elog(ERROR, "unsupported type: %u", typid);
>>
>> regards, tom lane
>>
>
>
> Yes, I found it in the code, but I followed the example from the postgres
> documentation that uses this function.
>  And why does it work sometimes? Why not other times?
>
>

Here is a very simple case with weird behavior. I select * from a table and
it returns 4 entries.
But when I run with a filter, the error is raised and crazy values are
printed from the params.

Here are my functions where I compare the values:

typedef uint32_t TmUInt32;

static int
tmuint32_int32_abs_cmp_internal(TmUInt32 a, int32_t b)
{
int ret;
elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);
if (a < b) ret = -1;
else if (a > b) ret = 1;
else ret = 0;
elog(NOTICE, "funcao:%s linha:%d, ret: %d a: %u\n", __FUNCTION__, __LINE__,
ret, a);
return ret;
}

PG_FUNCTION_INFO_V1(tmuint32_int32_abs_gt);

Datum
tmuint32_int32_abs_gt(PG_FUNCTION_ARGS)
{
TmUInt32 *param1;
int32_t param2;
elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);

if(PG_ARGISNULL(0) || PG_ARGISNULL(1)) PG_RETURN_NULL();

param1 = (TmUInt32 *) PG_GETARG_POINTER(0);
param2 = DatumGetInt32(PG_GETARG_DATUM(1));

elog(NOTICE, "funcao:%s linha:%d param1: %u, param2: %d\n", __FUNCTION__,
__LINE__, *param1, param2);
PG_RETURN_BOOL(tmuint32_int32_abs_cmp_internal(*param1, param2) > 0);
}


And here is the simple test.

-- SIMPLE QUERY
select * from a;

NOTICE:  funcao:tmuint32_out linha:191

NOTICE:  funcao:tmuint32_out linha:191

NOTICE:  funcao:tmuint32_out linha:191

NOTICE:  funcao:tmuint32_out linha:191

 a
---
 0
 1
 2
 3
(4 rows)


_

-- QUERY WITH FILTER
select * from a where a > 1;

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 0, param2: 1

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: -1 a: 0

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 99, param2: 1

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 99

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 50, param2: 1

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 50

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 24, param2: 1

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 24

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 12, param2: 1

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 12

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 6, param2: 1

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 6

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 2, param2: 1

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 2

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296

NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 1, param2: 1

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742

NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 0 a: 1

ERROR:  unsupported type: 220200


Re: [HACKERS] [COMMITTERS] pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi

2013-04-08 Thread Dimitri Fontaine
Tom Lane  writes:
> How exactly would you know whether the previous installation was built
> without HAVE_WCSTOMBS/HAVE_TOWLOWER?  That's not exposed anywhere
> reliable.  And it's not out of the question that somebody upgrading to
> a newer PG version might upgrade his OS too, so I would not think that
> checking what configure says now is trustworthy.

Oh, it's even worse than I imagined then. If you don't know where to
find the information, maybe the release notes should just propose to
REINDEX in any case?

> In general, though, this suggestion seems about two orders of magnitude
> more work than the particular case justifies.  We might want to think
> about something like it for future releases, but it's not likely to
> happen for 9.3.

Yeah, I realized as much after sending the proposal. It might just be some
cumulative effect that led me to it. I still think that a tool able to
assess whether you're affected by "reindex GiST indexes on that datatype"
or suchlike would be really good to have.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-04-08 Thread Joe Conway
On 04/08/2013 07:42 AM, Dimitri Fontaine wrote:
> Joe Conway  writes:
>> Shouldn't that CREATE RULE be implicitly part of the CREATE EXTENSION?
> 
> Yes. It's a bug, been reported before, it's on my todo list. I have
> arranged some time to care about it while in beta, I won't be able to
> have at it before then…

OK, maybe I'll try to take a look in the meantime.

Did you have any comment on the other pg_dump patch (reviewed by Vibhor)?

Thanks,

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support






Re: [HACKERS] Unrecognized type error (postgres 9.1.4)

2013-04-08 Thread Rodrigo Barboza
On Mon, Apr 8, 2013 at 11:44 AM, Rodrigo Barboza wrote:

> [previous message quoted in full -- the comparison functions and the
> NOTICE trace ending in "ERROR:  unsupported type: 220200" -- trimmed]


I found that the problem is in the highlighted elog line below: I'm getting
the wrong value from param1. But why does this happen?

PG_FUNCTION_INFO_V1(tmuint32_int32_abs_gt);

Datum
tmuint32_int32_abs_gt(PG_FUNCTION_ARGS)
{
TmUInt32 *param1;
int32_t param2;
elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__,

Re: [HACKERS] pg_dump with postgis extension dumps rules separately

2013-04-08 Thread Dimitri Fontaine
Joe Conway  writes:
> OK, maybe I'll try to take a look in the meantime.

That would be awesome :)

> Did you have any comment on the other pg_dump patch (reviewed by Vibhor)?

This whole extension table filtering and dumping is more in Tom's realm,
so I guess that if you want another pair of eyes on your patch
before committing it yourself, you will need him to have a look.

That said, I didn't spot anything obvious that I want to report myself,
so I would label it "ready for committer".

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] unused code in float8_to_char , formatting.c ?

2013-04-08 Thread Greg Jaskiewicz

On 7 Apr 2013, at 05:14, Robert Haas  wrote:

> On Thu, Apr 4, 2013 at 6:47 PM, Greg Jaskiewicz  wrote:
>> Looking around the code Today, one of my helpful tools detected this dead 
>> code.
>> As far as I can see, it is actually unused call to strlen() in formatting.c, 
>> float8_to_char().
> 
> I poked at this a little and suggest the following somewhat more
> extensive cleanup.
> 
> It seems to me that there are a bunch of these functions where len is
> unconditionally initialized in NUM_TOCHAR_prepare and then used there.
> Similarly in NUM_TOCHAR_cleanup.  And then there's a chunk of each
> individual function that does it a third time.  Rather than use the
> same variable in all three places, I've moved the variable
> declarations to the innermost possible scope.  Doing that revealed a
> bunch of other, similar places where we can get rid of strlen() calls.
> 
> Does this version seem like a good idea?


Looks more extensive :-)

At a quick glance, without a lot of testing, it looks OK.

But the lack of test cases stressing all the different cases in that file
makes it impossible to say there are no regressions.
Personally, I always feel uneasy making extensive changes to complicated
code like this without any integrated test cases.


-- 
GJ





[HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Samrat Revagade
Hello,

We have been trying to figure out possible solutions to the following
problem in streaming replication. Consider the following scenario:

When the master receives a commit command, it writes and flushes the commit
WAL records to disk, and it also writes and flushes the data pages related
to that transaction.

The master then sends WAL records to the standby, up to the commit WAL
record. But if failover happens before these records are sent, the old
master is ahead of the standby (which is now the new master) in terms of DB
data, leading to inconsistent data.



One solution to avoid this situation is to have the master send the WAL
records to the standby and wait for an ACK from the standby that the WAL
has been committed to disk, and only after that commit the data pages
related to the transaction on the master.

The main drawback would be increased wait time for the client due to the
extra round trip to the standby before the master sends an ACK to the
client. Are there any other issues with this approach?


Thank you,

Samrat


Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Shaun Thomas

On 04/08/2013 05:34 AM, Samrat Revagade wrote:


One solution to avoid this situation is have the master send WAL
records to standby and wait for ACK from standby committing WAL files
to disk and only after that commit data page related to this
transaction on master.


Isn't this basically what synchronous replication does in PG 9.1+?
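For reference, the 9.1+ synchronous replication being referred to here is configured roughly like this (the standby name is an illustrative value):

```
# postgresql.conf on the master (illustrative)
synchronous_standby_names = 'standby1'
synchronous_commit = on   # commit waits for the standby's WAL flush ACK
```

Note that this makes the commit wait for the standby to flush the WAL, which is the ordering property being asked about, though it says nothing about when either server writes data pages.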

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email




Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Tom Lane
Samrat Revagade  writes:
> We have been trying to figure out possible solutions to the following
> problem in streaming replication Consider following scenario:

> If master receives commit command, it writes and flushes commit WAL records
> to the disk, It also writes and flushes data page related to this
> transaction.

> The master then sends WAL records to standby up to the commit WAL record.
> But before sending these records if failover happens then,  old master is
> ahead of  standby which is now the new master in terms of DB data leading
> to inconsistent data .

I don't exactly see the problem ... unless you're imagining that master
and slave share the same data storage or something like that.  That's
not going to work for a ton of reasons besides this one.

regards, tom lane




Re: [HACKERS] [BUGS] BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog

2013-04-08 Thread Jeff Bohmer

On Apr 6, 2013, at 1:24 PM, Jeff Janes  wrote:

> On Sat, Apr 6, 2013 at 1:24 AM, Heikki Linnakangas
> wrote:
> 
>> 
>> Incidentally, I bumped into another custom backup script just a few weeks
>> back that also excluded backup_label. I don't know what the author was
>> thinking when he wrote that, but it seems to be a surprisingly common
>> mistake. Maybe it's the "label" in the filename that makes people think
>> it's not important.
> 
> 
> 
> I think part of it is the name "label', and part of it is that this file is
> similar to and hence easily confused with the .history files, which (as far
> as I know) truly are there only for human information and not for system
> operation.

While the backup_label file was included in all base backups by the custom 
backup script, it was not present in the cluster directory when starting PG 
9.2.4, because the data directory was synced from the base backup without 
backup_label. Sure enough, including backup_label in the data directory fixes 
this for me.

I think these custom scripts were written around 8.1. That the base backups 
have worked all this time (each base backup is tested) is probably because the 
backup script, immediately after calling pg_start_backup(), rsyncs pg_control 
before rsync'ing everything else in the data directory. That said, this quirk 
seems to allow for problems if WAL files are generated between the time 
pg_start_backup() is called and pg_control is rsync'd.


>> Perhaps we should improve the documentation to make it more explicit that
>> backup_label must be included in the backup. The docs already say that,
>> though, so I suspect that people making this mistake have not read the docs
>> very carefully anyway.
>> 
> 
> 
> I don't think the docs are very clear on that.  They say "This file will of
> course be archived as a part of your backup dump file", but "will be" does
> not imply "must be".  Elsewhere it emphasizes that the label you gave to
> pg_start_backup is written into the file, but doesn't really say what the
> file itself is there for.  To me it seems to imply that the file is there
> for your convenience, to hold that label, and not as a critical part of the
> system.
> 
> Patch attached, which I hope can be back-patched.  I'll also add it to
> commitfest-Next.
> 
> Cheers,
> 
> Jeff
> 


I think this documentation update would be helpful.

Thank you for your help,
- Jeff
--
Jeff Bohmer | Corporate Technology Manager | VisionLink, Inc.
First National Center | 3101 Iris Avenue, Suite 240 | Boulder CO, 80301

Office 303.402.0170 x121






Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Ants Aasma
On Mon, Apr 8, 2013 at 6:50 PM, Shaun Thomas  wrote:
> On 04/08/2013 05:34 AM, Samrat Revagade wrote:
>
>> One solution to avoid this situation is have the master send WAL
>> records to standby and wait for ACK from standby committing WAL files
>> to disk and only after that commit data page related to this
>> transaction on master.
>
>
> Isn't this basically what synchronous replication does in PG 9.1+?

Not exactly. Sync-rep ensures that commit success is not sent to the
client before a synchronous replica acks the commit record. What
Samrat is proposing here is that WAL is not flushed to the OS before
it is acked by a synchronous replica so recovery won't go past the
timeline change made in failover, making it necessary to take a new
base backup to resync with the new master. I seem to remember this
being discussed when sync rep was committed. I don't recall if the
idea was discarded only on performance grounds or whether there were
other issues too.

Thinking about it now, the requirement is that after a crash and
failover to a sync replica we should be able to reuse the datadir to
replicate from the new master without inconsistency. We should be able
to achieve that by ensuring that we don't write out pages until we
have received an ack from the sync replica, and that we check for
possible timeline switches before recovering local WAL. For the first,
it seems to me that it should be enough to rework the updating of
XlogCtl->LogwrtResult.Flush so it accounts for the sync replica. For
the second part, I think Heikki's work on enabling timeline switches
over streaming connections already ensures this (I haven't checked it
out in detail), but if not, it shouldn't be too hard to add.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks

2013-04-08 Thread Kohei KaiGai
2013/4/5 Robert Haas :
> On Thu, Apr 4, 2013 at 8:26 AM, Kohei KaiGai  wrote:
>> OK, I follow the manner of the terminology as we usually call it.
>> The attached patch just replaced things you suggested.
>
> Thanks, I have committed this, after making some changes to the
> comments and documentation.  Please review the changes and let me know
> if you see any mistakes.
>
Thanks. I found two obvious wording problems here; please see the smaller
of the attached patches. I didn't change the convention of using "XXX" in
source code comments.

Also, the attached function-execute-permission patch is a rebased
version. I rethought its event name should be OAT_FUNCTION_EXECUTE,
rather than OAT_FUNCTION_EXEC according to the manner without
abbreviation. Other portion is same as previous ones.

> BTW, if it were possible to set things up so that the test_sepgsql
> script could validate the version of the sepgsql-regtest policy
> installed, that would eliminate a certain category of errors.  I
> notice also that the regression tests themselves seem to fail if you
> invoke the script as contrib/sepgsql/test_sepgsql rather than
> ./test_sepgsql, which suggests another possible pre-validation step.
>
Please see the test-script-fixup patch.
I added "cd `dirname $0`" at the top of the script, which lets pg_regress
avoid this trouble. Previously, pg_regress was presumably unable to read
the SQL commands to run.

A problem with validating the sepgsql-regtest policy module is that the
semodule command requires root privilege to list the installed policy
modules, so I avoided using it in the test_sepgsql script.
However, one idea is to not fail the script even if "sudo semodule -l"
returns an error, and to fail only when it runs correctly and reports a
policy version other than the expected one.
What is your opinion of this check?

Thanks,
-- 
KaiGai Kohei 


pgsql-v9.3-obvious-wording-fixes.patch
Description: Binary data


sepgsql-v9.3-test-script-fixup.v1.patch
Description: Binary data


sepgsql-v9.3-function-execute-permission.v4.patch
Description: Binary data



Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Andres Freund
On 2013-04-08 19:26:33 +0300, Ants Aasma wrote:
> On Mon, Apr 8, 2013 at 6:50 PM, Shaun Thomas  wrote:
> > On 04/08/2013 05:34 AM, Samrat Revagade wrote:
> >
> >> One solution to avoid this situation is have the master send WAL
> >> records to standby and wait for ACK from standby committing WAL files
> >> to disk and only after that commit data page related to this
> >> transaction on master.
> >
> >
> > Isn't this basically what synchronous replication does in PG 9.1+?
> 
> Not exactly. Sync-rep ensures that commit success is not sent to the
> client before a synchronous replica acks the commit record. What
> Samrat is proposing here is that WAL is not flushed to the OS before
> it is acked by a synchronous replica so recovery won't go past the
> timeline change made in failover, making it necessary to take a new
> base backup to resync with the new master. I seem to remember this
> being discussed when sync rep was committed. I don't recall if the
> idea was discarded only on performance grounds or whether there were
> other issues too.

That's not going to work for a fair number of reasons:
* wal is streamed *from disk* not from memory
* what if the local node crashes/restarts immediately? Then the standby
  is farther ahead than the master.
* the performance implications of never writing data before flushing it
  are pretty severe
* ...

So this doesn't seem to solve anything.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Ants Aasma
On Mon, Apr 8, 2013 at 7:38 PM, Andres Freund  wrote:
> On 2013-04-08 19:26:33 +0300, Ants Aasma wrote:
>> Not exactly. Sync-rep ensures that commit success is not sent to the
>> client before a synchronous replica acks the commit record. What
>> Samrat is proposing here is that WAL is not flushed to the OS before
>> it is acked by a synchronous replica so recovery won't go past the
>> timeline change made in failover, making it necessary to take a new
>> base backup to resync with the new master. I seem to remember this
>> being discussed when sync rep was committed. I don't recall if the
>> idea was discarded only on performance grounds or whether there were
>> other issues too.
>
> That's not going to work for a fair number of reasons:
> * wal is streamed *from disk* not from memory

Yeah, this one alone makes the do-not-flush-before-replicating
approach impractical.

> * what if the local node crashes/restarts immediately? Then the standby
>   is farther ahead than the master.
> * the performance implications of never writing data before flushing it
>   are pretty severe
> * ...
>
> So this doesn't seem to solve anything.

Yeah, delaying WAL writes until replication is successful seems
impractical, but I don't see why we couldn't optionally take into
account walsender write pointers when considering if we can write out
a page. Sure there will be some performance hit for waiting to
replicate WAL, but on the other hand having to rsync a huge database
isn't too good for performance either.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Fujii Masao
On Mon, Apr 8, 2013 at 7:34 PM, Samrat Revagade
 wrote:
>
> Hello,
>
> We have been trying to figure out possible solutions to the following problem 
> in streaming replication Consider following scenario:
>
> If master receives commit command, it writes and flushes commit WAL records 
> to the disk, It also writes and flushes data page related to this transaction.
>
> The master then sends WAL records to standby up to the commit WAL record. But 
> before sending these records if failover happens then,  old master is ahead 
> of  standby which is now the new master in terms of DB data leading to 
> inconsistent data .

Why do you think the inconsistent data after failover is a problem? Because
it's one of the reasons why a fresh base backup is required when starting the
old master as a new standby? If so, I agree with you. I've often heard
complaints about needing a backup when restarting the new standby. That's a
really big problem.

The timeline mismatch after failover was one of the reasons why a
backup is required.
But, thanks to Heikki's recent work, that's solved, i.e., the timeline
mismatch would be
automatically resolved when starting replication in 9.3. So, the
remaining problem is an
inconsistent database.

> One solution to avoid this situation is have the master send WAL records to 
> standby and wait for ACK from standby committing WAL files to disk and only 
> after that commit data page related to this transaction on master.

You mean to make the master delay the data page write until the WAL has been
not only flushed to disk but also replicated to the standby?

> The main drawback would be increased wait time for the client due to extra 
> round trip to standby before master sends ACK to client. Are there any other 
> issues with this approach?

I think you could introduce a GUC specifying whether this extra check is
required, to avoid needing a backup at failback.

Regards,

--
Fujii Masao




Re: [HACKERS] Unrecognized type error (postgres 9.1.4)

2013-04-08 Thread Rodrigo Barboza
On Mon, Apr 8, 2013 at 12:14 PM, Rodrigo Barboza wrote:

>
>
>
> On Mon, Apr 8, 2013 at 11:44 AM, Rodrigo Barboza 
> wrote:
>
>>
>>
>>
>> On Mon, Apr 8, 2013 at 11:27 AM, Rodrigo Barboza > > wrote:
>>
>>>
>>>
>>> On Mon, Apr 8, 2013 at 11:25 AM, Tom Lane  wrote:
>>>
 Rodrigo Barboza  writes:
 > UPDATE tm32 SET a = a + 1 WHERE a > $i;
 > ERROR: unsupported type: 202886

 I'm betting that's coming from scalargtsel, which doesn't know anything
 about your type, but you've nominated it to be the selectivity function
 for ">" anyway.

 /*
  * Can't get here unless someone tries to use
 scalarltsel/scalargtsel on
  * an operator with one numeric and one non-numeric operand.
  */
 elog(ERROR, "unsupported type: %u", typid);

 regards, tom lane

>>>
>>>
>>> Yes, I found it in the code, but I followed the example from the
>>> postgres documentation that uses this function.
>>>  And why does it work sometimes? Why not other times?
>>>
>>>
>>
>> Here is a very simple case and weird behavior. I select * from a table
>> and returns 4 entries.
>> But when I run with a filter the error raises and crazy values are
>> printed from the params.
>>
>> Here are my functions where I compare the values:
>>
>> typedef uint32_t TmUInt32;
>>
>> static int
>> tmuint32_int32_abs_cmp_internal(TmUInt32 a, int32_t b)
>> {
>> int ret;
>> elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);
>> if (a < b) ret = -1;
>> else if (a > b) ret = 1;
>> else ret = 0;
>> elog(NOTICE, "funcao:%s linha:%d, ret: %d a: %u\n", __FUNCTION__, __LINE__, ret, a);
>> return ret;
>> }
>>
>> PG_FUNCTION_INFO_V1(tmuint32_int32_abs_gt);
>>
>> Datum
>> tmuint32_int32_abs_gt(PG_FUNCTION_ARGS)
>> {
>> TmUInt32 *param1;
>> int32_t param2;
>> elog(NOTICE, "funcao:%s linha:%d\n", __FUNCTION__, __LINE__);
>>
>> if (PG_ARGISNULL(0) || PG_ARGISNULL(1)) PG_RETURN_NULL();
>>
>> param1 = (TmUInt32 *) PG_GETARG_POINTER(0);
>> param2 = DatumGetInt32(PG_GETARG_DATUM(1));
>>
>> elog(NOTICE, "funcao:%s linha:%d param1: %u, param2: %d\n", __FUNCTION__, __LINE__, *param1, param2);
>> PG_RETURN_BOOL(tmuint32_int32_abs_cmp_internal(*param1, param2) > 0);
>> }
>>
>>
>> And here is the simple test.
>>
>> -- SIMPLE QUERY
>> select * from a;
>>
>> NOTICE:  funcao:tmuint32_out linha:191
>>
>> NOTICE:  funcao:tmuint32_out linha:191
>>
>> NOTICE:  funcao:tmuint32_out linha:191
>>
>> NOTICE:  funcao:tmuint32_out linha:191
>>
>>  a
>> ---
>>  0
>>  1
>>  2
>>  3
>> (4 rows)
>>
>>
>> _
>>
>> -- QUERY WITH FILTER
>> select * from a where a > 1;
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 0, param2: 1
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: -1 a: 0
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 99, param2: 1
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 99
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 50, param2: 1
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 50
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 24, param2: 1
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 24
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 12, param2: 1
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 12
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 6, param2: 1
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 6
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 2, param2: 1
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 1 a: 2
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1296
>>
>> NOTICE:  funcao:tmuint32_int32_abs_gt linha:1303 param1: 1, param2: 1
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:742
>>
>> NOTICE:  funcao:tmuint32_int32_abs_cmp_internal linha:746, ret: 0 a: 1
>>
>> ERROR:  unsupported type: 220200
>>
>>
>
> I found that the problem is in the highlighted line.
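For what it's worth, the usual way out of the scalargtsel trap is to give the custom operator its own restriction estimator instead of borrowing one that only understands the built-in numeric types. A minimal sketch follows, assuming the conventions of a PostgreSQL C extension (the function name is made up; the constant matches DEFAULT_INEQ_SEL in src/include/utils/selfuncs.h):

```c
#include "postgres.h"
#include "fmgr.h"

/*
 * Hypothetical restriction estimator for the custom ">" operator.
 * scalargtsel raises "unsupported type" because it only understands
 * built-in numeric types; returning a fixed guess avoids calling it
 * at all.
 */
PG_FUNCTION_INFO_V1(tmuint32_gtsel);

Datum
tmuint32_gtsel(PG_FUNCTION_ARGS)
{
    /* same default an inequality would otherwise get (DEFAULT_INEQ_SEL) */
    PG_RETURN_FLOAT8(0.3333333333333333);
}
```

At the SQL level it would be declared as taking (internal, oid, internal, integer) and returning float8, and attached with RESTRICT = tmuint32_gtsel in CREATE OPERATOR. This fragment only runs inside a server backend, so take it as a sketch rather than a tested example.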

Re: [HACKERS] Back branches vs. gcc 4.8.0

2013-04-08 Thread Peter Eisentraut
On 4/5/13 6:14 PM, Tom Lane wrote:
> Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
> I think we need to do something to prevent it from breaking 8.4.x and
> 9.0.x.  It looks like our choices are (1) teach configure to enable
> -fno-aggressive-loop-optimizations if the compiler recognizes it,
> or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.
> 
> I'm a bit leaning towards (1), mainly because I'm not excited about
> fighting a compiler arms race in the back branches.

At the moment, I wouldn't do anything.  At least until we have converted
master to use flexible array members completely, and we have learned the
extent of the issue.

The problem manifests itself easily through the regression tests, so
there is no guessing about whether a particular combination of versions
will work.  Someone who uses a cutting edge compiler with a somewhat old
PG release is doing something special anyway, so they should have the
required skills to put in the workaround.

I would rather avoid patching in specific compiler options for specific
versions.  These things come and go, but releases live a long time.  How
do we know -fno-aggressive-loop-optimizations is the only option we need
in the long run?  I'd rather see a direct crash or a known code fix.

As an aside, we already have -fno-strict-aliasing and -fwrapv.  Add more
and it will begin to read like

-fmy-code -fis-broken -fhelp-me

;-)





Re: [HACKERS] Back branches vs. gcc 4.8.0

2013-04-08 Thread Gavin Flower

On 09/04/13 08:41, Peter Eisentraut wrote:

On 4/5/13 6:14 PM, Tom Lane wrote:

Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x.  It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

I'm a bit leaning towards (1), mainly because I'm not excited about
fighting a compiler arms race in the back branches.

At the moment, I wouldn't do anything.  At least until we have converted
master to use flexible array members completely, and we have learned the
extent of the issue.

The problem manifests itself easily through the regression tests, so
there is no guessing about whether a particular combination of versions
will work.  Someone who uses a cutting edge compiler with a somewhat old
PG release is doing something special anyway, so they should have the
required skills to put in the workaround.

I would rather avoid patching in specific compiler options for specific
versions.  These things come and go, but releases live a long time.  How
do we know -fno-aggressive-loop-optimizations is the only option we need
in the long run?  I'd rather see a direct crash or a known code fix.

As an aside, we already have -fno-strict-aliasing and -fwrapv.  Add more
and it will begin to read like

-fmy-code -fis-broken -fhelp-me

;-)




-fno-break-my-code



Re: [HACKERS] Unrecognized type error (postgres 9.1.4)

2013-04-08 Thread Rodrigo Barboza
On Mon, Apr 8, 2013 at 4:30 PM, Rodrigo Barboza wrote:


Re: [HACKERS] Fwd: Range types (DATERANGE, TSTZRANGE) in a foreign key with "inclusion" logic

2013-04-08 Thread Kevin Grittner
Matthias Nagel  wrote:

> I would like to do someting like:
>
> FOREIGN KEY ( container_id, lifetime )
> REFERENCES other_table (id, lifetime )
> USING gist ( container_id WITH =, lifetime WITH <@ )
>
> (Of course, this is PosgreSQL-pseudo-code, but it hopefully make
> clear what I want.)
>
> So, now my questions:
>
> 1) Does this kind of feature already exist in 9.2?

No.

> 2) If this feature does not directly exist, has anybody a good
> idea how to mimic the intended behaviour?

I would probably do it by using only SERIALIZABLE transactions to
modify data, and enforce the constraint in triggers.  That would
not be as convenient or as performant as the feature you're
requesting would probably be, but it would work as long as you have
a systematic way to retry transactions which are rolled back with a
serialization failure.

> 3) If neither 1) or 2) applies, are there any plans to integrate
> such a feature?

I have heard Jeff Davis talk about the possibility of such a
feature, that he figures would be called "inclusion constraints". 
I have not heard of any actual development on the idea yet.

> Having range types and exclusion contraints are nice, as I said
> in the introdruction. But if the reverse (foreign key with
> inclusion) would also work, the range type feature would really
> be amazing.

Agreed.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)

2013-04-08 Thread Josh Berkus
All,

As much as I have a keen interest in this feature, it isn't (AFAIK)
being considered for 9.3.  Given that it's generated a fair amount of
controversy, could we table it until 9.3 beta?  There's still plenty of
unresolved 9.3 patches in the queue.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)

2013-04-08 Thread Bruce Momjian
On Mon, Apr  8, 2013 at 04:24:31PM -0700, Josh Berkus wrote:
> All,
> 
> As much as I have a keen interest in this feature, it isn't (AFAIK)
> being considered for 9.3.  Given that it's generated a fair amount of
> controversy, could we table it until 9.3 beta?  There's still plenty of
> unresolved 9.3 patches in the queue.

Can someone add a TODO item or a link to the thread?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




[HACKERS] Reassign variable value in XLogReadRecord

2013-04-08 Thread Dickson S. Guedes
Hello,

While walking in the code I see the following code in
src/backend/access/transam/xlogreader.c:177-191

XLogRecord *
XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
{
XLogRecord *record;
XLogRecPtr  targetPagePtr;
boolrandAccess = false;  <=== assign
uint32  len,
total_len;
uint32  targetRecOff;
uint32  pageHeaderSize;
boolgotheader;
int readOff;

randAccess = false;  <== reassign
/* reset error state */


Am I missing something, or is the second assignment unnecessary?

Best regards.
-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br




[HACKERS] Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)

2013-04-08 Thread Brendan Jurd
On 9 April 2013 09:24, Josh Berkus  wrote:
> As much as I have a keen interest in this feature, it isn't (AFAIK)
> being considered for 9.3.  Given that it's generated a fair amount of
> controversy, could we table it until 9.3 beta?  There's still plenty of
> unresolved 9.3 patches in the queue.

No problem.  I certainly wasn't expecting it to run for 90 messages
when I started out.  I'll pipe down for now and resume after the beta.

Cheers,
BJ




Re: [HACKERS] Enabling Checksums

2013-04-08 Thread Ants Aasma
On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma  wrote:
> Unless somebody tells me not to waste my time I'll go ahead and come
> up with a workable patch by Monday.

And here you go. I decided to be verbose with the comments, as it's
easier to delete a comment than to write one.
of macros to calculate the contents of a helper var during compile
time. This can easily be replaced with the calculated values once we
settle on specific parameters.

Currently only x86-64 is implemented. 32bit x86 would be mostly a
copy-and-paste job, replacing 64bit pointer registers with 32bit ones.
For other platforms the simplest way would be to use a vectorizing
compiler on the generic variant. -funroll-loops -ftree-vectorize is
enough on gcc.
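To illustrate (this is not the patch itself, just a hypothetical standalone sketch of the kind of generic variant that gcc can auto-vectorize with those flags): the trick is to keep N independent accumulators so the inner loop has no cross-iteration dependency. The constants and mixing function here are illustrative only, not the ones in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_SUMS    32          /* independent lanes; maps onto SIMD registers */
#define FNV_PRIME 16777619u   /* illustrative multiplier only */

/*
 * Checksum a page of nwords 32-bit words (nwords must be a multiple of
 * N_SUMS).  Each lane consumes every N_SUMS-th word, so the inner loop
 * iterations are independent and -funroll-loops -ftree-vectorize can
 * turn them into vector operations.
 */
static uint32_t
checksum_page(const uint32_t *page, size_t nwords)
{
    uint32_t sums[N_SUMS];
    uint32_t result = 0;
    size_t   i, j;

    for (j = 0; j < N_SUMS; j++)
        sums[j] = FNV_PRIME;

    for (i = 0; i < nwords; i += N_SUMS)
        for (j = 0; j < N_SUMS; j++)
            sums[j] = (sums[j] ^ page[i + j]) * FNV_PRIME;

    /* fold the lanes into one checksum */
    for (j = 0; j < N_SUMS; j++)
        result ^= sums[j];
    return result;
}
```

Since multiplication by an odd constant is a bijection mod 2^32, a single flipped bit in the input provably changes the affected lane, and hence the folded result.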

Quick bench results on the worst case workload:
master no checksums: tps = 15.561848
master with checksums: tps = 1.695450
simd checksums: tps = 14.602698

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


simd-checksums.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-08 Thread Robert Haas
On Fri, Apr 5, 2013 at 11:08 PM, Amit Kapila  wrote:
> I still have one more doubt, consider the below scenario for cases when we
> Invalidate buffers during moving to freelist v/s just move to freelist
>
>Backend got the buffer from freelist for a request of page-9 (number 9 is
> random, just to explain), it still have association with another page-10
>It needs to add the buffer with new tag (new page association) in bufhash
> table and remove the buffer with oldTag (old page association).
>
> The benefit for just moving to freelist is that if we get request of same
> page until somebody else used it for another page, it will save read I/O.
> However on the other side for many cases
> Backend will need extra partition lock to remove oldTag (which can lead to
> some bottleneck).
>
> I think saving read I/O is more beneficial but just not sure if that is best
> as cases might be less for it.

I think saving read I/O is a lot more beneficial.  I haven't seen
evidence of a severe bottleneck updating the buffer mapping tables.  I
have seen some evidence of spinlock-level contention on read workloads
that fit in shared buffers, because in that case the system can run
fast enough for the spinlocks protecting the lwlocks to get pretty
hot.  But if you're doing writes, or if the workload doesn't fit in
shared buffers, other bottlenecks slow you down enough that this
doesn't really seem to become much of an issue.

Also, even if you *can* find some scenario where pushing the buffer
invalidation into the background is a win, I'm not convinced that
would justify doing it, because the case where it's a huge loss -
namely, working set just a tiny bit smaller than shared_buffers - is
pretty obvious. I don't think we dare fool around with that; the
townspeople will arrive with pitchforks.

I believe that the big win here is getting the clock sweep out of the
foreground so that BufFreelistLock doesn't catch fire.  The buffer
mapping locks are partitioned and, while it's not like that completely
gets rid of the contention, it sure does help a lot.  So I would view
that goal as primary, at least for now.  If we get a first round of
optimization done in this area, that doesn't preclude further
improving it in the future.

> Last time following tests have been executed to validate the results:
>
> Test suite - pgbench
> DB Size - 16 GB
> RAM - 24 GB
> Shared Buffers - 2G, 5G, 7G, 10G
> Concurrency - 8, 16, 32, 64 clients
> Pre-warm the buffers before start of test
>
> Shall we try for any other scenario's or for initial test of patch above are
> okay.

Seems like a reasonable place to start.

...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2013-04-08 Thread Tom Lane
Alexander Korotkov  writes:
> On Mon, Apr 8, 2013 at 9:28 AM, Tom Lane  wrote:
>> I spent the weekend hacking on this, making a number of bug fixes and a
>> whole lot of cosmetic changes.  I think there are large parts of this
>> that are in committable shape now, but I still find the actual graph
>> transformation logic to be mostly unintelligible.  I think what's most
>> obscure is the distinction between the arcs list and the keys list of
>> each state in the expanded graph.  I get the impression that the
>> general idea is for the arcs to represent exactly-known transitions
>> while the keys represent imprecisely-known transitions ... but there
>> seems to be at least some leakage between those categories.  Could
>> you write down a specification for what's supposed to be happening
>> there?

> Here is my try to specify it.

Thanks.  I hacked on this some more and committed it.  I found a number
of bugs along the way with respect to handling of word boundaries
(partially-blank transition trigrams) and EOL-color ($) handling.
I think it's all fixed now but it could definitely use some more
study and testing.

One issue that bothered me is that the regression tests really don't
provide much visibility into what the code is doing.  Some of the bugs
had to do with failing to generate expected trigrams, for instance
col ~ 'foo bar' only generating trigram "foo" and not "bar".  This still
led to getting the right answer, so the error was invisible as far as the
tests were concerned.  Is it worth thinking of a way to expose what the
extract function did at SQL level, so we could test more carefully?
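For reference on what "expected trigrams" means here, this is a hypothetical standalone sketch of pg_trgm-style extraction from a plain string (each word padded with two leading blanks and one trailing blank, then all 3-byte windows emitted). The real regexp path derives trigrams from the color-trigram graph instead; this only illustrates which trigrams a query like col ~ 'foo bar' ought to end up requiring.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TRGM 64

/*
 * Extract padded word trigrams from s into out, returning the count.
 * "foo" is padded to "  foo ", yielding "  f", " fo", "foo", "oo ".
 */
static int
extract_trigrams(const char *s, char out[][4])
{
    int n = 0;

    while (*s)
    {
        while (*s && !isalnum((unsigned char) *s))
            s++;                        /* skip word separators */
        if (!*s)
            break;

        char padded[80] = "  ";         /* two leading blanks */
        int  len = 2;

        while (*s && isalnum((unsigned char) *s) && len < 78)
            padded[len++] = *s++;
        padded[len++] = ' ';            /* one trailing blank */

        for (int i = 0; i + 3 <= len && n < MAX_TRGM; i++)
        {
            memcpy(out[n], padded + i, 3);
            out[n][3] = '\0';
            n++;
        }
    }
    return n;
}
```

The bug described above was the regexp path producing only the "foo" group for 'foo bar'; the index scan still returned correct answers (the recheck saves us), just with worse selectivity, which is exactly why the tests couldn't see it.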

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unrecognized type error (postgres 9.1.4)

2013-04-08 Thread Amit Kapila
On Monday, April 08, 2013 7:28 PM Rodrigo Barboza wrote:
On Sat, Apr 6, 2013 at 10:39 PM, Rodrigo Barboza 
wrote:
Ok! I will try to reproduce in a smaller scenario. 
On Sat, Apr 6, 2013 at 9:53 PM, Amit Kapila  wrote:
On Saturday, April 06, 2013 12:18 PM Rodrigo Barboza wrote:
>On Sat, Apr 6, 2013 at 12:23 AM, Amit Kapila 
wrote:
 On Saturday, April 06, 2013 3:57 AM Rodrigo Barboza wrote:

Hello.
 I created a type my_uint that is a unsigned int 32.

 I am trying to update data of a table that contains a column of this
type.
 Here is what happens:

 postgresql=> explain analyze UPDATE attribute_type_conf SET rowform =
rowform +1 where rowform <= 18;
 ERROR:  unsupported type: 132852
 Can you post your complete test (like your type creation and its use
for
 table and any initial data you loaded to it)?


>>> Well, it's a lot of data.
>>> May I send it atached?
>>If you can't make it to small reproducible test, then you can send.


>I was trying to reproduce the error, but it was not raising the error.
>I didn't change anything.
>Last week I dropped the database, created it again, populated my db and
when it was time to run the query, the error raised.
>I'm puzzled. I can't trust it...

> But now I run this script and the error finally raised. It seems random.

The reason for the intermittent behavior is that it occurs only when the value
you give in your WHERE clause lies within the valid boundary of the histogram
(refer to function ineq_histogram_selectivity).

> psql -U testuser testdb -c "drop table if exists tm32;"
> psql -U testuser testdb -c "create table tm32 (a tmuint32);"


> for ((i=0; i<100; i++)); do
> psql -U testuser testdb <<ENDOFSQLDATA
> insert into tm32 values($i);
> ENDOFSQLDATA
> done

> for ((i=0; i<100; i++)); do
> psql -U testuser testdb <<ENDOFSQLDATA
> BEGIN;
> UPDATE tm32 SET a = a + 1 WHERE a > $i;
> END;
> ENDOFSQLDATA
> done

> The error message: 
> ERROR: unsupported type: 202886
> ROLLBACK

You rightly identified in your other mail that it happens in the function
convert_numeric_to_scalar(). But I think adding user-defined datatype
handling in this function might not be straightforward. You can refer to the
text below from
http://www.postgresql.org/docs/9.2/static/xoper-optimization.html
"You can use scalarltsel and scalargtsel for comparisons on data types that
have some sensible means of being converted into numeric scalars for range
comparisons. If possible, add the data type to those understood by the
function convert_to_scalar() in src/backend/utils/adt/selfuncs.c.
(Eventually, this function should be replaced by per-data-type functions
identified through a column of the pg_type system catalog; but that hasn't
happened yet.) If you do not do this, things will still work, but the
optimizer's estimates won't be as good as they could be." 
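The dispatch pattern in question can be sketched standalone like this (a hypothetical, heavily stripped-down model: the real function works on Datums, the OID values below are illustrative, and the server raises elog(ERROR, "unsupported type: %u") rather than setting a failure flag):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Illustrative OIDs only; the real values live in pg_type.h. */
#define INT4OID   23
#define FLOAT8OID 701

/*
 * Map a value of a known built-in numeric type onto a double for
 * histogram interpolation.  A user-defined type's OID is not in the
 * switch, so it falls through to the failure path -- which is where
 * the "unsupported type: 202886" error in this thread comes from.
 */
static double
convert_numeric_to_scalar(double value, Oid typid, bool *failure)
{
    switch (typid)
    {
        case INT4OID:
        case FLOAT8OID:
            /* ... one branch per known built-in numeric type ... */
            return value;
        default:
            *failure = true;   /* e.g. the my_uint type in this thread */
            return 0;
    }
}
```

Hence the suggestion in the quoted documentation: teach convert_to_scalar() about the type, or avoid scalarltsel/scalargtsel so the histogram path is never taken.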

I can think of the following workarounds for your problem.

1. For your table, set autovacuum_analyze_threshold and
autovacuum_analyze_scale_factor very high (refer to CREATE TABLE), so that it
doesn't analyze your table and returns the default selectivity, which should
work fine if your SQL statements are simple.

2. Write your own selectivity functions that return the default selectivity,
and use them when creating the operators.

3. Use a bind value in the WHERE clause; it will return the default
selectivity.

4. Find some other way to keep it from collecting histogram stats (meaning it
will use minimal stats via compute_minimal_stats). I am not sure, but you can
try once without defining the operators.

All the above ways can help resolve your current problem, but they are not
good options if you make heavy use of SQL statements with these datatypes.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Samrat Revagade
>What Samrat is proposing here is that WAL is not flushed to the OS before
>it is acked by a synchronous replica so recovery won't go past the
>timeline change made in failover, making it necessary to take a new
>base backup to resync with the new master.

Actually, we are proposing that the data page on the master is not written
until the master receives an ACK from the standby. The WAL files can be
flushed to disk on both the master and the standby before the standby
generates the ACK to the master. The end objective is the same: avoiding a
base backup of the old master to resync with the new master.

>Why do you think that the inconsistent data after failover happens is
>problem? Because
>it's one of the reasons why a fresh base backup is required when
>starting old master as
>new standby? If yes, I agree with you. I've often heard the complaints
>about a backup
>when restarting new standby. That's really big problem.

Yes, taking a backup is a major problem when the database size is more than
several TB. It would take a very long time to ship the backup data over a
slow WAN link.

>> One solution to avoid this situation is have the master send WAL records
to standby and wait for ACK from standby committing WAL files to disk and
only after that commit data page related to this transaction on master.

>You mean to make the master wait the data page write until WAL has been
not only
>flushed to disk but also replicated to the standby?

Yes. The master should not write the data page before the corresponding WAL
records have been replicated to the standby, i.e., flushed to disk on both
the master and the standby.
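The ordering being proposed can be sketched as follows. This is a hypothetical toy model only: the struct and stub functions stand in for the real WAL flush, sync-rep wait, and bgwriter/checkpointer machinery (in reality the page write happens much later and asynchronously), and just enforce the invariant.

```c
#include <assert.h>
#include <stdbool.h>

/* Track which durability steps have completed for one commit. */
typedef struct
{
    bool wal_flushed_local;      /* WAL fsynced on the master */
    bool wal_acked_by_standby;   /* standby confirmed its WAL flush */
    bool data_page_written;      /* heap page allowed out to disk */
} CommitState;

/*
 * Proposed rule: the data page may be written only once the commit WAL
 * record is durable on BOTH nodes, so a failed-over standby is never
 * behind the old master's on-disk data.
 */
static bool
write_data_page(CommitState *st)
{
    if (!st->wal_flushed_local || !st->wal_acked_by_standby)
        return false;            /* refuse: standby's WAL not caught up */
    st->data_page_written = true;
    return true;
}

static void
commit_transaction(CommitState *st)
{
    st->wal_flushed_local = true;     /* cf. XLogFlush() */
    st->wal_acked_by_standby = true;  /* cf. SyncRepWaitForLSN() returning */
    write_data_page(st);              /* only now is the page eligible */
}
```

The extra wait this imposes on the page-write path is exactly the round trip discussed below, which is why gating it behind a GUC makes sense.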

>> The main drawback would be increased wait time for the client due to
extra round trip to standby before master sends ACK to client. Are there
any other issues with this approach?

>I think that you can introduce GUC specifying whether this extra check
>is required to avoid a backup when failback

That would be a better idea. We can disable it whenever taking a fresh backup
is not a problem.


Regards,

Samrat



On Mon, Apr 8, 2013 at 10:40 PM, Fujii Masao  wrote:

> On Mon, Apr 8, 2013 at 7:34 PM, Samrat Revagade
>  wrote:
> >
> > Hello,
> >
> > We have been trying to figure out possible solutions to the following
> problem in streaming replication Consider following scenario:
> >
> > If master receives commit command, it writes and flushes commit WAL
> records to the disk, It also writes and flushes data page related to this
> transaction.
> >
> > The master then sends WAL records to standby up to the commit WAL
> record. But before sending these records if failover happens then,  old
> master is ahead of  standby which is now the new master in terms of DB data
> leading to inconsistent data .
>
> Why do you think that the inconsistent data after failover happens is
> problem? Because
> it's one of the reasons why a fresh base backup is required when
> starting old master as
> new standby? If yes, I agree with you. I've often heard the complaints
> about a backup
> when restarting new standby. That's really big problem.
>
> The timeline mismatch after failover was one of the reasons why a
> backup is required.
> But, thanks to Heikki's recent work, that's solved, i.e., the timeline
> mismatch would be
> automatically resolved when starting replication in 9.3. So, the
> remaining problem is an
> inconsistent database.
>
> > One solution to avoid this situation is have the master send WAL records
> to standby and wait for ACK from standby committing WAL files to disk and
> only after that commit data page related to this transaction on master.
>
> You mean to make the master wait the data page write until WAL has been
> not only
> flushed to disk but also replicated to the standby?
>
> > The main drawback would be increased wait time for the client due to
> extra round trip to standby before master sends ACK to client. Are there
> any other issues with this approach?
>
> I think that you can introduce GUC specifying whether this extra check
> is required to
> avoid a backup when failback.
>
> Regards,
>
> --
> Fujii Masao
>