Re: [HACKERS] Help to startup

2014-09-21 Thread Craig Ringer
On 09/17/2014 01:51 AM, Tapan Halani wrote:
> Hello everyone..i am new to PostgreSQL project. I had prior experience
> with sql+ , with oracle 11g database server. Kindly help me grasp more
> about the project.

Since you're asking on pgsql-hackers, you're presumably interested in
getting involved in developing extensions or feature work?

See:

http://www.postgresql.org/developer/

If that's *not* what you're trying to do, then perhaps you want the
tutorial, main documentation, and pgsql-general mailing list?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITH CHECK OPTION bug [was RLS Design]

2014-09-21 Thread Dean Rasheed
On 20 September 2014 14:08, Michael Paquier  wrote:
> On Sat, Sep 20, 2014 at 7:03 AM, Dean Rasheed  
> wrote:
>> Fortunately it looks pretty trivial though. The patch attached fixes
>> the above test cases.
>> Obviously this needs to be fixed in 9.4 and HEAD.
> Wouldn't it be better if bundled with some regression tests?

Yeah OK, fair point. Here are some tests that cover that code path.
I've also thrown in a test with prepared statements; although that
case was already working, it seemed worth checking.

Regards,
Dean
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
new file mode 100644
index 5bf84c1..9ddc8ad
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*** set_plan_refs(PlannerInfo *root, Plan *p
*** 696,701 
--- 696,704 
  Assert(splan->plan.targetlist == NIL);
  Assert(splan->plan.qual == NIL);
  
+ splan->withCheckOptionLists =
+ 	fix_scan_list(root, splan->withCheckOptionLists, rtoffset);
+ 
  if (splan->returningLists)
  {
  	List	   *newRL = NIL;
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
new file mode 100644
index 6a35925..8a81251
*** a/src/test/regress/expected/updatable_views.out
--- b/src/test/regress/expected/updatable_views.out
*** NOTICE:  drop cascades to 3 other object
*** 1567,1572 
--- 1567,1592 
  DETAIL:  drop cascades to view rw_view1
  drop cascades to view rw_view2
  drop cascades to view rw_view3
+ -- WITH CHECK OPTION with scalar array ops
+ CREATE TABLE base_tbl (a int, b int[]);
+ CREATE VIEW rw_view1 AS SELECT * FROM base_tbl WHERE a = ANY (b)
+   WITH CHECK OPTION;
+ INSERT INTO rw_view1 VALUES (1, ARRAY[1,2,3]); -- ok
+ INSERT INTO rw_view1 VALUES (10, ARRAY[4,5]); -- should fail
+ ERROR:  new row violates WITH CHECK OPTION for "rw_view1"
+ DETAIL:  Failing row contains (10, {4,5}).
+ UPDATE rw_view1 SET b[2] = -b[2] WHERE a = 1; -- ok
+ UPDATE rw_view1 SET b[1] = -b[1] WHERE a = 1; -- should fail
+ ERROR:  new row violates WITH CHECK OPTION for "rw_view1"
+ DETAIL:  Failing row contains (1, {-1,-2,3}).
+ PREPARE ins(int, int[]) AS INSERT INTO rw_view1 VALUES($1, $2);
+ EXECUTE ins(2, ARRAY[1,2,3]); -- ok
+ EXECUTE ins(10, ARRAY[4,5]); -- should fail
+ ERROR:  new row violates WITH CHECK OPTION for "rw_view1"
+ DETAIL:  Failing row contains (10, {4,5}).
+ DEALLOCATE PREPARE ins;
+ DROP TABLE base_tbl CASCADE;
+ NOTICE:  drop cascades to view rw_view1
  -- WITH CHECK OPTION with subquery
  CREATE TABLE base_tbl (a int);
  CREATE TABLE ref_tbl (a int PRIMARY KEY);
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
new file mode 100644
index c072fca..60c7e29
*** a/src/test/regress/sql/updatable_views.sql
--- b/src/test/regress/sql/updatable_views.sql
*** INSERT INTO rw_view3 VALUES (3); -- ok
*** 707,712 
--- 707,731 
  
  DROP TABLE base_tbl CASCADE;
  
+ -- WITH CHECK OPTION with scalar array ops
+ 
+ CREATE TABLE base_tbl (a int, b int[]);
+ CREATE VIEW rw_view1 AS SELECT * FROM base_tbl WHERE a = ANY (b)
+   WITH CHECK OPTION;
+ 
+ INSERT INTO rw_view1 VALUES (1, ARRAY[1,2,3]); -- ok
+ INSERT INTO rw_view1 VALUES (10, ARRAY[4,5]); -- should fail
+ 
+ UPDATE rw_view1 SET b[2] = -b[2] WHERE a = 1; -- ok
+ UPDATE rw_view1 SET b[1] = -b[1] WHERE a = 1; -- should fail
+ 
+ PREPARE ins(int, int[]) AS INSERT INTO rw_view1 VALUES($1, $2);
+ EXECUTE ins(2, ARRAY[1,2,3]); -- ok
+ EXECUTE ins(10, ARRAY[4,5]); -- should fail
+ DEALLOCATE PREPARE ins;
+ 
+ DROP TABLE base_tbl CASCADE;
+ 
  -- WITH CHECK OPTION with subquery
  
  CREATE TABLE base_tbl (a int);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal: window function - change_number

2014-09-21 Thread Pavel Stehule
Hi
I tried to solve following task:

I have a table

start, reason, km
=
 2014-01-01 08:00:00, private, 10
 2014-01-01 09:00:00, commerc, 20
 2014-01-01 10:00:00, commerc, 20
 2014-01-01 11:00:00, private, 8

and I would like to reduce these rows to

 2014-01-01 08:00:00, private, 10
 2014-01-01 09:00:00, commerc, 20 + 20 = 40
 2014-01-01 11:00:00, private, 8

It is relatively hard to do this now with SQL only. But we can simplify this task
with a window function that returns the number of changes in some column. Then
this task can be solved by

select min(start), min(reason), sum(km)
  from (select start, reason, km, change_number(reason) over (order by
start))
  group by change_number;
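
A cleaned-up form of the intended query (the subquery alias, the column
alias, and the table name "trips" are added here purely for illustration;
change_number() is the proposed, not-yet-existing function):

-- change_number() is hypothetical; "trips" and the aliases are illustrative only.
select min(start), min(reason), sum(km)
  from (select start, reason, km,
               change_number(reason) over (order by start) as change_number
          from trips) s
 group by change_number;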

Do you think it makes sense?

Regards

Pavel


Re: [HACKERS] proposal: window function - change_number

2014-09-21 Thread David Rowley
On Sun, Sep 21, 2014 at 9:27 PM, Pavel Stehule 
wrote:

> Hi
> I tried to solve following task:
>
> I have a table
>
> start, reason, km
> =
>  2014-01-01 08:00:00, private, 10
>  2014-01-01 09:00:00, commerc, 20
>  2014-01-01 10:00:00, commerc, 20
>  2014-01-01 11:00:00, private, 8
>
> and I would reduce these rows to
>
>  2014-01-01 08:00:00, private, 10
>  2014-01-01 09:00:00, commerc, 20 + 20 = 40
>  2014-01-01 11:00:00, private, 8
>
> It is relative hard to it now with SQL only. But we can simplify this task
> with window function that returns number of change in some column. Then
> this task can be solved by
>
> select min(start), min(reason), sum(km)
>   from (select start, reason, km, change_number(reason) over (order by
> start))
>   group by change_number;
>
>
I guess that might be quite useful, otherwise the only way that comes to
mind to do this would be something along the lines of:

select *, sum(case when reason <> lastreason then 1 else 0 end) over (order
by start) as chg_num from (select *, lag(reason) over (order by start)
lastreason from sometable) sometable;

This way might not be too bad, as I think the outer window will have no need
to perform another sort, since the inner window clause has sorted it the
right way already. Though something like change_number() would make this a
bit prettier. It's almost like rank(), but with a parameter.

Regards

David Rowley


Re: [HACKERS] Help to startup

2014-09-21 Thread Michael Paquier
On Sun, Sep 21, 2014 at 4:52 PM, Craig Ringer  wrote:
> On 09/17/2014 01:51 AM, Tapan Halani wrote:
>> Hello everyone..i am new to PostgreSQL project. I had prior experience
>> with sql+ , with oracle 11g database server. Kindly help me grasp more
>> about the project.
>
> Since you're asking on pgsql-hackers, you're presumably interested in
> getting involved in developing extensions or feature work?
>
> See:
>
> http://www.postgresql.org/developer/
There is a group picture from 2006 :)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: window function - change_number

2014-09-21 Thread Mart Kelder
Hi Pavel (and others),

Pavel Stehule wrote:
> Hi
> I tried to solve following task:
> 
> I have a table
> 
> start, reason, km
> =
>  2014-01-01 08:00:00, private, 10
>  2014-01-01 09:00:00, commerc, 20
>  2014-01-01 10:00:00, commerc, 20
>  2014-01-01 11:00:00, private, 8
> 
> and I would reduce these rows to
> 
>  2014-01-01 08:00:00, private, 10
>  2014-01-01 09:00:00, commerc, 20 + 20 = 40
>  2014-01-01 11:00:00, private, 8
> 
> It is relative hard to it now with SQL only. But we can simplify this task
> with window function that returns number of change in some column. Then
> this task can be solved by
> 
> select min(start), min(reason), sum(km)
>   from (select start, reason, km, change_number(reason) over (order by
> start))
>   group by change_number;

What about

select srk.reason, min(srk.start), sum(srk.km)
from start_reason_km srk
group by srk.reason, (select max(start) from start_reason_km other WHERE 
other.start < srk.start and other.reason != srk.reason);

In general, I think window functions are very specific about how the query plan
must look, leaving not much room for the optimizer. On the other hand,
if there happens to be an efficient way to get the results of the table
ordered by "start", then the window function will very likely be much faster
than a join. It would be nice if the optimizer were able to add such stream-order
operations.

> Do you think, so it has sense?
> 
> Regards
> 
> Pavel

Regards,

Mart

PS: This is my first post to the mailing list. I am a software developer
interested in performance, making web applications with a different database
server during working hours.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: window function - change_number

2014-09-21 Thread Pavel Stehule
2014-09-21 14:30 GMT+02:00 Mart Kelder :

> Hi Pavel (and others),
>
> Pavel Stehule wrote:
> > Hi
> > I tried to solve following task:
> >
> > I have a table
> >
> > start, reason, km
> > =
> >  2014-01-01 08:00:00, private, 10
> >  2014-01-01 09:00:00, commerc, 20
> >  2014-01-01 10:00:00, commerc, 20
> >  2014-01-01 11:00:00, private, 8
> >
> > and I would reduce these rows to
> >
> >  2014-01-01 08:00:00, private, 10
> >  2014-01-01 09:00:00, commerc, 20 + 20 = 40
> >  2014-01-01 11:00:00, private, 8
> >
> > It is relative hard to it now with SQL only. But we can simplify this
> task
> > with window function that returns number of change in some column. Then
> > this task can be solved by
> >
> > select min(start), min(reason), sum(km)
> >   from (select start, reason, km, change_number(reason) over (order by
> > start))
> >   group by change_number;
>
> What about
>
> select srk.reason, min(srk.start), sum(srk.km)
> from start_reason_km srk
> group by srk.reason, (select max(start) from start_reason_km other WHERE
> other.start < srk.start and other.reason != srk.reason);
>

This query is a Cartesian product, so for some large data it is significantly
slower than the window function (which requires only sorts, without joins).

My motivation was a) to implement the described task without a Cartesian product,
and b) to introduce some tool for this kind of problem. I have seen the
request "reduce a time series" more than once, and a window function "change_number" (or
maybe "consistent_series_number") could be a good candidate.


>
> In general, I think window function are very specific in how the queryplan
> must look like, leaving not much room for the optimizer. On the other hand,
> if there happends to be an efficient way to get the results of the table
> ordered by "start", then the window function will very likely much faster
> then a join. I would be nice if the optimizer is able to add such stream
> order operations.


> > Do you think, so it has sense?
> >
> > Regards
> >
> > Pavel
>
> Regards,
>
> Mart
>
> PS: This is my first post to the mailing list. I am a software developer
> interest is performance making webapplications with a different database
> server during working hours.
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] WITH CHECK OPTION bug [was RLS Design]

2014-09-21 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> On 20 September 2014 14:08, Michael Paquier  wrote:
> > On Sat, Sep 20, 2014 at 7:03 AM, Dean Rasheed  
> > wrote:
> >> Fortunately it looks pretty trivial though. The patch attached fixes
> >> the above test cases.
> >> Obviously this needs to be fixed in 9.4 and HEAD.
> > Wouldn't it be better if bundled with some regression tests?

Agreed.

> Yeah OK, fair point. Here are some tests that cover that code path.
> I've also thrown in a test with prepared statements, although that
> case was already working, it seemed worth checking.

Thanks!  Looks good, but I will review in more depth, address the other
comments on this thread (typo, adding more documentation), and
investigate the results from this morning's Coverity run this evening;
I should be able to get everything addressed in the next couple of days.
Obviously, I'll be back-patching this fix to 9.4 too.

Thanks again!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] proposal: window function - change_number

2014-09-21 Thread Mart Kelder
Hi Pavel (and others),

On Sunday, 21 September 2014 15:35:46, you wrote:
> 2014-09-21 14:30 GMT+02:00 Mart Kelder :
> > Hi Pavel (and others),
> > 
> > Pavel Stehule wrote:
> > > Hi
> > > I tried to solve following task:
> > > 
> > > I have a table
> > > 
> > > start, reason, km
> > > =
> > > 
> > >  2014-01-01 08:00:00, private, 10
> > >  2014-01-01 09:00:00, commerc, 20
> > >  2014-01-01 10:00:00, commerc, 20
> > >  2014-01-01 11:00:00, private, 8
> > > 
> > > and I would reduce these rows to
> > > 
> > >  2014-01-01 08:00:00, private, 10
> > >  2014-01-01 09:00:00, commerc, 20 + 20 = 40
> > >  2014-01-01 11:00:00, private, 8
> > > 
> > > It is relative hard to it now with SQL only. But we can simplify this
> > > task
> > >
> > > with window function that returns number of change in some column. Then
> > > this task can be solved by
> > > 
> > > select min(start), min(reason), sum(km)
> > >   from (select start, reason, km, change_number(reason) over (order by
> > > start))
> > > 
> > >   group by change_number;
> > 
> > What about
> > 
> > select srk.reason, min(srk.start), sum(srk.km)
> > from start_reason_km srk
> > group by srk.reason, (select max(start) from start_reason_km other WHERE
> > other.start < srk.start and other.reason != srk.reason);
> 
> This query is Cartesian product, so for some large data it is significantly
> slower then window function (required only sorts without joins)

I think we have the same query plan in mind (with only one scan). As far as I
know, SQL is a language where you define the result you want to get, and let
the server find a way to get the data. I think window functions also dictate
how the server needs to get the information.

The real challenge is of course finding heuristics to remove the additional
join. In this particular case, I can tell how to remove the inner join from
the subquery:
* the where-clause of the self-join contains other.start < srk.start. From
that we can conclude that if the table is (or can be) sorted on "start", we
have already seen the data before the subquery is executed.
* because we only need an aggregate, we just need to store the intermediate "max"
for each "reason" and then add the result to the stream.

Recently, I had a similar problem with a table containing a timestamp, a state,
and an object the state belongs to. An object remains in a state until
there is a more recent tuple in the table. I basically needed to query the
previous state for each tuple, but preferably without the additional join.

> My motivation was a) to implement described task without Cartesian product.

Good reason (if you consider the query plan and not the query).

> b) introduce some tool for this kind of problems. I seen more times a
> request .. reduce a time series, and a window function "change_number" (or
> maybe "consistent_series_number") can be good candidate.

I also need to note that there is a lot of difference in complexity between the
possible solutions to this problem. Where a new window function can probably
be implemented very easily, the optimizer changes described above are complex
and not easy to implement.

I also want to note that I am not really against new window functions; I only
want to point out that a more generic solution might also be possible.

Regards,

Mart


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: window function - change_number

2014-09-21 Thread Pavel Stehule
2014-09-21 17:00 GMT+02:00 Mart Kelder :

> Hi Pavel (and others),
>
> On Sunday, 21 September 2014 15:35:46, you wrote:
> > 2014-09-21 14:30 GMT+02:00 Mart Kelder :
> > > Hi Pavel (and others),
> > >
> > > Pavel Stehule wrote:
> > > > Hi
> > > > I tried to solve following task:
> > > >
> > > > I have a table
> > > >
> > > > start, reason, km
> > > > =
> > > >
> > > >  2014-01-01 08:00:00, private, 10
> > > >  2014-01-01 09:00:00, commerc, 20
> > > >  2014-01-01 10:00:00, commerc, 20
> > > >  2014-01-01 11:00:00, private, 8
> > > >
> > > > and I would reduce these rows to
> > > >
> > > >  2014-01-01 08:00:00, private, 10
> > > >  2014-01-01 09:00:00, commerc, 20 + 20 = 40
> > > >  2014-01-01 11:00:00, private, 8
> > > >
> > > > It is relative hard to it now with SQL only. But we can simplify this
> > > > task
> > > >
> > > > with window function that returns number of change in some column.
> Then
> > > > this task can be solved by
> > > >
> > > > select min(start), min(reason), sum(km)
> > > >   from (select start, reason, km, change_number(reason) over (order
> by
> > > > start))
> > > >
> > > >   group by change_number;
> > >
> > > What about
> > >
> > > select srk.reason, min(srk.start), sum(srk.km)
> > > from start_reason_km srk
> > > group by srk.reason, (select max(start) from start_reason_km other
> WHERE
> > > other.start < srk.start and other.reason != srk.reason);
> >
> > This query is Cartesian product, so for some large data it is
> significantly
> > slower then window function (required only sorts without joins)
>
> I think we have the same queryplan in mind (with only one scan). As far as
> I
> know, SQL is a language where you define the result you want to get, and
> let
> the server find a way how to find the data. I think windowed function also
> say
> how the server needs to get the information.
>

As far as I know, that is not true now. Any subselect enforces an individual scan of
the source relation. Postgres has no special support for self-joins.



>
> The real challenge is of course of finding heuristics to remove the
> additional
> join. In this particular case, I can tell how to remove the inner join from
> the subquery:
> * the where-clause of the self-join contains other.start < srk.start. From
> that we can conclude that if the table is (or can be) sorted on "start", we
> have seen the data before the subquery is executed
> * because we only need an aggregate, we need to store the intermediate
> "max"
> for each "reason". And then add the result to the stream.
>
> Recently, I had a simular problem with a table containing a timestamp, a
> state
> and a object where the state belongs to. A object remains in a state until
> there is a more recent tuple in the table. I needed basically to query all
> the
> previous state for each tuple, but preverably without the additional join.
>
> > My motivation was a) to implement described task without Cartesian
> product.
>
> Good reason (if you consider the queryplan and not the query).
>
>
yes.

There is probably a big space for improvements in more directions. For
example, we have an application where the pattern SELECT ... FROM A
JOIN (SELECT someagg() FROM A) ... ON is often used.

Sometimes these queries are slow due to terribly low estimates. It is one
example of many.
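
A generic sketch of that pattern (the table "a" and its columns are
hypothetical, chosen only to show the shape of the query):

-- The same relation is scanned twice: once to compute the aggregate,
-- once for the join against it.
select a.*
  from a
  join (select id, max(val) as max_val
          from a
         group by id) agg
    on agg.id = a.id
   and a.val = agg.max_val;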

Pavel


> > b) introduce some tool for this kind of problems. I seen more times a
> > request .. reduce a time series, and a window function "change_number"
> (or
> > maybe "consistent_series_number") can be good candidate.
>
> I also need to note that there is a lot of difference in complexity
> between the
> possible solutions of this problem. Where a new window function can
> probably
> be very easy implemented, the optimizer changes descripted above are
> complex
> and not easy to implement.
>
> I also want to note that I am not really against new window functions, I
> only
> want to point out that a more generic solution also might be possible.
>
> Regards,
>
> Mart
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] proposal: window function - change_number

2014-09-21 Thread Andrew Gierth
> "Pavel" == Pavel Stehule  writes:

 Pavel> Hi
 Pavel> I tried to solve following task:

 Pavel> I have a table

 Pavel> start, reason, km
 Pavel> =
 Pavel>  2014-01-01 08:00:00, private, 10
 Pavel>  2014-01-01 09:00:00, commerc, 20
 Pavel>  2014-01-01 10:00:00, commerc, 20
 Pavel>  2014-01-01 11:00:00, private, 8

 Pavel> and I would reduce these rows to

 Pavel>  2014-01-01 08:00:00, private, 10
 Pavel>  2014-01-01 09:00:00, commerc, 20 + 20 = 40
 Pavel>  2014-01-01 11:00:00, private, 8

 Pavel> It is relative hard to it now with SQL only.

Only relatively. My standard solution is something like this:

select start_time, reason, sum(km) as km
  from (select max(label_time) over (order by start) as start_time,
               reason, km
          from (select start, reason, km,
                       case when reason
                              is distinct from
                              lag(reason) over (order by start)
                            then start
                       end as label_time
                  from yourtable
               ) s2
       ) s1
 group by start_time, reason
 order by start_time;

(Your change_number idea is essentially equivalent to doing
sum(case when x is distinct from lag(x) over w then 1 end) over w,
except that since window functions can't be nested, that expression
requires a subquery.)
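
Spelled out with that subquery, it looks something like this (same table
name as in the query above):

-- The inner query flags rows where "reason" differs from the previous
-- row; the outer query turns those flags into a running change count.
select start, reason, km,
       sum(case when changed then 1 end) over (order by start) as change_number
  from (select start, reason, km,
               reason is distinct from
                   lag(reason) over (order by start) as changed
          from yourtable) s;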

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: window function - change_number

2014-09-21 Thread Rémi Cura
Hey, sorry if what I say is obvious for you.

If I understood your problem correctly, it is strictly equivalent to this
one :
http://postgresql.1045698.n5.nabble.com/Count-of-records-in-a-row-td5775363.html

There is a postgres trick to solve this problem:
what you want is essentially to generate a unique group_id,
but one that depends on an order of rows not defined in the group.

The solution
is to generate a row number in the order you want, then a row number within
the group,
then the subtraction of the 2 row numbers gives you a unique id per group.

The cost is that you have to use 2 window functions, hence 2 scans I guess.
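
A minimal sketch of that trick on the sample data from the first mail (the
table name "trips" is assumed only for illustration):

-- The difference of the two row numbers is constant within each
-- consecutive run of the same reason, so it can serve as a group id.
select min(start) as start, reason, sum(km) as km
  from (select start, reason, km,
               row_number() over (order by start)
             - row_number() over (partition by reason order by start) as grp
          from trips) s
 group by reason, grp
 order by min(start);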

Cheers,
Rémi-C

2014-09-21 17:51 GMT+02:00 Andrew Gierth :

> > "Pavel" == Pavel Stehule  writes:
>
>  Pavel> Hi
>  Pavel> I tried to solve following task:
>
>  Pavel> I have a table
>
>  Pavel> start, reason, km
>  Pavel> =
>  Pavel>  2014-01-01 08:00:00, private, 10
>  Pavel>  2014-01-01 09:00:00, commerc, 20
>  Pavel>  2014-01-01 10:00:00, commerc, 20
>  Pavel>  2014-01-01 11:00:00, private, 8
>
>  Pavel> and I would reduce these rows to
>
>  Pavel>  2014-01-01 08:00:00, private, 10
>  Pavel>  2014-01-01 09:00:00, commerc, 20 + 20 = 40
>  Pavel>  2014-01-01 11:00:00, private, 8
>
>  Pavel> It is relative hard to it now with SQL only.
>
> Only relatively. My standard solution is something like this:
>
> select start_time, reason, sum(km) as km
>   from (select max(label_time) over (order by start) as start_time,
>reason, km
>   from (select start, reason, km,
>case when reason
>  is distinct from
>  lag(reason) over (order by start)
> then start
>end as label_time
>   from yourtable
>) s2
>) s1
>  group by start_time, reason
>  order by start_time;
>
> (Your change_number idea is essentially equivalent to doing
> sum(case when x is distinct from lag(x) over w then 1 end) over w,
> except that since window functions can't be nested, that expression
> requires a subquery.)
>
> --
> Andrew (irc:RhodiumToad)
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] proposal: window function - change_number

2014-09-21 Thread Pavel Stehule
2014-09-21 17:51 GMT+02:00 Andrew Gierth :

> > "Pavel" == Pavel Stehule  writes:
>
>  Pavel> Hi
>  Pavel> I tried to solve following task:
>
>  Pavel> I have a table
>
>  Pavel> start, reason, km
>  Pavel> =
>  Pavel>  2014-01-01 08:00:00, private, 10
>  Pavel>  2014-01-01 09:00:00, commerc, 20
>  Pavel>  2014-01-01 10:00:00, commerc, 20
>  Pavel>  2014-01-01 11:00:00, private, 8
>
>  Pavel> and I would reduce these rows to
>
>  Pavel>  2014-01-01 08:00:00, private, 10
>  Pavel>  2014-01-01 09:00:00, commerc, 20 + 20 = 40
>  Pavel>  2014-01-01 11:00:00, private, 8
>
>  Pavel> It is relative hard to it now with SQL only.
>
> Only relatively. My standard solution is something like this:
>
> select start_time, reason, sum(km) as km
>   from (select max(label_time) over (order by start) as start_time,
>reason, km
>   from (select start, reason, km,
>case when reason
>  is distinct from
>  lag(reason) over (order by start)
> then start
>end as label_time
>   from yourtable
>) s2
>) s1
>  group by start_time, reason
>  order by start_time;
>
> (Your change_number idea is essentially equivalent to doing
> sum(case when x is distinct from lag(x) over w then 1 end) over w,
> except that since window functions can't be nested, that expression
> requires a subquery.)
>
>
Yes, I found this solution in a third iteration too.

So this proposal has lost its main benefit.

Regards

Pavel


> --
> Andrew (irc:RhodiumToad)
>


Re: [HACKERS] proposal: window function - change_number

2014-09-21 Thread Pavel Stehule
2014-09-21 18:08 GMT+02:00 Rémi Cura :

> Hey, sorry I what I say is obvious for you .
>
> If I understood your problem correctly, it is strictly equivalent to this
> one :
>
> http://postgresql.1045698.n5.nabble.com/Count-of-records-in-a-row-td5775363.html
>
> there is a postgres trick to solve this problem :
> what you want is essentially generate a unique group_id,
> but one that depends of an order of row not defined in the group.
>
> The solution
> is to generate a row number by the order you want , then a row number by
> the group ,
> then a subtraction of the 2 row number gives you an unique id per group.
>
> The cost is that you have to use 2 windows function., hence 2 scans I
> guess.
>

Yes, it is a little bit similar - I found the pattern described by Andrew
works well too.

regards

Pavel


> Cheers,
> Rémi-C
>
> 2014-09-21 17:51 GMT+02:00 Andrew Gierth :
>
>> > "Pavel" == Pavel Stehule  writes:
>>
>>  Pavel> Hi
>>  Pavel> I tried to solve following task:
>>
>>  Pavel> I have a table
>>
>>  Pavel> start, reason, km
>>  Pavel> =
>>  Pavel>  2014-01-01 08:00:00, private, 10
>>  Pavel>  2014-01-01 09:00:00, commerc, 20
>>  Pavel>  2014-01-01 10:00:00, commerc, 20
>>  Pavel>  2014-01-01 11:00:00, private, 8
>>
>>  Pavel> and I would reduce these rows to
>>
>>  Pavel>  2014-01-01 08:00:00, private, 10
>>  Pavel>  2014-01-01 09:00:00, commerc, 20 + 20 = 40
>>  Pavel>  2014-01-01 11:00:00, private, 8
>>
>>  Pavel> It is relative hard to it now with SQL only.
>>
>> Only relatively. My standard solution is something like this:
>>
>> select start_time, reason, sum(km) as km
>>   from (select max(label_time) over (order by start) as start_time,
>>reason, km
>>   from (select start, reason, km,
>>case when reason
>>  is distinct from
>>  lag(reason) over (order by start)
>> then start
>>end as label_time
>>   from yourtable
>>) s2
>>) s1
>>  group by start_time, reason
>>  order by start_time;
>>
>> (Your change_number idea is essentially equivalent to doing
>> sum(case when x is distinct from lag(x) over w then 1 end) over w,
>> except that since window functions can't be nested, that expression
>> requires a subquery.)
>>
>> --
>> Andrew (irc:RhodiumToad)
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
>


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-09-21 Thread Alvaro Herrera
Tom Lane wrote:
> Rahila Syed  writes:
> > Please find attached patch to compress FPW using pglz compression.
> 
> Patch not actually attached AFAICS (no, a link is not good enough).

Well, from Rahila's point of view the patch is actually attached, but
she's posting from the Nabble interface, which mangles it and turns it into
a link instead.  Not her fault, really -- but the end result is the
same: to properly submit a patch, you need to send an email to the
pgsql-hackers@postgresql.org mailing list, not join a group/forum on
some intermediary newsgroup site that mirrors the list.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Better support of exported snapshots with pg_dump

2014-09-21 Thread Michael Paquier
On Thu, Sep 4, 2014 at 11:33 PM, Michael Paquier
 wrote:
> Thoughts?

I have been poking at that during the long flight back from Chicago
and created the attached patch that makes pg_dump able to create a
replication slot (hence have pg_dump put its hands on a synchronized
snapshot describing data at the state of slot creation), then take a
dump using the exported snapshot while keeping the replication
connection used for slot creation alive for the duration of the dump.

Taking a dump consistent with a replication slot is useful for online
upgrade cases first, because you can simply run pg_dump, have a slot
created, and also get a state of the database consistent with the slot
creation before replaying changes in one way or another. Using
that, a decoder that generates raw queries, and a receiver able to
apply changes on a remote Postgres server, it is possible to get a
kind of live migration solution from one Postgres instance to another
for a single database, as long as the origin server uses 9.4. Making
the receiver report write and flush positions also makes it possible
for the origin server to use the synchronous replication protocol to be
sure that changes got applied on the remote before performing a switch
from the origin to the remote (which may actually explain why multi-syncrep
would be useful here for multiple databases). Also, I imagine that
users could even use this tool in pg_dump, for example, to do some
post-processing on the dumped data in accordance with the decoder plugin
before applying changes to a remote source.

Now, this is done with the addition of two options in pg_dump to
control the logical slot creation:
- --slot to define the name of the slot being created
- --plugin-name, to define the name of the decoder plugin
And then you can of course do things like that:
# Raw data dump on a slot
$ pg_dump --slot bar --plugin-name test_decoding
# Existing parallel dump not changed:
$ pg_dump -j 4 -f data -F d
# Parallel dump on a slot
$ pg_dump -j 4 --slot bar --plugin-name test_decoding -f data -F d

This patch does not solve the existing problems related to relation
locking between LOCK taken on tables and the moment a snapshot is
exported (actually that's a different problem), but similarly to
parallel pg_dump it reduces the window of exposure to schema changes to
a minimum. This needed the addition of some logic to make pg_dump
aware of replication connections. Parallel dumps are supported as well,
the trick being to be sure that the existing parallel dump facility is
still using the snapshots from the main db connection, and not the
replication connection, while parallel dumps are possible using the
snapshot from the created slot.

The first patch attached is the feature itself. The second patch, which
can be applied on top of the first one, outputs some useful logs to track
the snapshot creation depending on the code paths taken. I used that
for debugging purposes only, and am just posting it here for reference. I'll
add this to the next commitfest (the patch contains docs as well).

Regards,
-- 
Michael
From aeb75d82acc875252dab89bba540ab89fec6 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sun, 21 Sep 2014 18:16:22 +0900
Subject: [PATCH] Support dump from replication slot creation state

pg_dump logic now incorporates a couple of things to be able to manage
a replication connection for the creation of a logical slot, only way
to easily get a snapshot to get a consistent database state based on
the creation of this slot. This is a piece of the puzzle for online
upgrades, and is still useful by itself.
---
 doc/src/sgml/ref/pg_dump.sgml| 29 
 src/bin/pg_dump/pg_backup.h  |  6 ++-
 src/bin/pg_dump/pg_backup_archiver.c |  9 ++--
 src/bin/pg_dump/pg_backup_archiver.h |  3 ++
 src/bin/pg_dump/pg_backup_db.c   | 86 ++-
 src/bin/pg_dump/pg_dump.c| 88 +---
 6 files changed, 188 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index eabdc62..c2d1097 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -801,6 +801,16 @@ PostgreSQL documentation
  
 
  
+   --plugin-name=plugin_name
+   
+ 
+  Define a decoder plugin name (see )
+  used for the creation of slot when --slot is defined.
+ 
+   
+ 
+
+ 
   --quote-all-identifiers
   

@@ -866,6 +876,25 @@ PostgreSQL documentation
  
 
  
+   --slot=slot_name
+   
+ 
+  Create a logical replication slot (see
+   for more details) from
+  which is taken a dump of data consistent with the point of the slot
+  creation using the synchronized snapshot it created. This is useful
+  to get a consistent image of a database before beginning to apply slot
+  changes to it as this ensures that the data integrity is maintained as
+ 

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2014-09-21 Thread Robert Haas
On Fri, Sep 19, 2014 at 5:42 PM, Andres Freund  wrote:
> On 2014-09-19 17:29:08 -0400, Robert Haas wrote:
>> > I generally have serious doubts about disabling it generally for
>> > read workloads. I imagine it e.g. will significantly penalize
>> > workloads where its likely that a cleanup lock can't be acquired
>> > every time...
>>
>> I share that doubt.  But I understand why Simon wants to do something,
>> too, because the current situation is not great either.
>
> Right, I totally agree. I doubt a simple approach like this will work in
> the general case, but I think something needs to be done.
>
> I think limiting the amount of HOT cleanup for readonly queries is a
> good idea, but I think it has to be gradual. Say after a single cleaned
> up page at least another 500 pages need to have been touched till the
> next hot cleanup. That way a single query won't be penalized with
> cleaning up everything, but there'll be some progress.

I tried this kind of thing several years ago with hint-bit-setting and
was unimpressed by the results.

http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com
http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com

Granted, I never tried a ratio as low as 500:1, and HOT pruning is not
the same thing as setting hint bits, but I think the basic problems
are similar, namely:

1. You can't know how many times the page is going to be referenced in
the future before it again gets modified.  If that number is small,
then you shouldn't bother with hint bits, or HOT-pruning, or freezing.
But if it's big, you should do all of those things as soon as possible
because the benefits are quite significant.  Therefore, any change in
this area is  guaranteed to lose on some easy-to-construct workload,
because I just described two of them that want opposing things.

2. Dirtying every N'th page is a great way to generate lots of random
I/O that will quite possibly make your disk almost as sad - or even
sadder - than dirtying all of them, but without anywhere near as
much performance benefit.

Variations on this idea have been proposed so many times over the
years that I'm tempted to give some credence to the theory that we
ought to adopt one of them.  But there will certainly be losers, as
well as winners.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Index scan optimization

2014-09-21 Thread Rajeev rastogi
I have observed scope for a considerable performance improvement in the case of
index scans, with a very minor code change.
Consider the below schema:

create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

Query as:
select count(*) from tbl2 where id2>'a' and id3>99;

As per the current design, it takes the following steps to retrieve index tuples:

1.   Find the scan start position by searching for the first position in the B-tree as
per the first key condition, i.e. as per id2>'a'.

2.   Then it fetches each tuple from the position found in step 1.

3.   For each tuple, it checks all scan key conditions; in our example it
checks both scan key conditions.

4.   If the conditions match, it returns the tuple; otherwise the scan stops.

Now the problem is that the first scan key condition has already been matched to find
the scan start position (step 1), so it is obvious that any further tuple will also
match the first scan key condition (as records are sorted).
So comparing the first scan key condition again in step 3 seems to be redundant.

So my proposal is to skip the condition check on the first scan key condition 
for every tuple.

I have prototype code to see the result. Below is some test data as per the
query attached with this mail:
Time:
Original code took:    3621 ms
Optimized code took:   2576 ms
So the overall performance improvement is around 29%.

Instructions:
Original code took:    20057 M
Optimized code took:   14557 M
So the overall instruction improvement is around 27%.
Also, I observed that the function varstr_cmp alone took around 661M instructions,
which are completely saved in the optimized code.

This optimization can be extended:

1.   For the case where, once all scan keys match the condition, there is no need to
check the conditions for further tuples for any scan key; just keep returning the
tuples.

2.   The above approach can be used for the '<' operator also, by finding the
position of the last matching tuple.

I would like to submit the patch for this improvement.
Please provide your feedback. Also let me know if I am missing something.

Thanks and Regards,
Kumar Rajeev Rastogi

--Schema
create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

--Procedure to insert 1M data:

create or replace function insert_data(count1 int, count2 int) returns void
AS
$$
Begin
for i IN 1..count1 loop
insert into tbl2 values(i, 'a', i);
end loop;   

for i IN count1+1..count2 loop
insert into tbl2 values(i, 'b', i);
end loop;   
End;  
$$  language plpgsql;

select insert_data(99, 100);

--Procedure to select data 1000 times (1000 times selected to make data more 
appropriate.)
create or replace function select_data(count1 int) returns void
AS
$$
declare
x int;
Begin
for i IN 1..count1 loop
select count(*) into x from tbl2 where id2>'a' and id3>99; 
end loop;   
End;  
$$  language plpgsql;

select select_data(1000);

--Result (time in ms):
            Optimized   Original
Reading-1   2579.27     3621.82
Reading-2   2573.82     3618.29
Reading-3   2575.08     3625.16
Average     2576.06     3621.76

Overall improvement: 29%

Instructions original code took:  20057 M
Instructions optimized code took: 14557 M
So the overall instruction improvement is around 27%.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-09-21 Thread Gregory Smith

On 8/28/14, 12:18 PM, Robert Haas wrote:
> At least in situations that I've encountered, it's typical to be able
> to determine the frequency with which a given table needs to be
> vacuumed to avoid runaway bloat, and from that you can work backwards
> to figure out how fast you must process it in MB/s, and from there you
> can work backwards to figure out what cost delay will achieve that
> effect. But if the system tinkers with the cost delay under the hood,
> then you're vacuuming at a different (slower) rate and, of course, the
> table bloats.
The last time I took a whack at this, I worked toward making all of the 
parameters operate in terms of target MB/s, for exactly this style of 
thinking and goal.  Those converted into the same old mechanism under 
the hood and I got the math right to give the same behavior for the 
simple cases, but that could have been simplified eventually.  I 
consider that line of thinking to be the only useful one here.


The answer I like for these values that don't inherit as expected in the
GUC tree is to nuke that style of interface altogether in favor of a
simpler bandwidth-measured one, then perhaps add multiple QoS levels.
I certainly have no interest in treating the overly complicated innards of cost
computation as a bug and fixing them with even more complicated behavior.


The part of this I was trying hard to find time to do myself by the next
CF was a better bloat measurement tool, needed to actually see the problem
better.  With that in hand, and some nasty test cases, I wanted to come
back to simplified MB/s vacuum parameters with easier to understand
sharing rules again.  If other people are hot to go on that topic, I
don't care if I actually do the work; I just have a pretty clear view of
what I think people want.


> The only plausible use case for setting a per-table rate that I can
> see is when you actually want the system to use that exact rate for
> that particular table. That's the main one, for these must run on
> schedule or else jobs.

Yes.

On 8/29/14, 9:45 AM, Alvaro Herrera wrote:

> Anyway it seems to me maybe there is room for a new table storage
> parameter, say autovacuum_do_balance which means to participate in the
> balancing program or not.


If that eliminates some of the hairy edge cases, sure.

A useful concept to consider is having a soft limit that most things work
against, along with a total hard limit for the server.  When one of
these tight-schedule jobs with !autovacuum_do_balance starts, it
must run at its designed speed with no concern for anyone else.  Which
means:


a) Their bandwidth gets pulled out of the regular, soft limit numbers
until they're done.  Last time I had one of these jobs, once the big
important boys were running, everyone else in the regular shared set
was capped at vacuum_cost_limit=5 worth of work.  Just enough to keep
up with system catalog things, and over the course of many hours process
small tables.


b) If you try to submit multiple locked rate jobs at once, and the total 
goes over the hard limit, they have to just be aborted.  If the rush of 
users comes back at 8AM, and you can clean the table up by then if you 
give it 10MB/s, what you cannot do is let some other user decrease your 
rate such that you're unfinished at 8AM.  Then you'll have aggressive AV 
competing against the user load you were trying to prepare for.  It's 
better to just throw a serious error that forces someone to look at the 
hard limit budget and adjust the schedule instead.  The systems with 
this sort of problem are getting cleaned up every single day, almost 
continuously; missing a day is not bad as long as it's noted and fixed 
again before the next cleanup window.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-09-21 Thread Gregory Smith

On 9/16/14, 8:18 AM, Amit Kapila wrote:

> I think the main reason for slight difference is that
> when the size of shared buffers is almost same as data size, the number
> of buffers it needs from clock sweep are very less, as an example in first
> case (when size of shared buffers is 12286MB), it actually needs at most
> 256 additional buffers (2MB) via clock sweep, where as bgreclaimer
> will put 2000 (high water mark) additional buffers (0.5% of shared buffers
> is greater than 2000 ) in free list, so bgreclaimer does some extra work
> when it is not required
This is exactly what I was warning about, as the sort of lesson learned 
from the last round of such tuning.  There are going to be spots where 
trying to tune the code to be aggressive on the hard cases will work 
great.  But you need to make that dynamic to some degree, such that the 
code doesn't waste a lot of time sweeping buffers when the demand for 
them is actually weak.  That will make all sorts of cases that look like 
this slower.


We should be able to tell these apart if there's enough instrumentation
and solid logic inside of the program itself though.  The 8.3 era BGW
coped with a lot of these issues using a particular style of moving
average with fast reaction time, plus instrumenting the buffer
allocation rate as accurately as it could. So before getting into
high/low watermark questions, are you comfortable that there's a clear,
accurate number that measures the activity level that's important here?
And have you considered ways it might be averaged over time or have a
history that's analyzed? The fast-reaction / slow-decay weighted
moving average approach of the 8.3 BGW, the thing that tried to smooth
the erratic data set possible here, was a pretty critical part of
getting itself auto-tuning to workload size.  It ended up being much
more important than the work of setting the arbitrary watermark levels.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog and replication slots

2014-09-21 Thread Amit Kapila
On Sat, Sep 20, 2014 at 10:06 PM, Michael Paquier 
wrote:
> On Sat, Sep 20, 2014 at 7:09 AM, Amit Kapila 
wrote:
> > 3.
> > I find existing comments okay, is there a need to change
> > in this case?  Part of the new comment mentions
> > "obtaining start LSN position", actually the start position is
> > identified as part of next function call FindStreamingStart(),
> > only incase it return InvalidXLogRecPtr, the value returned
> > by IDENTIFY_SYSTEM will be used.
> Hopefully I am following you correctly here: comment has been updated
> to mention that the start LSN and TLI fetched from IDENTIFY_SYSTEM are
> always fetched but used only if no valid position is found in output
> folder of pg_receivexlog.

The updated comment is consistent with the code, however my main point
was that in this case I don't see much need to change the existing
comment.  I think this point is more related to each individual's
perspective, so if you think there is a need to update the existing
comment, then retain it as it is in your patch and let the committer
take a call about it.

> > 4.
> > + /* Obtain a connection before doing anything */
> > + conn = GetConnection();
> > + if (!conn)
> > + /* Error message already written in GetConnection() */
> > Is there a reason for moving this function out of StreamLog(),
> > there is no harm in moving it, but there doesn't seem to be any
> > problem even if it is called inside StreamLog().
> For pg_recvlogical, this move is done to reduce code redundancy, and
> actually it makes sense to just grab one connection in the main() code
> path before performing any replication commands. For pg_receivexlog,
> the move is done because it makes the code more consistent with
> pg_recvlogical, and also it is a preparatory work for the second patch
> that introduces the create/drop slot. Even if 2nd patch is not
> accepted I figured out that it would not hurt to simply grab one
> connection in the main code path before doing any action.

In pg_receivexlog, StreamLog() calls PQfinish() to close the connection
to the backend, and StreamLog() is called in a while(true) loop,
so if you just grab the connection once in the main loop, the current
logic won't work.  For the same reasons, the current coding related to
GetConnection() in pg_recvlogical seems to be right, so it is better
not to change that as well.  For the second patch (that introduces the
create/drop slot), I think it is better to do it in a way similar to what
the current pg_recvlogical does.

>
> > 6.
> > + /* Identify system, obtaining start LSN position at the same time */
> > + if (!RunIdentifySystem(conn,
> > NULL, NULL, &startpos, &plugin_name))
> > + disconnect_and_exit(1);
> > a.
> > Generally IdentifySystem is called as first API, but I think you
> > have changed its location after CreateReplicationSlot as you want
> > to double check the value of plugin, not sure if that is necessary or
> > important enough that we start calling it later.
> Funny part here is that even the current code on master and
> REL9_4_STABLE of pg_recvlogical.c is fetching a start position when
> creating a replication slot that is not used as two different actions
> cannot be used at the same time. That's a matter of removing this line
> in code though:
> startpos = ((uint64) hi) << 32 | lo;

A user is not allowed to give startpos together with drop or create of a
replication slot.  It is prevented by this check:

if (startpos != InvalidXLogRecPtr && (do_create_slot || do_drop_slot))
{
    fprintf(stderr, _("%s: cannot use --create or --drop together with --startpos\n"),
            progname);
    fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
            progname);
    exit(1);
}
So it seems you cannot remove the startpos assignment in code.

> As that's only cosmetic for 9.4, and this patch makes things more
> correct I guess that's fine to do nothing and just try to get this
> patch in.

In what sense do you think that this part of the patch is better than the
current code?
I think calling IDENTIFY_SYSTEM as the first command makes sense
(as it is in the current code) because if that fails we should not
proceed with the execution of other commands.

Another point about the refactoring patch is that the fourth column of
IDENTIFY_SYSTEM is dbname, and in the patch you are referring to it as
plugin, which seems to be inconsistent.
Refer to the code below from the patch:

RunIdentifySystem()
{
..
+   /* Get decoder plugin name, only available in 9.4 and newer versions */
+   if (plugin != NULL)
+       *plugin = PQnfields(res) > 3 && !PQgetisnull(res, 0, 3) ?
+           pg_strdup(PQgetvalue(res, 0, 3)) : (char *) NULL;
..
}

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_receivexlog and replication slots

2014-09-21 Thread Michael Paquier
On Mon, Sep 22, 2014 at 2:25 PM, Amit Kapila  wrote:
> On Sat, Sep 20, 2014 at 10:06 PM, Michael Paquier
>  wrote:
>> On Sat, Sep 20, 2014 at 7:09 AM, Amit Kapila 
>> wrote:
>> > 3.
>> > I find existing comments okay, is there a need to change
>> > in this case?  Part of the new comment mentions
>> > "obtaining start LSN position", actually the start position is
>> > identified as part of next function call FindStreamingStart(),
>> > only incase it return InvalidXLogRecPtr, the value returned
>> > by IDENTIFY_SYSTEM will be used.
>> Hopefully I am following you correctly here: comment has been updated
>> to mention that the start LSN and TLI fetched from IDENTIFY_SYSTEM are
>> always fetched but used only if no valid position is found in output
>> folder of pg_receivexlog.
>
> Updated comment is consistent with code, however my main point
> was that in this case, I don't see much need to change existing
> comment.  I think this point is more related to each individual's
> perspective, so if you think there is a need to update the existing
> comment, then retain as it is in your patch and let Committer
> take a call about it.
Well, let's use your suggestion then and switch back to the former comment.

>> > 4.
>> > + /* Obtain a connection before doing anything */
>> > + conn = GetConnection();
>> > + if (!conn)
>> > + /* Error message already written in GetConnection() */
>> > Is there a reason for moving this function out of StreamLog(),
>> > there is no harm in moving it, but there doesn't seem to be any
>> > problem even if it is called inside StreamLog().
>> For pg_recvlogical, this move is done to reduce code redundancy, and
>> actually it makes sense to just grab one connection in the main() code
>> path before performing any replication commands. For pg_receivexlog,
>> the move is done because it makes the code more consistent with
>> pg_recvlogical, and also it is a preparatory work for the second patch
>> that introduces the create/drop slot. Even if 2nd patch is not
>> accepted I figured out that it would not hurt to simply grab one
>> connection in the main code path before doing any action.
>
> In pg_receivexlog, StreamLog() calls PQfinish() to close a connection
> to the backend and StreamLog() is getting called in while(true) loop,
> so if you just grab the connection once in main loop, the current
> logic won't work.  For same reasons, the current coding related to
> GetConnection() in pg_receivelogical seems to be right, so it is better
> not to change that as well.  For the second patch (that introduces the
> create/drop slot),  I think it is better to do in a way similar to what
> current pg_receivelogical does.
OK, let's do so then. I changed the GetConnection stuff back to
what is done on master. In the second patch, I added an extra call to
GetConnection before the create/drop commands.

>> > 6.
>> > + /* Identify system, obtaining start LSN position at the same time */
>> > + if (!RunIdentifySystem(conn,
>> > NULL, NULL, &startpos, &plugin_name))
>> > + disconnect_and_exit(1);
>> > a.
>> > Generally IdentifySystem is called as first API, but I think you
>> > have changed its location after CreateReplicationSlot as you want
>> > to double check the value of plugin, not sure if that is necessary or
>> > important enough that we start calling it later.
>> Funny part here is that even the current code on master and
>> REL9_4_STABLE of pg_recvlogical.c is fetching a start position when
>> creating a replication slot that is not used as two different actions
>> cannot be used at the same time. That's a matter of removing this line
>> in code though:
>> startpos = ((uint64) hi) << 32 | lo;
>
> User is not allowed to give startpos with drop or create of replication
> slot.  It is prevented by check:
> if (startpos != InvalidXLogRecPtr && (do_create_slot || do_drop_slot))
> So it seems you cannot remove the startpos assignment in code.
Ah yes true, it is actually possible to start the replication process
after creating the slot, so better not to remove it... I have updated
CreateReplicationSlot to add startpos in the fields that can be
requested from results.

>> As that's only cosmetic for 9.4, and this patch makes things more
>> correct I guess that's fine to do nothing and just try to get this
>> patch in.
>
> In what sense do you think that this part of patch is better than
> current code?
I was trying to make the code as consistent as possible between the two
utilities, but your approach makes more sense.

> I think calling Identify_System as a first command makes sense
> (as it is in current code) because if that fails we should not
> proceed with execution of other command's.
Yes looking at that again you are right.

> Another point about refactoring patch is that fourth column in
> Identify_System is dbname and in patch, you are referring it as
> plugin which seems to be inconsistent.
Oops. Thanks for checking, I changed the check to have something for
the database name.

At the sam

[HACKERS] libpq connection status and closed fd

2014-09-21 Thread Daniele Varrazzo
Hello,

a psycopg user is reporting [1] that the library is not marking the
connection as closed and/or bad after certain errors, such as a
connection timeout. He is emulating the error by closing the
connection fd (I don't know if the two conditions result in the same
effect, but I'll stick to this hypothesis for now).

[1] https://github.com/psycopg/psycopg2/issues/263

Testing with a short C program [2] it seems that indeed, after closing
the fd and causing an error in `PQconsumeInput`, the connection is
deemed in good state by `PQstatus`. A similar test gives the same
result after `PQexec` is attempted on a connection whose fd is closed
(tests performed with PostgreSQL 9.3.5).

[2] https://gist.github.com/dvarrazzo/065f343c95f8ea67cf8f

Is this intentional? Is there a better way to check for a broken connection?

If we mark the connection as closed every time `PQconsumeInput`
returns 0 (or `PQexec` returns null, which are the two code paths
affecting psycopg) would it be too aggressive and cause false
positives?

Thank you very much.

-- Daniele


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Documentation fix for pg_recvlogical's --create mode

2014-09-21 Thread Michael Paquier
Hi all,

In the documentation of pg_recvlogical here
(http://www.postgresql.org/docs/devel/static/app-pgrecvlogical.html),
there is the following sentence:
"Create a new logical replication slot with the name specified in
--slot, using the output plugin --plugin, then exit."
Actually that's not completely true: we can create a replication slot
and then start streaming by combining --start and --create. The process
will only exit if --start is not used. Patch is attached.
Regards,
-- 
Michael
diff --git a/doc/src/sgml/ref/pg_recvlogical.sgml b/doc/src/sgml/ref/pg_recvlogical.sgml
index ce5ad5e..76240fe 100644
--- a/doc/src/sgml/ref/pg_recvlogical.sgml
+++ b/doc/src/sgml/ref/pg_recvlogical.sgml
@@ -58,8 +58,8 @@ PostgreSQL documentation

 Create a new logical replication slot with the name specified in
 --slot, using the output plugin
---plugin, then exit. The slot is created for the
-database given in --dbname.
+--plugin. The slot is created for the database
+given in --dbname.

   
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-09-21 Thread Amit Kapila
On Mon, Sep 22, 2014 at 10:43 AM, Gregory Smith 
wrote:

> On 9/16/14, 8:18 AM, Amit Kapila wrote:
>
>> I think the main reason for slight difference is that
>> when the size of shared buffers is almost same as data size, the number
>> of buffers it needs from clock sweep are very less, as an example in first
>> case (when size of shared buffers is 12286MB), it actually needs at most
>> 256 additional buffers (2MB) via clock sweep, where as bgreclaimer
>> will put 2000 (high water mark) additional buffers (0.5% of shared buffers
>> is greater than 2000 ) in free list, so bgreclaimer does some extra work
>> when it is not required
>>
> This is exactly what I was warning about, as the sort of lesson learned
> from the last round of such tuning.  There are going to be spots where
> trying to tune the code to be aggressive on the hard cases will work
> great.  But you need to make that dynamic to some degree, such that the
> code doesn't waste a lot of time sweeping buffers when the demand for them
> is actually weak.  That will make all sorts of cases that look like this
> slower.
>

To verify whether the above can lead to any kind of regression, I have
checked the cases (workload is 0.05 or 0.1 percent larger than shared
buffers) where we need a few extra buffers and bgreclaimer might put in
some additional buffers, and it turns out that in those cases too there
is a win, especially at high concurrency; the results are posted
upthread
(
http://www.postgresql.org/message-id/CAA4eK1LFGcvzMdcD5NZx7B2gCbP1G7vWK7w32EZk=voolud...@mail.gmail.com).



> We should be able to tell these apart if there's enough instrumentation
> and solid logic inside of the program itself though.  The 8.3 era BGW coped
> with a lot of these issues using a particular style of moving average with
> fast reaction time, plus instrumenting the buffer allocation rate as
> accurately as it could. So before getting into high/low water note
> questions, are you comfortable that there's a clear, accurate number that
> measures the activity level that's important here?



Very good question.  This was exactly the thing that was
missing in my initial versions (about 2 years back when I tried to
solve this problem), but based on Robert's and Andres's feedback
I realized that we need an accurate number to measure the activity
level (in this case it is the consumption of buffers from the freelist), so
I have introduced the logic to calculate it (it is stored in a new
variable, numFreeListBuffers, in the BufferStrategyControl structure).



> And have you considered ways it might be averaging over time or have a
> history that's analyzed?



The current logic of bgreclaimer is such that even if it does
some extra activity (extra is very much controlled) in one cycle,
it will not start another cycle unless backends consume all the
buffers that were made available by bgreclaimer in one cycle.
I think the algorithm designed for bgreclaimer automatically
averages out based on activity.  Do you see any cases where it
will not do so?


> The exact fast approach / slow decay weighted moving average approach of
> the 8.3 BGW, the thing that tried to smooth the erratic data set possible
> here, was a pretty critical part of getting itself auto-tuning to workload
> size.  It ended up being much more important than the work of setting the
> arbitrary watermark levels.
>
>
Agreed, but the logic with which bgwriter works is pretty different,
and that's why it needs a different kind of logic to handle auto-tuning.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com