Re: [HACKERS] Multi-tenancy with RLS
On Fri, Oct 9, 2015 at 2:04 PM, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> We've got one reloption for views already - security_barrier. Maybe
>> we could have another one that effectively changes a particular view
>> from "security definer" as it is today to "security invoker".
>
> As I recall, there was a previous suggestion (honestly, I thought it was
> your idea) to have a reloption which made views "fully" security
> definer, in that functions in the view definition would run as the view
> owner instead of the view invoker.
>
> I liked that idea, though we would need to have a function to say "who
> is the 'outer' user?" (CURRENT_USER always being the owner with the
> above described reloption).
>
> I'm less sure about the idea of having a view which runs entirely as the
> view invoker, but I'm not against it either.

I changed the check_enable_rls function to use the invoker id instead of the owner id for all system objects. The catalog table policies are getting applied, and it has worked fine so far in my multi-tenancy testing. Currently I am writing tests to validate it against all user objects as well. If this change works for all user objects too, then we may not need the security invoker reloption.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] RLS bug in expanding security quals
During the testing of the multi-tenancy feature for system catalog views, described in [1], I found a problem in executing the "user_privileges" view from information_schema. The following is the minimal test SQL that reproduces the problem.

SELECT (u_grantor.rolname) AS grantor,
       (grantee.rolname) AS grantee
FROM pg_authid u_grantor,
     (SELECT pg_authid.oid, pg_authid.rolname
      FROM pg_authid
      UNION ALL
      SELECT (0)::oid AS oid, 'PUBLIC'::name) grantee(oid, rolname)
WHERE (grantee.rolname = 'PUBLIC'::name)

From further analysis, I found that the same issue can also happen with user tables. The attached rls_failure.sql file has test steps to reproduce the issue.

The problem is that, while expanding security quals in the expand_security_quals function, the relations u_grantor and test_tbl need to be expanded, as they are the relations that have security quals. The following debug information from parse->rtable shows the details of each RangeTblEntry.

$69 = {type = T_Alias, aliasname = 0x19bd870 "u_grantor", colnames = 0x19bd890}
(gdb) p *((RangeTblEntry *)parse->rtable->head->next->next->next->data.ptr_value)->eref
$70 = {type = T_Alias, aliasname = 0x19bffc8 "grantee", colnames = 0x19bffe0}
(gdb) p *((RangeTblEntry *)parse->rtable->head->next->next->next->next->data.ptr_value)->eref
$71 = {type = T_Alias, aliasname = 0x19c3a60 "*SELECT* 1", colnames = 0x19c3a80}
(gdb) p *((RangeTblEntry *)parse->rtable->head->next->next->next->next->next->data.ptr_value)->eref
$72 = {type = T_Alias, aliasname = 0x19c40d8 "*SELECT* 2", colnames = 0x19c40f8}
(gdb) p *((RangeTblEntry *)parse->rtable->head->next->next->next->next->next->next->data.ptr_value)->eref
$73 = {type = T_Alias, aliasname = 0x19c4648 "test_tbl", colnames = 0x19c4668}

In the expand_security_qual function, the security_barrier_replace_vars function is called to prepare context.targetlist, but it doesn't generate a targetlist for the test_tbl RangeTblEntry.
For this reason, accessing that targetlist fails and throws an error.

If the policy is changed to the following, instead of the one specified in rls_failure.sql,

create policy test_tbl_policy on test_tbl for select using (true);

the query execution passes, because in the expand_security_quals function the rangeTableEntry_used function returns false for the test_tbl entry, so the security qual is never expanded.

Any ideas how to handle this problem?

Regards,
Hari Babu
Fujitsu Australia

[1] - http://www.postgresql.org/message-id/cajrrpgd2cf4hz_edpx+uqjv1ytkajs_wjdiwj7pzzuuqwou...@mail.gmail.com
Re: [HACKERS] RLS bug in expanding security quals
On Thu, Oct 8, 2015 at 2:54 PM, Stephen Frost <sfr...@snowman.net> wrote:
> Haribabu,
>
> * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
>> During the testing of multi-tenancy feature from system catalog views, that
>> is described in [1], found a problem in executing "user_privileges" view
>> from information_schema. The following is the minimal test sql that
>> reproduces the problem.
>
> Interesting, thanks.
>
>> From further analysis, I found that the same issue can happen with user
>> tables also. Attached
>> rls_failure.sql file has test steps to reproduce the issue.
>
> Just to make sure we're on the same page, this results in this assertion
> being tripped:
>
> TRAP: FailedAssertion("!(var->varattno <= rel->max_attr)", File:
> "/home/sfrost/git/pg/dev/postgresql/src/backend/optimizer/path/costsize.c",
> Line: 4152)
>
> Due to var->varattno being 1 and rel->max_attr being 0.

Yes, the same assertion failure occurs with an assert-enabled build. Without an assert build, the query fails with the following error:

ERROR: invalid attnum -2 for rangetable entry test_tbl

>> Any ideas how to handle this problem?
>
> It's quite late here, but I'll take a look at this in more depth
> tomorrow.
>
> Based on what the Assert's testing, I took an educated guess and tried
> running without the UNION ALL, which appeared to work correctly.

Yes, it works fine without UNION ALL. Also, if we change the table column datatype from name to char, the pull_up_subqueries function doesn't pull up the UNION ALL because of the datatype mismatch, and the query works fine even with row level security enabled.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Multi-tenancy with RLS
On Tue, Oct 6, 2015 at 10:56 AM, Haribabu Kommi <kommi.harib...@gmail.com> wrote:
> Here I attached an updated version of the patch with the following changes.

I found some problems related to providing multi-tenancy on system catalog views. This is because a system catalog view uses the owner that created the view instead of the current user, by storing the user information in the "checkAsUser" field of the RangeTblEntry structure. The same "checkAsUser" is used in the check_enable_rls function before getting the policies for the table. All the system catalog views are created by the superuser, so no row level security policies are applied to the views.

Ex:

SET SESSION ROLE tenancy_user1;

select relname from pg_class where relname = 'tenancy_user2_tbl1';
 relname
---------
(0 rows)

select schemaname, relname from pg_stat_all_tables where relname = 'tenancy_user2_tbl1';
 schemaname |      relname
------------+--------------------
 public     | tenancy_user2_tbl1
(1 row)

The policy created on the pg_class system catalog table is: show all the objects on which the current user has permissions, where the permissions can be anything.

To fix the problem, I thought of using the current session id instead of "checkAsUser" while applying row level security policies to system catalog objects. This doesn't affect normal objects. But this solution caused some problems in foreign_data.sql while running the regression tests, as the planner generates a NULL targetlist.

Is the above solution the correct approach to handle this problem? If it is, I will check the foreign_data.sql problem; otherwise, is there any better approach to handle the same?

Regards,
Hari Babu
Fujitsu Australia
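The checkAsUser behaviour described above can be seen with plain RLS on a user table in an unpatched server. This is only an illustrative sketch; the role, table, and policy names are made up:

```sql
CREATE TABLE docs (owner name, body text);
ALTER TABLE docs ENABLE ROW LEVEL SECURITY;

-- Each role sees only its own rows when querying the table directly.
CREATE POLICY docs_owner ON docs USING (owner = current_user);

-- A view runs with the permissions of its owner (checkAsUser), so a
-- view created by a superuser bypasses the policy entirely -- the same
-- behaviour the system catalog views exhibit today.
CREATE VIEW all_docs AS SELECT * FROM docs;
GRANT SELECT ON all_docs TO PUBLIC;
```

Querying docs directly as a non-superuser returns only that role's rows, while querying all_docs (superuser-owned) returns everything.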
Re: [HACKERS] Multi-tenancy with RLS
On Tue, Oct 6, 2015 at 10:29 PM, Stephen Frost <sfr...@snowman.net> wrote:
> * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
>> On Tue, Oct 6, 2015 at 10:56 AM, Haribabu Kommi
>> <kommi.harib...@gmail.com> wrote:
>> > Here I attached an updated version of the patch with the following changes.
>>
>> I found some problems related to providing multi-tenancy on a system
>> catalog view.
>> This is because, system catalog view uses the owner that is created
>> the user instead
>> of the current user by storing the user information in "checkAsUser"
>> field in RangeTblEntry
>> structure.
>
> Right, when querying through a view to tables underneath, we use the
> permissions of the view owner. View creators should be generally aware
> of this already.
>
> I agree that it adds complications to the multi-tenancy idea since the
> system views, today, allow viewing of all objects. There are two ways
> to address that:
>
> Modify the system catalog views to include the same constraints that the
> policies on the tables do
>
> or
>
> Allow RLS policies against views and then create the necessary policies
> on the views in the catalog.
>
> My inclination is to work towards the latter as that's a capability we'd
> like to have anyway.

Thanks for the solutions to handle the problem.

Currently I am thinking of providing two multi-tenancy options to the user:

1. Tenancy at the shared system catalog table level
2. Tenancy at the database system catalog table level

Users can create views on system catalog tables, and I want to provide tenancy on those views as well. I will do further analysis, provide details of which solution gives the benefit of both tenancy levels, and then proceed to implementation after discussion.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Multi-tenancy with RLS
On Fri, Sep 11, 2015 at 7:50 AM, Joe Conway <m...@joeconway.com> wrote:
> On 09/01/2015 11:25 PM, Haribabu Kommi wrote:
>> If any user is granted any permissions on that object then that user
>> can view it's meta data of that object from the catalog tables.
>> To check the permissions of the user on the object, instead of
>> checking each and every available option, I just added a new
>> privilege check option called "any". If user have any permissions on
>> the object, the corresponding permission check function returns
>> true. Patch attached for the same.
>>
>> Any thoughts/comments?
>
> Thanks for working on this! Overall I like the concept and know of use
> cases where it is critical and should be supported. Some comments:

Here I attached an updated version of the patch with the following changes. There are now two ways for the user to enable catalog security on system catalog tables:

./initdb -C -D data

With the above option during initdb, catalog security is enabled on all shared system catalog tables. This way the user can achieve catalog security at the database level, which may be enough for some users. Currently, enabling catalog security this way is supported only at initdb time.

ALTER DATABASE WITH CATALOG SECURITY=true;
ALTER DATABASE WITH CATALOG SECURITY=false;

With the above commands, the user can enable/disable catalog security on a database's system catalog tables when multi-tenancy is required at the table level. Setting catalog security in the CREATE DATABASE command is not yet supported, and neither is using ALTER DATABASE to enable/disable it on a database the backend is not connected to. This is because of a restriction on executing the policy commands without being connected to that database.

Pending things that need to be taken care of:

1.
select * from tenancy_user1_tbl1;
ERROR: permission denied for relation tenancy_user1_tbl1

The multi-tenancy policies prevent the above user table from being seen in any catalog relation, but a direct select on the table produces a "permission denied" error. I feel that when multi-tenancy is enabled, the error message should instead be "relation doesn't exist".

2. Correct all catalog relation policies
3. Add regression tests for all system catalog relations and views.
4. Documentation changes

Any comments?

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] optimizing vacuum truncation scans
On Tue, Aug 4, 2015 at 2:18 AM, Jeff Janes wrote:
> On Mon, Jul 27, 2015 at 1:40 PM, Simon Riggs wrote:
>>
>> On 22 July 2015 at 17:11, Jeff Janes wrote:
>>>
>>> On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas wrote:
>>>>
>>>> On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes wrote:
>>>>> Attached is a patch that implements the vm scan for truncation. It
>>>>> introduces a variable to hold the last blkno which was skipped during the
>>>>> forward portion. Any blocks after both this blkno and after the last
>>>>> inspected nonempty page (which the code is already tracking) must have been
>>>>> observed to be empty by the current vacuum. Any other process rendering the
>>>>> page nonempty are required to clear the vm bit, and no other process can set
>>>>> the bit again during the vacuum's lifetime. So if the bit is still set, the
>>>>> page is still empty without needing to inspect it.
>>>>
>>>> Urgh. So if we do this, that forever precludes having HOT pruning set
>>>> the all-visible bit.
>>>
>>> I wouldn't say forever, as it would be easy to revert the change if
>>> something more important came along that conflicted with it.
>>
>> I think what is being said here is that someone is already using this
>> technique, or if not, then we actively want to encourage them to do so as an
>> extension or as a submission to core.
>>
>> In that case, I think the rely-on-VM technique sinks again, sorry Jim,
>> Jeff. Probably needs code comments added.
>
> Sure, that sounds like the consensus. The VM method was very efficient, but
> I agree it is pretty fragile and restricting.
>
>> That does still leave the prefetch technique, so all is not lost.
>>
>> Can we see a patch with just prefetch, probably with a simple choice of
>> stride? Thanks.
>
> I probably won't get back to it this commit fest, so it can be set to
> returned with feedback. But if anyone has good ideas for how to set the
> stride (or detect that it is on SSD and so is pointless to try) I'd love to
> hear about them anytime.
I found the following way to determine whether a table's data file resides on a spinning disk or an SSD.

1. Get the device that the table's data file is mapped to, using the following or similar:

df -P "filename" | awk 'NR==2{print $1}'

2. If the device is of the form /dev/sd*, treat it as a disk system and proceed with the prefetch optimization.

3. If we are not able to find the device details directly, we need to get the information from the device-mapper. Usually such devices are mapped like the following: /dev/mapper/v** points to ../dm-*

4. Get the "dm-*" name from the above details and check whether it is an SSD or not from the following file:

/sys/block/dm-*/queue/rotational

5. If the value is 0 then it is an SSD drive; 1 means a rotating disk drive.

The procedure described above works only on Linux; I haven't checked other operating systems yet. Is it worth considering?

Regards,
Hari Babu
Fujitsu Australia
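A minimal shell sketch of steps 1, 4, and 5 above (Linux only; the sysfs layout and device-name handling are assumptions and can vary across distributions):

```shell
#!/bin/sh

# Step 1: resolve the filesystem device backing a file,
# e.g. /dev/sda1 or /dev/dm-0.
device_for_file() {
    df -P "$1" | awk 'NR==2 {print $1}'
}

# Steps 4 and 5: read the rotational flag for a block device name such
# as "sda" or "dm-0".  Prints 1 for a spinning disk, 0 for an SSD.
# The sysfs root defaults to /sys but is a parameter so the function
# can be exercised against a fake tree.
is_rotational() {
    cat "${2:-/sys}/block/$1/queue/rotational"
}
```

For a device-mapper path such as /dev/mapper/vg-lv (step 3), the underlying dm-* name can be obtained with `basename "$(readlink -f /dev/mapper/vg-lv)"` before consulting sysfs.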
Re: [HACKERS] Parallel Seq Scan
On Fri, Sep 18, 2015 at 9:45 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Fri, Sep 18, 2015 at 1:33 PM, Haribabu Kommi <kommi.harib...@gmail.com>
> wrote:
>>
>> On Thu, Sep 3, 2015 at 8:21 PM, Amit Kapila <amit.kapil...@gmail.com>
>> wrote:
>> >
>> > Attached, find the rebased version of patch.
>>
>> Here are the performance test results:
>>
>> Query         selectivity   HashAgg +       HashAgg + parallel seq scan (ms)
>>               (million)     seqscan (ms)    2 workers   4 workers   8 workers
>> $1 <= '001'   0.1           16717.00         7086.00     4459.00     2912.00
>> $1 <= '004'   0.4           17962.00         7410.00     4651.00     2977.00
>> $1 <= '008'   0.8           18870.00         7849.00     4868.00     3092.00
>> $1 <= '016'   1.5           21368.00         8645.00     6800.00     3486.00
>> $1 <= '030'   2.7           24622.00        14796.00    13108.00     9981.00
>> $1 <= '060'   5.4           31690.00        29839.00    26544.00    23814.00
>> $1 <= '080'   7.2           37147.00        40485.00    35763.00    32679.00
>
> I think here probably when the selectivity is more than 5, then it should
> not have selected Funnel plan. Have you by any chance changed
> cpu_tuple_comm_cost? If not, then you can try by setting value of
> parallel_setup_cost (may be 10) and then see if it selects the Funnel
> Plan. Is it possible for you to check the cost difference of Sequence
> and Funnel plan, hopefully explain or explain analyze should be sufficient?

Yes, I changed cpu_tuple_comm_cost to zero to observe how parallel seq scan performs at high selectivity; I forgot to mention that in the earlier mail. Overall the parallel seq scan performance is good.

>> And also attached perf results for selectivity of 0.1 million and 5.4
>> million cases for analysis.
>
> I have checked perf reports and it seems that when selectivity is more, it
> seems to be spending time in some kernel calls which could be due
> communication of tuples.

Yes. Also, at low selectivity, the usage of the tas and s_lock functions increases as the number of workers increases. Maybe these are also among the reasons for the scaling problem.
Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Thu, Sep 17, 2015 at 6:10 AM, Robert Haas wrote:
> On Thu, Sep 10, 2015 at 12:12 AM, Amit Kapila wrote:
>>> 2. I think it's probably a good idea - at least for now, and maybe
>>> forever - to avoid nesting parallel plans inside of other parallel
>>> plans. It's hard to imagine that being a win in a case like this, and
>>> it certainly adds a lot more cases to think about.
>>
>> I also think that avoiding nested parallel plans is a good step forward.
>
> Doing that as a part of the assess parallel safety patch was trivial, so I
> did.

I tried with the latest HEAD code; the problem seems to be present in other scenarios.

postgres=# explain select * from tbl a where exists (select 1 from tbl b where a.f1=b.f1 limit 0);
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Funnel on tbl a  (cost=0.00..397728310227.27 rows=500 width=214)
   Filter: (SubPlan 1)
   Number of Workers: 10
   ->  Partial Seq Scan on tbl a  (cost=0.00..397727310227.27 rows=500 width=214)
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Limit  (cost=0.00..437500.00 rows=1 width=0)
                 ->  Seq Scan on tbl b  (cost=0.00..437500.00 rows=1 width=0)
                       Filter: (a.f1 = f1)
   SubPlan 1
     ->  Limit  (cost=0.00..437500.00 rows=1 width=0)
           ->  Seq Scan on tbl b  (cost=0.00..437500.00 rows=1 width=0)
                 Filter: (a.f1 = f1)
(13 rows)

postgres=# explain analyze select * from tbl a where exists (select 1 from tbl b where a.f1=b.f1 limit 0);
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
LOG: worker process: parallel worker for PID 8775 (PID 9121) exited with exit code 1
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
LOG: worker process: parallel worker for PID 8775 (PID 9116) exited with exit code 1
LOG: worker process: parallel worker for PID 8775 (PID 9119) exited with exit code 1
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
LOG: worker process: parallel worker for PID 8775 (PID 9117) exited with exit code 1
LOG: worker process: parallel worker for PID 8775 (PID 9114) exited with exit code 1
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
LOG: worker process: parallel worker for PID 8775 (PID 9118) exited with exit code 1
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
CONTEXT: parallel worker, pid 9115
STATEMENT: explain analyze select * from tbl a where exists (select 1 from tbl b where a.f1=b.f1 limit 0);
LOG: worker process: parallel worker for PID 8775 (PID 9115) exited with exit code 1
LOG: worker process: parallel worker for PID 8775 (PID 9120) exited with exit code 1
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
CONTEXT: parallel worker, pid 9115

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Thu, Sep 17, 2015 at 12:03 PM, Amit Kapila wrote:
> As mentioned previously [1], we have to do two different things to make
> this work, Robert seems to have taken care of one of those (basically
> second point in mail[1]) and still another one needs to be taken care
> which is to provide support of reading subplans in readfuncs.c and that
> will solve the problem you are seeing now.

Thanks for the information. During my test, I saw a plan change from parallel seq scan to seq scan for the first reported query, so I thought that all scenarios had already been corrected so as not to generate a parallel seq scan.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Thu, Sep 10, 2015 at 2:12 PM, Amit Kapila <amit.kapil...@gmail.com> wrote: > On Thu, Sep 10, 2015 at 4:16 AM, Robert Haas <robertmh...@gmail.com> wrote: >> >> On Wed, Sep 9, 2015 at 11:07 AM, Amit Kapila <amit.kapil...@gmail.com> >> wrote: >> > On Wed, Sep 9, 2015 at 11:47 AM, Haribabu Kommi >> > <kommi.harib...@gmail.com> >> > wrote: >> >> With subquery, parallel scan is having some problem, please refer >> >> below. >> >> >> >> postgres=# explain analyze select * from test01 where kinkocord not in >> >> (select kinkocord from test02 where tenpocord = '001'); >> >> ERROR: badly formatted node string "SUBPLAN :subLinkType 2 >> >> :testexpr"... >> >> CONTEXT: parallel worker, pid 32879 >> >> postgres=# >> > >> > The problem here is that readfuncs.c doesn't have support for reading >> > SubPlan nodes. I have added support for some of the nodes, but it seems >> > SubPlan node also needs to be added. Now I think this is okay if the >> > SubPlan >> > is any node other than Funnel, but if Subplan contains Funnel, then each >> > worker needs to spawn other workers to execute the Subplan which I am >> > not sure is the best way. Another possibility could be store the >> > results of >> > Subplan in some tuplestore or some other way and then pass those to >> > workers >> > which again doesn't sound to be promising way considering we might have >> > hashed SubPlan for which we need to build a hashtable. Yet another way >> > could be for such cases execute the Filter in master node only. >> >> IIUC, there are two separate issues here: >> > > Yes. > >> 1. We need to have readfuncs support for all the right plan nodes. >> Maybe we should just bite the bullet and add readfuncs support for all >> plan nodes. But if not, we can add support for whatever we need. >> >> 2. I think it's probably a good idea - at least for now, and maybe >> forever - to avoid nesting parallel plans inside of other parallel >> plans. 
It's hard to imagine that being a win in a case like this, and
>> it certainly adds a lot more cases to think about.
>
> I also think that avoiding nested parallel plans is a good step forward.

I reviewed the parallel_seqscan_funnel_v17.patch, and the following are my comments. I will continue my review with the parallel_seqscan_partialseqscan_v17.patch.

+ if (inst_options)
+ {
+     instoptions = shm_toc_lookup(toc, PARALLEL_KEY_INST_OPTIONS);
+     *inst_options = *instoptions;
+     if (inst_options)

This checks the same pointer variable again; it should be if (*inst_options), as in the estimate and store functions.

+ if (funnelstate->ss.ps.ps_ProjInfo)
+     slot = funnelstate->ss.ps.ps_ProjInfo->pi_slot;
+ else
+     slot = funnelstate->ss.ss_ScanTupleSlot;

Currently there will not be a projinfo for the Funnel node, so it always uses the scan tuple slot. In case it is ever different, we need to add an ExecProject call in the ExecFunnel function. Currently it is not present; either we can document that or add the function call.

+ if (!((*dest->receiveSlot) (slot, dest)))
+     break;

and

+void
+TupleQueueFunnelShutdown(TupleQueueFunnel *funnel)
+{
+     if (funnel)
+     {
+         int i;
+         shm_mq_handle *mqh;
+         shm_mq *mq;
+
+         for (i = 0; i < funnel->nqueues; i++)
+         {
+             mqh = funnel->queue[i];
+             mq = shm_mq_get_queue(mqh);
+             shm_mq_detach(mq);
+         }
+     }
+}

Using this function, the backend detaches from the message queues, so that workers trying to put results into the queues get an SHM_MQ_DETACHED result and then finish executing the plan. For this reason all the printtup return types were changed from void to bool. But this way a worker doesn't exit until it tries to put a tuple in the queue; if there are no valid tuples that satisfy the condition, it may take time for the workers to exit. Am I correct? I am not sure how frequently such scenarios can occur.
+ if (parallel_seqscan_degree >= MaxConnections)
+ {
+     write_stderr("%s: parallel_scan_degree must be less than max_connections\n", progname);
+     ExitPostmaster(1);
+ }

This error condition works only during server start. A user can still set the parallel seqscan degree to more than max_connections at the superuser session level, etc.

+ if (!parallelstmt->inst_options)
+     (*receiver->rDestroy) (receiver);

Why is the receiver destroyed only when there is no instrumentation?

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Multi-tenancy with RLS
On Fri, Sep 11, 2015 at 7:50 AM, Joe Conway <m...@joeconway.com> wrote:
> On 09/01/2015 11:25 PM, Haribabu Kommi wrote:
>> If any user is granted any permissions on that object then that user
>> can view it's meta data of that object from the catalog tables.
>> To check the permissions of the user on the object, instead of
>> checking each and every available option, I just added a new
>> privilege check option called "any". If user have any permissions on
>> the object, the corresponding permission check function returns
>> true. Patch attached for the same.
>>
>> Any thoughts/comments?
>
> Thanks for working on this! Overall I like the concept and know of use
> cases where it is critical and should be supported. Some comments:

Thanks for the review; I will take care of the comments in the next patch. I didn't find any better approach than creating the policies automatically or granting the superuser permission on the system catalog tables. If everyone feels this is the best approach, then I will create policies for all catalog tables in the next version.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Thu, Sep 3, 2015 at 8:21 PM, Amit Kapila wrote:
> On Thu, Jul 23, 2015 at 7:43 PM, Kouhei Kaigai wrote:
>>
>> Hi Amit,
>>
>> The latest v16 patch cannot be applied to the latest
>> master as is.
>> 434873806a9b1c0edd53c2a9df7c93a8ba021147 changed various
>> lines in heapam.c, so it probably conflicts with this.
>
> Attached, find the rebased version of patch. It fixes the comments raised
> by Jeff Davis and Antonin Houska. The main changes in this version are
> now it supports sync scan along with parallel sequential scan (refer
> heapam.c)
> and the patch has been split into two parts, first contains the code for
> Funnel node and infrastructure to support the same and second contains
> the code for PartialSeqScan node and its infrastructure.

Thanks for the updated patch.

With a subquery, the parallel scan has a problem; please see below.

postgres=# explain select * from test01 where kinkocord not in (select kinkocord from test02 where tenpocord = '001');
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Funnel on test01  (cost=0.00..155114352184.12 rows=2008 width=435)
   Filter: (NOT (SubPlan 1))
   Number of Workers: 16
   ->  Partial Seq Scan on test01  (cost=0.00..155114352184.12 rows=2008 width=435)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..130883.67 rows=385333 width=5)
                 ->  Funnel on test02  (cost=0.00..127451.01 rows=385333 width=5)
                       Filter: (tenpocord = '001'::bpchar)
                       Number of Workers: 16
                       ->  Partial Seq Scan on test02  (cost=0.00..127451.01 rows=385333 width=5)
                             Filter: (tenpocord = '001'::bpchar)
   SubPlan 1
     ->  Materialize  (cost=0.00..130883.67 rows=385333 width=5)
           ->  Funnel on test02  (cost=0.00..127451.01 rows=385333 width=5)
                 Filter: (tenpocord = '001'::bpchar)
                 Number of Workers: 16
                 ->  Partial Seq Scan on test02  (cost=0.00..127451.01 rows=385333 width=5)
                       Filter: (tenpocord = '001'::bpchar)
(19 rows)

postgres=# explain analyze select * from test01 where kinkocord not in (select kinkocord from test02 where tenpocord = '001');
ERROR: badly formatted node string "SUBPLAN :subLinkType 2 :testexpr"...
CONTEXT: parallel worker, pid 32879
postgres=#

Also, the 16 workers shown in the explain analyze plan were not actually allotted, because in my configuration I set max_worker_processes to only 8. I feel the plan should show the allotted workers, not the planned workers. If the query execution is slow because of a lack of workers while the plan still shows 16 workers, the user may conclude that the query is slow even with 16 workers, which is not actually the case.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
On Mon, Sep 7, 2015 at 4:34 AM, Pavel Stehule wrote:
> Hi
>
>> postgres=# select pg_hba_lookup('postgres','all');
>>                     pg_hba_lookup
>> ---------------------------------------------------
>>  (84,local,"[""all""]","[""all""]",,,trust,{})
>>  (86,host,"[""all""]","[""all""]",127.0.0.1,,trust,{})
>>  (88,host,"[""all""]","[""all""]",::1,,trust,{})
>>
>> Here I attached a proof of concept patch for the same.
>>
>> Any suggestions/comments on this proposed approach?
>
> If I understand well to your proposal, the major benefit is in impossibility
> to enter pg_hba keywords - so you don't need to analyse if parameter is
> keyword or not? It has sense, although It can be hard to do image about
> pg_hba conf from these partial views.

From the function output, it is a little difficult to map back to the pg_hba.conf file. Because of problems in processing keywords in the where clause of a view, I changed from a view to a function. Is there any possibility, with a rule or something similar, that the where-clause details can be passed as function arguments to get the data?

> There can be other way - theoretically we can have a function pg_hba_debug
> with similar API like pg_hba_conf. The result will be a content of
> pg_hba.conf with information about result of any rule.

The output of the pg_hba_debug function would look like the entries of pg_hba.conf plus the match result for the given input data.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Multi-tenancy with RLS
On Fri, Aug 14, 2015 at 12:00 PM, Haribabu Kommi <kommi.harib...@gmail.com> wrote:
>
> Here I attached the proof concept patch.

Here I attached an updated patch that adds policies to most of the system catalog tables, except the following:

AggregateRelationId
AccessMethodRelationId
AccessMethodOperatorRelationId
AccessMethodProcedureRelationId
AuthMemRelationId
CastRelationId
EnumRelationId
EventTriggerRelationId
ExtensionRelationId
LargeObjectRelationId
LargeObjectMetadataRelationId
PLTemplateRelationId
RangeRelationId
RewriteRelationId
TransformRelationId
TSConfigRelationId
TSConfigMapRelationId
TSDictionaryRelationId
TSParserRelationId
TSTemplateRelationId

The following catalog tables need their policies to be created based on the referenced object's class, so currently no policy is added for them:

SecLabelRelationId
SharedDependRelationId
SharedDescriptionRelationId
SharedSecLabelRelationId

If a user is granted any permission on an object, then that user can view the metadata of that object in the catalog tables. To check the permissions of the user on the object, instead of checking each and every available option, I added a new privilege check option called "any". If the user has any permission on the object, the corresponding permission check function returns true. A patch is attached for the same.

Any thoughts/comments?

Regards,
Hari Babu
Fujitsu Australia

multi-tenancy_with_rls_poc_2.patch
Description: Binary data

any_privilege_check_option.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
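[Editorial sketch of the proposed "any" option. The patch above only describes an internal privilege check option; whether it is exposed through the existing privilege-inquiry functions is an assumption, so the call below is hypothetical.]

```sql
-- Hypothetical usage, assuming the "any" keyword were accepted by the
-- existing inquiry functions (not the case in stock PostgreSQL):
-- true if the role holds at least one privilege on the table, without
-- having to test SELECT, INSERT, UPDATE, ... individually.
SELECT has_table_privilege('test_user', 'some_table', 'any');
```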
[HACKERS] Multi-tenancy with RLS
This is regarding support for multi-tenancy in a single PostgreSQL instance using the row level security feature. The main idea is to have row level security enabled on the system catalog tables, so that a user can see only the rows that are either system objects or user objects owned by that user.

Example:

postgres=# create role test login;
postgres=# create role test1 login;
postgres=# \c postgres test
postgres=> create table test(f1 int);
postgres=> \d
        List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 public | test | table | test
(1 row)

postgres=> \c postgres test1
postgres=> create table test1(f1 int);
postgres=> \d
        List of relations
 Schema | Name  | Type  | Owner
--------+-------+-------+-------
 public | test1 | table | test1
(1 row)

postgres=# \c postgres test
postgres=> \d
        List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 public | test | table | test
(1 row)

To enable row level security on the system catalog tables, I currently added a new database option to CREATE/ALTER DATABASE. The syntax can be changed later. Adding an option at the database level makes it easier for users to enable/disable row level security on the system catalog tables.

CREATE DATABASE USERDB WITH ROW LEVEL SECURITY = TRUE;
ALTER DATABASE USERDB WITH ROW LEVEL SECURITY = FALSE;

A new boolean column datrowlevelsecurity is added to the pg_database system catalog table to display the status of row level security on that database.

Currently, as a proof of concept, I implemented row level security only for the pg_class system table. Whenever row level security on the database is enabled/disabled, it internally fires the create policy/drop policy commands using SPI interfaces. Here I attached the proof of concept patch.

Pending items:
1. Supporting RLS on all system catalog tables.
2. Instead of SPI interfaces, any better way to create/remove policies.

Any comments/suggestions regarding the way to achieve multi-tenancy in PostgreSQL?
Regards,
Hari Babu
Fujitsu Australia

multi-tenancy_with_rls_poc.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
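[Editorial sketch of the kind of policy the patch would create internally on pg_class. This is illustrative only: the policy name and the exact predicate are assumptions, and stock PostgreSQL refuses DDL on system catalogs unless allow_system_table_mods is set.]

```sql
-- Rows of pg_class stay visible if they are system objects
-- (OID below FirstNormalObjectId, i.e. 16384) or owned by the
-- current user -- roughly the behavior described above.
ALTER TABLE pg_catalog.pg_class ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON pg_catalog.pg_class
    FOR SELECT
    USING (oid < 16384                               -- system objects
           OR relowner = (SELECT oid FROM pg_roles
                          WHERE rolname = current_user));
```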
Re: [HACKERS] Priority table or Cache table
On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

What is the configuration for test (RAM of m/c, shared_buffers, scale_factor, etc.)?

Here are the details:

CPU - 16 core, RAM - 252 GB
shared_buffers - 1700MB, buffer_cache_ratio - 70
wal_buffers - 16MB, synchronous_commit - off
checkpoint_timeout - 15min, max_wal_size - 5GB
pgbench scale factor - 75 (1GB)
Load test table size - 1GB

Threads   Head    Patched   Diff
1         3123    3238      3.68%
2         5997    6261      4.40%
4         11102   11407     2.75%

I suspect that this may be because of buffer locks that are causing the problem, whereas in the older approach of different buffer pools, each buffer pool has its own locks. I will try to collect the profile output and analyze the same.

Any better ideas?

I think you should try to find out during the test, for how many pages it needs to perform the clocksweep (add some new counter like numBufferBackendClocksweep in BufferStrategyControl to find out the same). By theory your patch should reduce the number of times it needs to perform the clock sweep. I think in this approach even if you make some buffers non-replaceable (buffers for which BM_BUFFER_CACHE_PAGE is set), the clock sweep still needs to access all the buffers. I think we might want to find some way to reduce that if this idea helps.

Another thing is that this idea looks somewhat similar (although not the same) to the current Ring Buffer concept, where buffers for particular types of scan use buffers from a ring. I think it is okay to prototype as you have done in the patch, and we can consider doing something on those lines if at all this patch's idea helps.

Thanks for the details. I will try the same.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Priority table or Cache table
On Tue, Aug 11, 2015 at 4:43 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Tue, Aug 11, 2015 at 11:31 AM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

What is the configuration for test (RAM of m/c, shared_buffers, scale_factor, etc.)?

Here are the details:

CPU - 16 core, RAM - 252 GB
shared_buffers - 1700MB, buffer_cache_ratio - 70
wal_buffers - 16MB, synchronous_commit - off
checkpoint_timeout - 15min, max_wal_size - 5GB
pgbench scale factor - 75 (1GB)
Load test table size - 1GB

It seems that the test table can fit easily in shared buffers. I am not sure this patch will be of benefit for such cases; why do you think it can be beneficial for such cases?

Yes, this configuration combination may not be the best for the test. The idea behind these settings is to provide enough shared buffers to the cache table by tuning buffer_cache_ratio from 0 to 70% of shared buffers, so that the cache tables have enough shared buffers and the rest of the shared buffers can be used for normal tables, i.e. the load test table.

I will try to evaluate some more performance tests with different shared buffers settings and load.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Priority table or Cache table
On Mon, Jun 30, 2014 at 11:08 PM, Beena Emerson memissemer...@gmail.com wrote:

I also ran the test script after making the same configuration changes that you have specified. I found that I was not able to get the same performance difference that you have reported.

Following table lists the tps in each scenario and the % increase in performance.

Threads   Head   Patched   Diff
1         1669   1718      3%
2         2844   3195      12%
4         3909   4915      26%
8         7332   8329      14%

Coming back to this old thread, I just tried a new approach for this priority table. Instead of an entirely separate buffer pool, it tries to use some portion of shared buffers for priority tables, using a GUC variable buffer_cache_ratio (0-75) to specify what percentage of shared buffers to be used.

Syntax:

create table tbl(f1 int) with(buffer_cache=true);

Compared to the earlier approach, I thought this approach would be easier to implement. But during the performance run, it didn't show much improvement in performance. Here are the test results.

Threads   Head    Patched   Diff
1         3123    3238      3.68%
2         5997    6261      4.40%
4         11102   11407     2.75%

I suspect that this may be because of buffer locks that are causing the problem, whereas in the older approach of different buffer pools, each buffer pool has its own locks. I will try to collect the profile output and analyze the same.

Any better ideas?

Here I attached a proof of concept patch.

Regards,
Hari Babu
Fujitsu Australia

cache_table_poc.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
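[Editorial sketch of how the POC above would be used. Both the buffer_cache reloption and the buffer_cache_ratio GUC come from the patch and do not exist in stock PostgreSQL.]

```sql
-- Reserve a portion of shared buffers for cache tables (patch GUC;
-- value is a percentage of shared_buffers, 0-75).
SET buffer_cache_ratio = 70;

-- Tables marked with the patch's reloption keep their pages in the
-- reserved portion; ordinary tables compete for the remaining buffers.
CREATE TABLE hot_lookup (id int PRIMARY KEY, payload text)
    WITH (buffer_cache = true);
```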
Re: [HACKERS] Caching offsets of varlena attributes
On Thu, Aug 6, 2015 at 4:09 AM, Vignesh Raghunathan vignesh.pg...@gmail.com wrote:

Hello,

In the function heap_deform_tuple, there is a comment before caching varlena attributes specifying that the offset will be valid for either an aligned or unaligned value if there are no padding bytes. Could someone please elaborate on this?

In PostgreSQL, a tuple can contain two types of varlena headers: short varlena (doesn't need any alignment) and 4-byte varlena (needs alignment). Refer to the function heap_fill_tuple to see how the tuple is constructed. For short varlena headers, even if the type's alignment suggests aligning, we must not do so. For this reason, instead of att_align_nominal, att_align_pointer is called. The following is the comment from the att_align_pointer macro, which gives the details of why we should use this macro instead of att_align_nominal:

 * (A zero byte must be either a pad byte, or the first byte of a correctly
 * aligned 4-byte length word; in either case we can align safely.  A non-zero
 * byte must be either a 1-byte length word, or the first byte of a correctly
 * aligned 4-byte length word; in either case we need not align.)

Also, why is it safe to call att_align_nominal if the attribute is not varlena? Couldn't att_align_pointer be called for both cases? I am not able to understand how att_align_nominal is faster.

All other types definitely need char, int or double alignment, so it is safe to use the att_align_nominal macro. Please refer to the function heap_fill_tuple for more details.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
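[Editorial note: the two header kinds discussed above can be observed from SQL. A value of up to 126 data bytes gets a 1-byte (short, unaligned) varlena header when the tuple is formed by heap_fill_tuple; longer values get a 4-byte (aligned) header. The small experiment below assumes a stock server and values too small for compression to kick in.]

```sql
-- Compare declared data length with on-disk size: the difference is
-- the 1-byte vs 4-byte varlena header.
CREATE TEMP TABLE varlena_demo (v text);
INSERT INTO varlena_demo VALUES ('abc'), (repeat('x', 200));

SELECT octet_length(v)  AS data_bytes,
       pg_column_size(v) AS stored_bytes   -- data + header
FROM varlena_demo;
```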
Re: [HACKERS] Sharing aggregate states between different aggregate functions
On Thu, Jul 9, 2015 at 7:44 PM, David Rowley david.row...@2ndquadrant.com wrote:
On 15 June 2015 at 12:05, David Rowley david.row...@2ndquadrant.com wrote:

This basically allows an aggregate's state to be shared between other aggregate functions when both aggregates' transition functions (and a few other things) match. There's quite a number of aggregates in our standard set which will benefit from this optimisation.

After compiling the original patch with another compiler, I noticed a couple of warnings. The attached fixes these.

I did some performance tests on the patch. The patch showed a good improvement for same-column aggregates. With int or bigint datatype columns, the patch doesn't show any visible performance difference, but with the numeric datatype it shows a good improvement.

select sum(x), avg(y) from test where x < $1;

Different columns:

selectivity (millions)   Head   Patch
0.1                      315    322
0.3                      367    376
0.5                      419    427
1                        551    558
2                        824    826

select sum(x), avg(x) from test where x < $1;

Same column:

selectivity (millions)   Head   Patch
0.1                      314    314
0.3                      363    343
0.5                      412    373
1                        536    440
2                        795    586

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
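[Editorial sketch of the query shape the patch optimises, following the tests above. Table and data are illustrative; the sharing itself is internal, so both queries are valid on any server — only the first benefits from the patch.]

```sql
CREATE TEMP TABLE test (x numeric, y numeric);
INSERT INTO test SELECT g, g FROM generate_series(1, 100000) g;

-- Same argument, matching transition functions: with the patch the
-- transition state is accumulated once and both final functions
-- (sum, avg) read from it.
SELECT sum(x), avg(x) FROM test;

-- Different arguments: states cannot be shared, no benefit expected.
SELECT sum(x), avg(y) FROM test;
```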
Re: [HACKERS] Parallel Seq Scan
On Thu, Jul 23, 2015 at 9:42 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Wed, Jul 22, 2015 at 9:14 PM, Robert Haas robertmh...@gmail.com wrote:

One thing I noticed that is a bit dismaying is that we don't get a lot of benefit from having more workers. Look at the 0.1 data. At 2 workers, if we scaled perfectly, we would be 3x faster (since the master can do work too), but we are actually 2.4x faster. Each process is on the average 80% efficient. That's respectable. At 4 workers, we would be 5x faster with perfect scaling; here we are 3.5x faster. So the third and fourth worker were about 50% efficient. Hmm, not as good. But then going up to 8 workers bought us basically nothing.

I think the improvement also depends on how costly the qualification is. If it is costly, even for the same selectivity the gains will show up to higher numbers of clients, and for simple qualifications, the cost of having more workers will start dominating (processing data over multiple tuple queues) over the benefit we can achieve by them.

Yes, that's correct. When the qualification cost is increased, the performance also increases with the number of workers.

Instead of using all the configured workers per query, how about deciding the number of workers based on the cost of the qualification? I am not sure whether we have any information available to find out the qualification cost. This way the workers would be distributed across all backends properly.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
Hi Hackers,

This patch adds a new function called pg_hba_lookup to get all matching entries from pg_hba.conf for the provided input data. The rows are displayed in the order in which the hba conf entries are matched. This is an updated version of a previous failed attempt to create a catalog view for pg_hba.conf [1]. The view was not able to handle SQL queries properly because of the keywords that are present in the database and user columns.

Currently the following two variants are added:

pg_hba_lookup(database, user)
pg_hba_lookup(database, user, address, hostname)

How it works:

With the provided input data, it tries to match the entries of pg_hba.conf and populates the result set with all matching entries. With Tom Lane's recent commit 1e24cf645d24aab3ea39a9d259897fd0cae4e4b6, "Don't leave pg_hba and pg_ident data lying around in running backends" [2], the parsed hba conf entries are no longer available on the backend side. Temporarily I just reverted this patch for proof of concept purposes. Once we agree on the approach, I will try to find a solution for this.

How is it useful:

With the output of this view, an administrator can easily identify the lines that match the given criteria without going through the file.

Record format:

 Column name | datatype
-------------+----------
 line_number | int4
 type        | text
 database    | jsonb
 user        | jsonb
 address     | inet
 hostname    | text
 method      | text
 options     | jsonb

Please suggest any column additions or data type changes that are required.

Example output:

postgres=# select pg_hba_lookup('postgres','all');
               pg_hba_lookup
--------------------------------------------
 (84,local,[all],[all],,,trust,{})
 (86,host,[all],[all],127.0.0.1,,trust,{})
 (88,host,[all],[all],::1,,trust,{})

Here I attached a proof of concept patch for the same.

Any suggestions/comments on this proposed approach?
[1] http://www.postgresql.org/message-id/f40b0968db0a904da78a924e633be78645f...@sydexchtmp2.au.fjanz.com
[2] http://www.postgresql.org/message-id/e1zaquy-00072j...@gemulon.postgresql.org

Regards,
Hari Babu
Fujitsu Australia

pg_hba_lookup_poc.patch
Description: Binary data

revert_hba_context_release_in_backend.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
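[Editorial sketch of the two proposed call forms. The function exists only with the POC patch applied; the user name and host name below are illustrative.]

```sql
-- 2-argument form: match only on database and user.
SELECT pg_hba_lookup('postgres', 'all');

-- 4-argument form: additionally match the client address and hostname
-- against host/hostssl entries.
SELECT pg_hba_lookup('postgres', 'app_user',
                     '127.0.0.1'::inet, 'localhost');
```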
Re: [HACKERS] Parallel Seq Scan
On Mon, Jul 20, 2015 at 3:31 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Fri, Jul 17, 2015 at 1:22 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Thu, Jul 16, 2015 at 1:10 PM, Amit Kapila amit.kapil...@gmail.com wrote:

Thanks, I will fix this in next version of patch.

I am posting in this thread as I am not sure whether it needs a separate thread or not.

I went through the code and found that the newly added funnel node is tightly coupled with partial seq scan. In order to add many more parallel plans along with parallel seq scan, we need to remove the integration of this node with partial seq scan.

This assumption is wrong. The Funnel node can execute any node beneath it (refer ExecFunnel->funnel_getnext->ExecProcNode; similarly you can see exec_parallel_stmt).

Yes, the funnel node can execute any node beneath it. But during the planning phase, the funnel path is added on top of the partial scan path. I just want the same to be enhanced to support other parallel nodes.

Yes, currently the nodes supported under Funnel nodes are limited, like partialseqscan and result (due to reasons mentioned upthread, like readfuncs.c doesn't have support to read Plan nodes, which is required for the worker backend to read the PlannedStmt; of course we can add them, but as we are supporting parallelism for limited nodes, I have not enhanced readfuncs.c), but in general the basic infrastructure is designed in such a way that it can support other nodes beneath it.

To achieve the same, I have the following ideas.

Execution:
The funnel execution varies based on the plan node below it.
1) partial scan - Funnel does the local scan also and returns the tuples.
2) partial agg - Funnel does the merging of aggregate results and returns the final result.

Basically Funnel will execute any node beneath it. The Funnel node itself is not responsible for doing the local scan or any form of consolidation of results; as of now, it has these 3 basic properties:
– Has one child, runs multiple copies in parallel.
– Combines the results into a single tuple stream.
– Can run the child itself if no workers are available.

+ if (!funnelstate->local_scan_done)
+ {
+     outerPlan = outerPlanState(funnelstate);
+
+     outerTupleSlot = ExecProcNode(outerPlan);

From the above code in the funnel_getnext function, it directly calls the node below it to do the scan on the backend side also. This code should refer to the node type below, and only based on that go for the backend scan. I feel always executing the outer plan may not be correct for other parallel nodes.

Any other better ideas to achieve the same?

Refer slides 16-19 of the Parallel Sequential Scan presentation at PGCon:
https://www.pgcon.org/2015/schedule/events/785.en.html

Thanks for the information.

I don't have a very clear idea what is the best way to transform the nodes in the optimizer, but I think we can figure that out later unless the majority of people see that as a blocking factor.

I am also not finding it a blocking factor for parallel scan. I wrote the above mail to get some feedback/suggestions from hackers on how to proceed in adding other parallelism nodes along with parallel scan.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Thu, Jul 16, 2015 at 1:10 PM, Amit Kapila amit.kapil...@gmail.com wrote:

Thanks, I will fix this in next version of patch.

I am posting in this thread as I am not sure whether it needs a separate thread or not.

I went through the code and found that the newly added funnel node is tightly coupled with partial seq scan. In order to add many more parallel plans along with parallel seq scan, we need to remove the integration of this node with partial seq scan. To achieve the same, I have the following ideas.

Plan:
1) Add the funnel path immediately for every parallel path, similar to the current parallel seq scan, but during plan generation generate the funnel plan only for the top funnel path and ignore the rest of the funnel paths.
2) Instead of adding a funnel path immediately after the partial seq scan path is generated, add the funnel path in grouping_planner once the final rel path is generated, before creating the plan.

Execution:
The funnel execution varies based on the plan node below it.
1) partial scan - Funnel does the local scan also and returns the tuples.
2) partial agg - Funnel does the merging of aggregate results and returns the final result.

Any other better ideas to achieve the same?

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing vacuum truncation scans
On Thu, Jul 16, 2015 at 10:17 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Thu, Jul 16, 2015 at 11:21 AM, Haribabu Kommi kommi.harib...@gmail.com wrote:

Here I attached updated patches,
1) without prefetch logic.
2) with combined vm and prefetch logic.

I think it is better to just get the first patch in, as that in itself is a clear win, and then we can evaluate the second one (prefetching during truncation) separately. I think after the first patch, the use case for doing prefetch seems to be less beneficial, and I think it could hurt by loading not-required pages (assume you have prefetched 32 pages and after inspecting the first page, it decides to quit the loop as that is a non-empty page; another case is when it has prefetched a page just because its visibility map bit was cleared, while for the others it is set) into the OS buffer cache.

Yes, in the above cases the prefetch is an overhead. I am not sure how frequently such scenarios occur in real-world workloads.

Having said that, I am not against prefetching in this scenario, as that can help in more cases than it could hurt, but I think it will be better to evaluate that separately.

Yes, the prefetch patch works better in cases where the first vacuum mostly skips the truncation because of lock waiters. If such cases are common in real-world workloads, then the prefetch patch is needed.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing vacuum truncation scans
On Wed, Jul 15, 2015 at 11:19 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Mon, Jun 29, 2015 at 11:24 AM, Jeff Janes jeff.ja...@gmail.com wrote:

Attached is a patch that implements the vm scan for truncation. It introduces a variable to hold the last blkno which was skipped during the forward portion. Any blocks after both this blkno and after the last inspected nonempty page (which the code is already tracking) must have been observed to be empty by the current vacuum. Any other process rendering the page nonempty are required to clear the vm bit, and no other process can set the bit again during the vacuum's lifetime. So if the bit is still set, the page is still empty without needing to inspect it.

One case where this patch can behave differently than the current code is: before taking the Exclusive lock on the relation for truncation, if some backend clears the vm bit and then inserts-deletes-prunes that page. I think with the patch it will not consider truncating such pages, whereas the current code will allow truncating them, as it re-verifies each page. I think such a case would be rare and we might not want to bother about it, but it is still worth thinking about once.

Thanks for your review. I corrected the code so that instead of returning the blkno after a visibility map check failure, it just continues to verify the contents as in the earlier approach.

Some minor points about patch:

count_nondeletable_pages()
{
..
    if (PageIsNew(page) || PageIsEmpty(page))
    {
        /* PageIsNew probably shouldn't happen... */
        UnlockReleaseBuffer(buf);
        continue;
    }
..
}

Why vmbuffer is not freed in above loop?

In the above check, we just continue to the next blkno; at the end of the loop the vmbuffer is freed.

count_nondeletable_pages()
{
..
+    /*
+     * Pages that were inspected and found to be empty by this vacuum run
+     * must still be empty if the vm bit is still set. Only vacuum sets
+     * vm bits, and only one vacuum can exist in a table at one time.
+     */
+    trust_vm = vacrelstats->nonempty_pages > vacrelstats->skipped_pages ?
+               vacrelstats->nonempty_pages : vacrelstats->skipped_pages;
..
}

I think it is better to have spaces in above assignment statement (near '=' and near '>').

Corrected. Here I attached updated patches:
1) without prefetch logic.
2) with combined vm and prefetch logic.

I marked the patch as ready for committer.

Regards,
Hari Babu
Fujitsu Australia

vac_trunc_trust_vm_v2.patch
Description: Binary data

vac_trunc_trust_vm_and_prefetch_v2.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
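[Editorial sketch of the scenario the vacuum-truncation patches target. Table name and sizes are illustrative, not the exact pgbench setup used in the thread.]

```sql
-- A table whose tail becomes all-empty after a mass delete: the
-- truncation phase of VACUUM then dominates its runtime.
CREATE TABLE trunc_test (f1 int, filler text);
INSERT INTO trunc_test
    SELECT g, repeat('x', 100) FROM generate_series(1, 1000000) g;

DELETE FROM trunc_test WHERE f1 > 1000;   -- leaves a long empty tail

-- With the vm patch, still-set visibility-map bits let the backward
-- truncation scan trust that those tail pages are empty instead of
-- re-reading each page block by block.
VACUUM trunc_test;
```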
Re: [HACKERS] optimizing vacuum truncation scans
On Mon, Jul 13, 2015 at 5:16 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Mon, Jul 13, 2015 at 12:06 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

I will do some performance tests and send you the results.

Here are the performance results tested on my machine.

                Head      vm patch   vm+prefetch patch
First vacuum    120 sec   1 sec      1 sec
Second vacuum   180 sec   180 sec    30 sec

I did some modifications in the code to skip the vacuum truncation in the first vacuum command. This way I collected the time taken by the second vacuum. I just combined your vm and prefetch patches into a single vm+prefetch patch, without a GUC. I kept the prefetch size as 32 and did the performance test. I chose this prefetch size based on the current buffer access strategy ring, which is 32 for vacuum at present, instead of making it a user option. Here I attached the modified patch with both vm+prefetch logic.

I will do some tests on a machine with an SSD and let you know the results. Based on these results, we can decide whether we need a GUC or not, depending on the impact of prefetch on SSD machines.

Following are the performance readings on a machine with an SSD. I increased the pgbench scale factor to 1000 in this test, instead of 500, to show better performance numbers.

                Head       vm patch   vm+prefetch patch
First vacuum    6.24 sec   2.91 sec   2.91 sec
Second vacuum   6.66 sec   6.66 sec   7.19 sec

There is a small performance impact on SSD with prefetch. The above prefetch overhead was observed while prefetching 1639345 pages. I feel this overhead is small.

Hi Jeff,

If you are fine with the earlier attached patch, then I will mark this patch as ready for committer, to get a committer's view on the patch.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing vacuum truncation scans
On Mon, Jul 13, 2015 at 12:06 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

I will do some performance tests and send you the results.

Here are the performance results tested on my machine.

                Head      vm patch   vm+prefetch patch
First vacuum    120 sec   1 sec      1 sec
Second vacuum   180 sec   180 sec    30 sec

I did some modifications in the code to skip the vacuum truncation in the first vacuum command. This way I collected the time taken by the second vacuum. I just combined your vm and prefetch patches into a single vm+prefetch patch, without a GUC. I kept the prefetch size as 32 and did the performance test. I chose this prefetch size based on the current buffer access strategy ring, which is 32 for vacuum at present, instead of making it a user option. Here I attached the modified patch with both vm+prefetch logic.

I will do some tests on a machine with an SSD and let you know the results. Based on these results, we can decide whether we need a GUC or not, depending on the impact of prefetch on SSD machines.

Following are the performance readings on a machine with an SSD. I increased the pgbench scale factor to 1000 in this test, instead of 500, to show better performance numbers.

                Head       vm patch   vm+prefetch patch
First vacuum    6.24 sec   2.91 sec   2.91 sec
Second vacuum   6.66 sec   6.66 sec   7.19 sec

There is a small performance impact on SSD with prefetch.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing vacuum truncation scans
On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

I will do some performance tests and send you the results.

Here are the performance results tested on my machine.

                Head      vm patch   vm+prefetch patch
First vacuum    120 sec   1 sec      1 sec
Second vacuum   180 sec   180 sec    30 sec

I did some modifications in the code to skip the vacuum truncation in the first vacuum command. This way I collected the time taken by the second vacuum. I just combined your vm and prefetch patches into a single vm+prefetch patch, without a GUC. I kept the prefetch size as 32 and did the performance test. I chose this prefetch size based on the current buffer access strategy ring, which is 32 for vacuum at present, instead of making it a user option. Here I attached the modified patch with both vm+prefetch logic.

I will do some tests on a machine with an SSD and let you know the results. Based on these results, we can decide whether we need a GUC or not, depending on the impact of prefetch on SSD machines.

Regards,
Hari Babu
Fujitsu Australia

vac_trunc_trust_vm_and_prefetch.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing vacuum truncation scans
On Thu, Jul 9, 2015 at 4:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:
On Wed, Jul 8, 2015 at 9:46 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Mon, Jun 29, 2015 at 3:54 PM, Jeff Janes jeff.ja...@gmail.com wrote:

There is still the case of pages which had their visibility bit set by a prior vacuum and then were not inspected by the current one. Once the truncation scan runs into these pages, it falls back to the previous behavior of reading block by block backwards. So there could still be reason to optimize that fallback using forward-reading prefetch.

The case I didn't understand is: how does the current vacuum miss a page which was set by the prior vacuum?

The prior vacuum set them to all visible, but then doesn't delete them. Either because it got interrupted or because there were still some pages after them (at the time) that were not all empty. The current vacuum skips them entirely on the forward scan because it thinks it is a waste of time to vacuum all-visible pages. Since it skips them, it doesn't know if they are empty as well as being all-visible. There is no permanent indicator of the pages being all-empty; there is only the inference based on the current vacuum's counters and protected by the lock held on the table.

The page should be counted either in skipped_pages or in nonempty_pages. Is it possible that a page doesn't come under these two categories and is also not empty? If the above doesn't exist, then we can directly truncate the relation from the highest block number of either nonempty_pages or skipped_pages to the end of the relation.

Right, and that is what this does (provided the vm bit is still set, so it does still have to loop over the vm to verify that it is still set, while it holds the access exclusive lock).

Thanks, I got it. To re-verify the vm bit of a page after getting the access exclusive lock, we have to do the backward scan.
I also feel that your prefetch buffer patch needed to improve the performance, as because there is a need of backward scan to re-verify vm bit, I will do some performance tests and send you the results. The problem is that the pages between the two counters are not known to be empty, and also not known to be nonempty. Someone has to be willing to go check on those pages at some point, or else they will never be eligible for truncation. Yes, there is no way to identify the page is empty or not without reading the page. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Waits monitoring
On Thu, Jul 9, 2015 at 1:52 AM, Ildus Kurbangaliev i.kurbangal...@postgrespro.ru wrote:

Hello. Currently, PostgreSQL offers many metrics for monitoring. However, detailed monitoring of waits is not supported yet. Such monitoring would let a DBA know how long a backend waited for a particular event and therefore identify bottlenecks. This functionality is very useful, especially for high-load databases. Metrics for wait monitoring are provided by many popular commercial DBMSs, and we currently have requests for this feature from companies migrating to PostgreSQL from commercial databases. Thus, I think it would be nice for PostgreSQL to have it too.

Yes, it would be good to have such wait monitoring views in PostgreSQL. You can add this patch to the open commitfest.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] optimizing vacuum truncation scans
On Mon, Jun 29, 2015 at 3:54 PM, Jeff Janes jeff.ja...@gmail.com wrote:

Attached is a patch that implements the vm scan for truncation. It introduces a variable to hold the last blkno which was skipped during the forward portion. Any blocks after both this blkno and after the last inspected nonempty page (which the code is already tracking) must have been observed to be empty by the current vacuum. Any other process rendering a page nonempty is required to clear the vm bit, and no other process can set the bit again during the vacuum's lifetime. So if the bit is still set, the page is still empty without needing to inspect it.

The patch looks good and it improves the vacuum truncation speed significantly.

There is still the case of pages which had their visibility bit set by a prior vacuum and then were not inspected by the current one. Once the truncation scan runs into these pages, it falls back to the previous behavior of reading block by block backwards. So there could still be reason to optimize that fallback using forward-reading prefetch.

The case I didn't understand is how the current vacuum can miss a page whose bit was set by the prior vacuum. A page should be counted either in skipped_pages or in nonempty_pages. Is it possible for a page to fall under neither of these two categories while also not being empty? If not, then we can directly truncate the relation from the higher of nonempty_pages and skipped_pages to the end of the relation.

Regards, Hari Babu Fujitsu Australia
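The rule described above — everything after both the last skipped block and the last known-nonempty block is truncatable, provided each vm bit is still set when re-checked under the access exclusive lock — can be sketched as follows. This is an illustrative model only; the name `truncation_start` and the `vm_all_visible` array are assumptions for the sketch, not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model: vm_all_visible[] stands in for the visibility map.
 * last_skipped is the last block skipped on the forward scan;
 * last_nonempty is the last block the current vacuum saw to be nonempty.
 * Returns the block number at which truncation can start.
 */
static int
truncation_start(int nblocks, const bool *vm_all_visible,
                 int last_skipped, int last_nonempty)
{
    int candidate = ((last_skipped > last_nonempty) ? last_skipped
                                                    : last_nonempty) + 1;

    /*
     * Walk backwards from the end, re-verifying each vm bit while the
     * access exclusive lock is held.  A cleared bit means some other
     * process may have made the page nonempty, so that page (and
     * everything before it) must be kept.
     */
    for (int blkno = nblocks - 1; blkno >= candidate; blkno--)
    {
        if (!vm_all_visible[blkno])
            return blkno + 1;
    }
    return candidate;
}
```

Since truncation can only remove a contiguous tail, a single cleared bit in the tail limits how far back the truncation can reach, which is exactly why the backward re-verification walk is needed.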
Re: [HACKERS] Parallel Seq Scan
On Fri, Jul 3, 2015 at 10:05 PM, Amit Kapila amit.kapil...@gmail.com wrote:

Attached, find the rebased version of patch. Note - You need to first apply the assess-parallel-safety patch which you can find at: http://www.postgresql.org/message-id/CAA4eK1JjsfE_dOsHTr_z1P_cBKi_X4C4X3d7Nv=vwx9fs7q...@mail.gmail.com

I ran some performance tests on a 16-core machine with large shared buffers, so there is no I/O involved. With the default value of cpu_tuple_comm_cost, a parallel plan is not generated even when selecting 100K records out of 40 million. So I changed the value to 0 and collected the performance readings.

Here are the performance numbers (times in ms):

selectivity (millions)   Seq scan   2 workers   4 workers   8 workers
0.1                      11498.93    4821.40     3305.84     3291.90
0.4                      10942.98    4967.46     3338.58     3374.00
0.8                      11619.44    5189.61     3543.86     3534.40
1.5                      12585.51    5718.07     4162.71     2994.90
2.7                      14725.66    8346.96    10429.05     8049.11
5.4                      18719.00   20212.33    21815.19    19026.99
7.2                      21955.79   28570.74    28217.60    27042.27

The average table row size is around 500 bytes and the width of the query's selected columns is around 36 bytes. Once the query selectivity goes above roughly 10% of the table's records, the parallel scan performance drops off.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Auto-vacuum is not running in 9.1.12
On Wed, Jun 17, 2015 at 2:17 PM, Prakash Itnal prakash...@gmail.com wrote:

Hi, Currently the issue is easily reproducible. Steps to reproduce: * Set some aggressive values for auto-vacuuming. * Run heavy database update/delete/insert queries. This leads to invoking auto-vacuuming in quick succession. * Change the system time to an older date, e.g. 1995-01-01. Suddenly auto-vacuuming stops working. Even after changing the system time back to the current time, auto-vacuuming does not resume. So the question is: does postgres support system time changes?

PostgreSQL uses the system time to check whether it has reached the specified nap time to trigger an autovacuum worker. Is it possible for you to check what the autovacuum launcher is doing even after the system time is reset to the current time? I can think of a case where the launcher_determine_sleep function returns a huge sleep value because of the system time change. Because of that, it is possible that the launcher is not launching workers to do the vacuum. Maybe I am wrong.

Regards, Hari Babu Fujitsu Australia
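To make the suspected failure mode concrete (this is a hypothetical sketch, not the actual launcher_determine_sleep code): if the nap interval is computed as "next scheduled wall-clock time minus now", then moving the system clock backwards makes "now" smaller and inflates the computed sleep, so the launcher could nap for years before starting another worker.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical sleep computation based on absolute wall-clock times.
 * next_worker_time was scheduled before the clock change; if the clock
 * is then set back, the difference balloons.
 */
static long
naive_sleep_seconds(time_t next_worker_time, time_t now)
{
    long secs = (long) (next_worker_time - now);

    return (secs > 0) ? secs : 0;   /* never sleep a negative amount */
}
```

Deriving naps from a monotonic clock, or clamping the computed sleep to the configured autovacuum_naptime, would avoid this kind of stall.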
Re: [HACKERS] does tuple store subtransaction id in it?
On Tue, Jun 16, 2015 at 1:08 PM, Xiaoyulei xiaoyu...@huawei.com wrote:

In XidInMVCCSnapshot, it checks whether the xid from the tuple is in snapshot->subxip. Does that mean a tuple stores a subtransaction id?

A tuple stores only the transaction id of the operation that touched it. This can be either a main transaction id or a subtransaction id.

But in PushTransaction I see that currentSubTransactionId is assigned to TransactionState.subTransactionId, and currentSubTransactionId is reinitialized in StartTransaction. So I think a tuple should not store a subtransaction id. I am confused about this. If the subtransaction id is reinitialized at every transaction start, how can one judge from the xid in a tuple that it came from a subtransaction?

StartTransaction is called only once per transaction; for subtransactions, only StartSubTransaction is called. (Note also that the xid stored in a tuple is a real transaction id assigned from the shared counter, not the backend-local currentSubTransactionId; the snapshot's subxip array holds those assigned subtransaction xids.) Hope this answers your question.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Possible pointer dereference
On Thu, May 28, 2015 at 6:07 AM, Gaetano Mendola mend...@gmail.com wrote:

I'm playing with a static analyzer and it's reporting some real errors in the postgresql code base, like the following one in src/backend/access/transam/commit_ts.c: return *ts != 0 // line 321 but a few lines up (line 315) ts is checked for NULL, so either the NULL check is not needed or *ts can lead to a NULL pointer dereference. The same happens a few lines later, at lines 333 and 339.

Thanks for providing detailed information. The function TransactionIdGetCommitTsData is currently used in only one place, and the caller always passes a valid pointer to this function, so there shouldn't be a problem at present. But if in future the function is called with a NULL pointer, it would lead to a crash. Correcting it as follows solves the problem:

return ts ? (*ts != 0) : false;

instead of:

return *ts != 0;

Attached is a patch for it.

Regards, Hari Babu Fujitsu Australia

commit_ts_fix.patch Description: Binary data
Re: [HACKERS] Parallel Seq Scan
On Wed, Apr 22, 2015 at 10:48 PM, Amit Kapila amit.kapil...@gmail.com wrote: parallel_seqscan_v14.patch (Attached with this mail)

This patch no longer applies/works against the latest head after the parallel mode patch was committed. Can you please rebase the patch?

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Fri, May 15, 2015 at 11:24 PM, Stephen Frost sfr...@snowman.net wrote: * Haribabu Kommi (kommi.harib...@gmail.com) wrote: On Tue, May 5, 2015 at 6:48 AM, Peter Eisentraut pete...@gmx.net wrote: It still looks quite dubious to me. The more I test this, the more fond I grow of the idea of having this information available in SQL. But I'm also growing more perplexed by how this the file is mapped to a table. It just isn't a good match. For instance: What is keyword_databases? Why is it an array? Same for keyword_users. How can I know whether a given database or user matches a keyword? What is compare_method? (Should perhaps be keyword_address?) Why is compare method set to mask when a hostname is set? (Column order is also a bit confusing here.) I'd also like options to be jsonb instead of a text array. Thanks for your suggestion. I am not sure how to use jsonb here, i will study the same and provide a patch for the next version. Regarding next version- are you referring to 9.6 and therefore we should go ahead and bounce this to the next CF, or were you planning to post a next version of the patch today? Yes, for 9.6 version. This is certainly a capability which I'd like to see, though I share Peter's concerns regarding the splitting up of the keywords rather than keeping the same structure as what's in the actual pg_hba.conf. That strikes me as confusing. It'd be neat if we were able to change pg_hba.conf to make more sense and then perhaps the SQL version wouldn't look so different but I don't think there's any way to do that. I discussed the patch briefing with Greg over IM, who pointed out that keeping things just exactly as they are in the config file would mean implementing, essentially, a pg_hba.conf parser in SQL. I can understand that perspective, but I don't think there's really much hope in users being able to use this view directly without a lot of effort, regardless. 
We need to provide a function which takes the arguments that our pg_hba lookup does (database, user-to-login-as, maybe system user for pg_ident checks, optionally an IP, etc) and then returns the record that matches.

Thanks for the details. I will try to come up with a view and a function, considering all of the above, for the next commitfest.

Apologies for not being able to provide more feedback earlier. I'll be happy to help with all of the above and review the patch. Independently, I'd love to see an SQL interface to pg_ident.conf too, where, I expect anyway, it'll be a lot simpler, though I'm not sure that it's very useful until we also have pg_hba.conf available through SQL.

Yes, I will definitely look into pg_ident along with pg_hba.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Tue, May 5, 2015 at 6:48 AM, Peter Eisentraut pete...@gmx.net wrote: On 5/1/15 12:33 PM, Andres Freund wrote: On 2015-04-08 19:19:29 +0100, Greg Stark wrote: I'm not sure what the best way to handle the hand-off from patch contribution to reviewer/committer. If I start tweaking things then you send in a new version it's actually more work to resolve the conflicts. I think at this point it's easiest if I just take it from here. Are you intending to commit this? It still looks quite dubious to me. The more I test this, the more fond I grow of the idea of having this information available in SQL. But I'm also growing more perplexed by how this the file is mapped to a table. It just isn't a good match. For instance: What is keyword_databases? Why is it an array? Same for keyword_users. How can I know whether a given database or user matches a keyword? What is compare_method? (Should perhaps be keyword_address?) Why is compare method set to mask when a hostname is set? (Column order is also a bit confusing here.) I'd also like options to be jsonb instead of a text array. Thanks for your suggestion. I am not sure how to use jsonb here, i will study the same and provide a patch for the next version. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Sat, Apr 4, 2015 at 4:19 PM, Pavel Stehule pavel.steh...@gmail.com wrote: Hi 2015-03-31 14:38 GMT+02:00 Haribabu Kommi kommi.harib...@gmail.com: keyword_databases - The database name can be all, replication, sameuser, samerole and samegroup. keyword_roles - The role can be all and a group name prefixed with +. I am not very happy about name - but I have no better idea :( - maybe database_mask, user_mask How about database_keywords and user_keywords as column names? The rest of the database and role names are treated as normal database and role names. 2. Added the code changes to identify the names with quoted. Is it a good idea? Database's names are not quoted every else (in system catalog). So now, we cannot to use join to this view. postgres=# select (databases)[1] from pg_hba_conf ; databases --- omega 2 (4 rows) postgres=# select datname from pg_database ; datname --- template1 template0 postgres omega 2 (4 rows) I dislike this - we know, so the name must be quoted in file (without it, the file was incorrect). And if you need quotes, there is a function quote_ident. If we use quotes elsewhere, then it should be ok, bot not now. Please, remove it. More, it is not necessary, when you use a keyword columns. Thanks. The special handling of quotes for pg_hba_conf is not required. I will keep the behaviour similar to the other catalog tables. I will remove the quote support in the next version. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Mon, Mar 30, 2015 at 4:34 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hi I checked this patch. I like the functionality and behave. Thanks for the review. Here I attached updated patch with the following changes. 1. Addition of two new keyword columns keyword_databases - The database name can be all, replication, sameuser, samerole and samegroup. keyword_roles - The role can be all and a group name prefixed with +. The rest of the database and role names are treated as normal database and role names. 2. Added the code changes to identify the names with quoted. 3. Updated documentation changes 4. Regression test is corrected. Regards, Hari Babu Fujitsu Australia Catalog_view_to_HBA_settings_patch_V9.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Fri, Mar 13, 2015 at 1:33 PM, Peter Eisentraut pete...@gmx.net wrote: On 3/4/15 1:34 AM, Haribabu Kommi wrote: On Wed, Mar 4, 2015 at 12:35 PM, Haribabu Kommi kommi.harib...@gmail.com wrote: + foreach(line, parsed_hba_lines) In the above for loop it is better to add check_for_interrupts to avoid it looping if the parsed_hba_lines are more. Updated patch is attached with the addition of check_for_interrupts in the for loop. I tried out your latest patch. I like that it updates even in running sessions when the file is reloaded. Thanks for the review. Sorry for late reply. The permission checking is faulty, because unprivileged users can execute pg_hba_settings() directly. corrected. Check the error messages against the style guide (especially capitalization). corrected. I don't like that there is a hard-coded limit of 16 options 5 pages away from where it is actually used. That could be done better. changed to 12 instead of 16. I'm not sure about the name pg_hba_settings. Why not just pg_hba or pg_hba_conf if you're going for a representation that is close to the file (as opposed to pg_settings, which is really a lot more abstract than any particular file). changed to pg_hba_conf. I would put the line_number column first. changed. I continue to think that it is a serious mistake to stick special values like 'all' and 'replication' into the arrays without additional decoration. That makes this view useless or at least dangerous for editing tools or tools that want to reassemble the original file. Clearly at least one of those has to be a use case. Otherwise we can just print out the physical lines without interpretation. It is possible to provide more than one keyword for databases or users. Is it fine to use the text array for keyword databases and keyword users. The mask field can go, because address is of type inet, which can represent masks itself. (Or should it be cidr then? Who knows.) 
The preferred visual representation of masks in pg_hba.conf has been address/mask for a while now, so we should preserve that. Additionally, you can then use the existing inet/cidr operations to do things like checking whether some IP address is contained in an address specification.

removed.

I can't tell from the documentation what the compare_method field is supposed to do. I see it in the code, but that is not a natural representation of pg_hba.conf. In fact, this just supports my earlier statement. Why are special values in the address field special, but not in the user or database fields? uaImplicitReject is not a user-facing authentication method, so it shouldn't be shown (or showable).

removed.

I would have expected options to be split into keys and values. All that code to reassemble the options from the parsed struct representation seems crazy to me. Surely, we could save the strings as we parse them?

I didn't get this point clearly. Can you explain it a bit more?

I can't make sense of the log message "pg_hba.conf not reloaded, pg_hba_settings may show stale information". If load_hba() failed, the information isn't stale, is it? In any case, printing this to the server log seems kind of pointless, because someone using the view is presumably doing so because they can't or don't want to read the server log. The proper place might be a warning when the view is actually called.

Changed accordingly: if the reload fails, subsequent selects on the view throw a warning that the view data may not be current, like this: "There was some failure in reloading pg_hba.conf file. The pg_hba.conf settings data may contains stale information"

Here I have attached the latest patch with the fixes, other than the keyword columns. I will provide the patch with the keyword columns and documentation changes later.
Regards, Hari Babu Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V8.patch Description: Binary data
Re: [HACKERS] Parallel Seq Scan
On Wed, Mar 11, 2015 at 6:31 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Mar 3, 2015 at 7:47 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have modified the patch to introduce a Funnel node (and left child as PartialSeqScan node). Apart from that, some other noticeable changes based on feedback include: a) Master backend forms and send the planned stmt to each worker, earlier patch use to send individual elements and form the planned stmt in each worker. b) Passed tuples directly via tuple queue instead of going via FE-BE protocol. c) Removed restriction of expressions in target list. d) Introduced a parallelmodeneeded flag in plannerglobal structure and set it for Funnel plan. There is still some work left like integrating with access-parallel-safety patch (use parallelmodeok flag to decide whether parallel path can be generated, Enter/Exit parallel mode is still done during execution of funnel node). I think these are minor points which can be fixed once we decide on the other major parts of patch. Find modified patch attached with this mail. - Something is deeply wrong with the separation of concerns between nodeFunnel.c and nodePartialSeqscan.c. nodeFunnel.c should work correctly with *any arbitrary plan tree* as its left child, and that is clearly not the case right now. shm_getnext() can't just do heap_getnext(). Instead, it's got to call ExecProcNode() on its left child and let the left child decide what to do about that. The logic in InitFunnelRelation() belongs in the parallel seq scan node, not the funnel. ExecReScanFunnel() cannot be calling heap_parallel_rescan(); it needs to *not know* that there is a parallel scan under it. The comment in FunnelRecheck is a copy-and-paste from elsewhere that is not applicable to a generic funnel mode. In create_parallelscan_paths() function the funnel path is added once the partial seq scan path is generated. 
I feel the funnel path could instead be added just once, on top of the best overall parallel path for the entire query. Is this patch the right place to add such support as well?

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Tue, Mar 10, 2015 at 3:09 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Mar 10, 2015 at 6:50 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Tue, Mar 10, 2015 at 1:38 AM, Amit Kapila amit.kapil...@gmail.com wrote:

Assuming previous patch is in right direction, I have enabled join support for the patch and done some minor cleanup of patch which leads to attached new version.

Does this patch handle the cases where a rescan starts without the earlier scan having finished?

Do you mean to say cases like ANTI and SEMI joins (in nodeNestLoop.c), where we scan the next outer tuple and rescan the inner table without completing the previous scan of the inner table?

Yes.

I have currently modelled it based on the existing rescan for seqscan (ExecReScanSeqScan()), which means it will begin the scan again. Basically, if the workers were already started/initialized by the previous scan, then re-initialize them (refer to function ExecReScanFunnel() in the patch). Can you elaborate more if you think the current handling is not sufficient for some case?

From the ExecReScanFunnel function, it seems that a rescan waits until all the workers have finished before starting the next scan. Will the workers abandon their ongoing task? Otherwise, I feel this may decrease performance instead of improving it. I am also not sure whether the case is handled where a worker is waiting to pass on its results while the backend is trying to start the rescan.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Tue, Mar 10, 2015 at 1:38 AM, Amit Kapila amit.kapil...@gmail.com wrote: Assuming previous patch is in right direction, I have enabled join support for the patch and done some minor cleanup of patch which leads to attached new version.

Does this patch handle the cases where a rescan starts without the earlier scan having finished?

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 12:35 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

+ foreach(line, parsed_hba_lines)

In the above loop it is better to add a CHECK_FOR_INTERRUPTS call, so the loop can be cancelled when parsed_hba_lines is large.

The updated patch is attached with the addition of CHECK_FOR_INTERRUPTS in the loop.

Regards, Hari Babu Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V7.patch Description: Binary data
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 12:18 PM, Greg Stark st...@mit.edu wrote: On Wed, Mar 4, 2015 at 12:34 AM, Haribabu Kommi kommi.harib...@gmail.com wrote:

Out of curiosity, regarding the result materialization code addition: the caller of the hba_settings function, ExecMakeTableFunctionResult, also stores the results in a tuplestore. Is there any advantage to doing it in the hba_settings function rather than in the caller?

That might protect against concurrent pg_hba reloads, though I suspect there's a CHECK_FOR_INTERRUPTS hanging out in that loop. We could maybe put one in this loop and check if the parser memory context has been reset.

I feel there is no problem with concurrent pg_hba reloads, because CHECK_FOR_INTERRUPTS doesn't reload the config files; that is done in PostgresMain once the query finishes. Am I missing something?

+ foreach(line, parsed_hba_lines)

In the above loop it is better to add a CHECK_FOR_INTERRUPTS call, so the loop can be cancelled when parsed_hba_lines is large.

But the immediate problem is that having an API that looked up a line by line number, and then calling it repeatedly with successive line numbers, was O(n^2). Each time it was called it had to walk down the list from the head all over again: 1 + 2 + 3 + ... + n = n(n+1)/2, i.e. O(n^2). This isn't performance critical, but it would not have run in a reasonable length of time for large pg_hba files. And since state would have to be stored between calls anyway, the code is at least as simple with the tuplestore, imho even simpler.

Got it. Thanks.

Regards, Hari Babu Fujitsu Australia
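Greg's cost argument can be made concrete with a toy model (illustrative only; the function names are invented for the sketch): looking up line k of a singly linked list walks k+1 nodes, so fetching lines 0..n-1 one call at a time touches 1 + 2 + ... + n = n(n+1)/2 nodes in total, while a single forward pass that materializes everything into a tuplestore touches each node exactly once.

```c
#include <assert.h>

/* cost (nodes walked) of one lookup of line k in a singly linked list */
static long
nodes_walked_one_lookup(long k)
{
    return k + 1;                   /* must start from the head each time */
}

/* total cost of n successive per-line lookups: the O(n^2) pattern */
static long
nodes_walked_all_lookups(long n)
{
    long total = 0;

    for (long k = 0; k < n; k++)
        total += nodes_walked_one_lookup(k);
    return total;                   /* equals n(n+1)/2 */
}
```

For a 1000-line pg_hba.conf that is 500500 node visits for the per-line API versus 1000 for the tuplestore approach.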
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 5:57 AM, Greg Stark st...@mit.edu wrote:

On further review I've made a few more changes, attached. I think we should change the column names to users and databases, to be clear they're lists and also to avoid the SQL reserved word "user". I removed the dependency on strlist_to_array, which is in objectaddress.c and isn't a very sensible dependency -- it does seem like it would be handy to have a list-based version of construct_array moved to arrayfuncs.c, but for now it's not much more work to handle these ourselves. I changed the options to accumulate one big array instead of an array of bunches of options. Previously you could only end up with a singleton array with a comma-delimited string, or a two-element array with one of those and map=.

The changes are fine; this eliminates the unnecessary dependency on objectaddress.

I think the error if pg_hba fails to reload needs to be LOG. It would be too unexpected for a user who isn't necessarily the one who issued the SIGHUP to spontaneously get a warning. I also removed the mask from local entries and made some of the NULLs that shouldn't be possible (unknown auth method or connection method) actually throw errors.

The changes are fine. All the tests pass.

Out of curiosity, regarding the result materialization code addition: the caller of the hba_settings function, ExecMakeTableFunctionResult, also stores the results in a tuplestore. Is there any advantage to doing it in the hba_settings function rather than in the caller?

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Sat, Feb 28, 2015 at 11:41 AM, Stephen Frost sfr...@snowman.net wrote: Pavel, * Pavel Stehule (pavel.steh...@gmail.com) wrote: 2015-02-27 22:26 GMT+01:00 Tom Lane t...@sss.pgh.pa.us: Stephen Frost sfr...@snowman.net writes:

Right, we also need a view (or function, or both) which provides what the *active* configuration of the running postmaster is. This is exactly what I was proposing (or what I was intending to, at least) with pg_hba_active, so, again, I think we're in agreement here.

I think that's going to be a lot harder than you realize, and it will have undesirable security implications, in that whatever you do to expose the postmaster's internal state to backends will also make it visible to other onlookers; not to mention probably adding new failure modes.

we can do copy of pg_hba.conf somewhere when postmaster starts or when it is reloaded.

Please see my reply to Tom. There's no trivial way to reach into the postmaster from a backend - but we do get a copy of whatever the postmaster had when we forked, and the postmaster only reloads pg_hba.conf on a SIGHUP, and that SIGHUP is passed down to the children, so we simply need to also reload pg_hba.conf in the children when they get a SIGHUP. That's how postgresql.conf is handled, which is what pg_settings is based on, and I believe that is the behavior folks are really looking for.

Loading pg_hba.conf during SIGHUP in the backends will solve the problem of displaying data that has not yet been loaded. This change may produce a warning if a backend fails to load pg_hba.conf.

Here I have attached the updated patch. I have not yet added the pg_hba_check function; I feel it is better as a separate patch, which I can share later.

Regards, Hari Babu Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V6.patch Description: Binary data
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Sat, Feb 7, 2015 at 8:26 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

Hi, I am sending a review of this patch.

Thanks for the review, and sorry for the delay.

4. Regress tests: test rules ... FAILED -- missing info about new view

Thanks. Corrected.

My objections: 1. The data type for the database field should be an array of name or an array of text. When a name contains a comma, the comma is currently not escaped: {omega,my stupid extremly, name2,my stupid name} expected: {my stupid name,omega,my stupid extremly, name2} I see the same issue in the options field.

Changed, including the user column also.

2. Reload is not enough for a content refresh - logout is necessary. I understand why it is, but it is not very friendly, and can be very stressful. It should work with the last reloaded data.

At present the view data is populated from the variable parsed_hba_lines, which contains the already-parsed lines of the pg_hba.conf file. During a reload, the hba entries are reloaded only in the postmaster process and not in every backend; for this reason the reloaded data is not visible until the session is disconnected and reconnected. Instead of changing the SIGHUP behavior in the backend to also load the hba entries, the pg_hba.conf file is now parsed whenever the view is called, so the view shows the latest reloaded data. This way it doesn't impact the existing behavior, but it may impact performance because the file is loaded into memory on every call.

The updated patch is attached. Comments?

Regards, Hari Babu Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V5.patch Description: Binary data
Re: [HACKERS] Question about durability and postgresql.
On Fri, Feb 20, 2015 at 5:09 PM, Alfred Perlstein bri...@mu.org wrote:
> Hello,
>
> We have a combination of 9.3 and 9.4 databases used for logging of data.
>
> We do not need a strong durability guarantee, meaning it is ok if on
> crash a minute or two of data is lost from our logs. (This is just stats
> for our internal tool.)
>
> I am looking at this page:
> http://www.postgresql.org/docs/9.4/static/non-durability.html
> And it's not clear which setting I should turn on.
>
> What we do NOT want is to lose the entire table or corrupt the database.
> We do want to gain speed though by not making DATA writes durable.
>
> Which setting is appropriate for this use case? At a glance it looks
> like a combination of:
>
> 1) Turn off synchronous_commit; and possibly:
> 2) Increase checkpoint_segments and checkpoint_timeout; this reduces the
> frequency of checkpoints, but increases the storage requirements of
> pg_xlog.

I feel changing the above two configuration parameters is enough for your requirement.

> 3) Turn off full_page_writes; there is no need to guard against partial
> page writes.

Turning this off may lead to a corrupted database if the system crashes during a page write, unless your file system guards against partial page writes.

Regards,
Hari Babu
Fujitsu Australia
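For reference, the settings discussed above can be sketched as a postgresql.conf fragment. The values are illustrative only, not tuned recommendations, and this assumes 9.3/9.4 (checkpoint_segments was later replaced by max_wal_size in 9.5):

```ini
# Reduced-durability settings for a 9.3/9.4 logging database (illustrative).
synchronous_commit = off       # a crash may lose the most recent commits,
                               # but never corrupts committed data
checkpoint_segments = 64       # fewer, larger checkpoints; needs more pg_xlog
checkpoint_timeout = 30min
#full_page_writes = off        # NOT advised unless the filesystem guarantees
                               # atomic page writes: risk of corruption
```

With synchronous_commit = off, a crash can lose only un-flushed commits (bounded by wal_writer_delay), which matches the "a minute or two of log data is acceptable" requirement without risking table loss or corruption.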
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Jan 28, 2015 at 9:47 AM, Jim Nasby jim.na...@bluetreble.com wrote:
> On 1/27/15 1:04 AM, Haribabu Kommi wrote:
>> Here I attached the latest version of the patch. I will add this patch
>> to the next commitfest.
>
> Apologies if this was covered, but why isn't the IP address an inet
> instead of text?

Corrected the address type to inet instead of text. Updated patch is attached.

> Also, what happens if someone reloads the config in the middle of
> running the SRF?

The hba entries are reloaded only in the postmaster process, not in every backend, so there shouldn't be any problem with a config file reload. Am I missing something?

Regards,
Hari Babu
Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V4.patch
Description: Binary data
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Mon, Jun 30, 2014 at 5:06 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote:
> I think having two columns would work. The columns could be called
> database and database_list, and user and user_list, respectively. The
> database column may contain one of all, sameuser, samegroup,
> replication; but if it's empty, database_list will contain an array of
> database names. Then (all, {}) and (, {all}) are easily separated.
> Likewise for user and user_list.

Thanks for the review. I corrected all the review comments except the one about adding two columns (database, database_list and user, user_list); I feel that may cause some confusion for users. Here I attached the latest version of the patch. I will add this patch to the next commitfest.

Regards,
Hari Babu
Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V3.patch
Description: Binary data
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On Tue, Nov 11, 2014 at 10:21 AM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-11-10 10:57:16 -0500, Robert Haas wrote:
>> Does parallelism help at all?
>
> I'm pretty damn sure. We can't even make a mildly powerful storage fully
> busy right now. Heck, I can't make my workstation's storage with a RAID
> 10 out of four spinning disks fully busy.
>
> I think some of that benefit also could be reaped by being better at
> hinting the OS...

Yes, it definitely helps, and not only for I/O-bound operations. It gives a good gain for queries with CPU-intensive WHERE conditions.

One more point we may need to consider: is there any overhead in passing data rows from the workers to the backend? I feel this may play a major role when the selectivity is high.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On Tue, Nov 11, 2014 at 2:35 PM, Amit Kapila amit.kapil...@gmail.com wrote:
> On Tue, Nov 11, 2014 at 5:30 AM, Haribabu Kommi
> kommi.harib...@gmail.com wrote:
>> On Tue, Nov 11, 2014 at 10:21 AM, Andres Freund
>> and...@2ndquadrant.com wrote:
>>> On 2014-11-10 10:57:16 -0500, Robert Haas wrote:
>>>> Does parallelism help at all?
>>> I'm pretty damn sure. We can't even make a mildly powerful storage
>>> fully busy right now. Heck, I can't make my workstation's storage with
>>> a RAID 10 out of four spinning disks fully busy. I think some of that
>>> benefit also could be reaped by being better at hinting the OS...
>> Yes, it definitely helps, and not only for I/O-bound operations. It
>> gives a good gain for queries with CPU-intensive WHERE conditions. One
>> more point we may need to consider: is there any overhead in passing
>> data rows from the workers to the backend?
>
> I am not sure that overhead will be very visible if we improve the use
> of the I/O subsystem by making parallel tasks work on it.

I feel there may be an overhead because the workers need to put the result data in shared memory and the backend has to read it from there to process it further. If the cost of transferring data from worker to backend is higher than fetching a tuple from the scan, the overhead becomes visible when the selectivity is high.

> However another idea here could be that instead of passing tuple data,
> we just pass the tuple id, but in that case we have to retain the pin on
> the buffer that contains the tuple until the master backend reads from
> it, which might have its own kind of problems.

Transferring tuple IDs doesn't cover the scenarios where the node needs any projection.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Index scan optimization
On Mon, Oct 27, 2014 at 10:38 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote:
> On 26 October 2014 10:42, Haribabu Kommi wrote:
>> I have a question regarding the setting of key flags as matched. Only
>> the first key was set as matched even if we have multiple index
>> conditions. Is there any reason behind that?
>
> Actually the keysCount can be more than one only if the key strategy is
> BTEqualStrategyNumber (i.e. the '=' condition). But our optimization is
> only for the '>' or '>=' conditions. So adding code to set the flag for
> all keys is of no use. Let me know if I am missing anything here?

Thanks. I understood the point.

>> If any volatile function is present in the index condition, the index
>> scan itself is not chosen. Is there any need to handle that case,
>> similar to NULLs?
>
> Do you mean to say that we should add special handling for volatile
> functions? If yes, then as you said, since an index scan itself is not
> selected for such functions, I feel we don't need to add anything for
> that.

I also have the same opinion. I marked the patch as ready for committer.

Regards,
Hari Babu
Fujitsu Australia
[HACKERS] Possible problem with shm_mq spin lock
Hi Hackers,

I am thinking of a possible problem with the shm_mq structure's spin lock, which is used for protecting the shm_mq structure. If the process receives a SIGQUIT signal while executing any code under the spin lock, it leads to a deadlock situation:

SIGQUIT -> proc_exit -> shm_mq_detach -> try to acquire the spin lock.

The spin lock is already held by the process. It is very difficult to reproduce the problem because the code under the lock is very minimal. Please let me know if I missed anything.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Possible problem with shm_mq spin lock
On Sun, Oct 26, 2014 at 10:17 AM, Andres Freund and...@2ndquadrant.com wrote:
> Hi,
>
> On 2014-10-26 08:52:42 +1100, Haribabu Kommi wrote:
>> I am thinking of a possible problem with the shm_mq structure's spin
>> lock, which is used for protecting the shm_mq structure. If the process
>> receives a SIGQUIT signal while executing any code under the spin lock,
>> it leads to a deadlock situation:
>> SIGQUIT -> proc_exit -> shm_mq_detach -> try to acquire the spin lock.
>> The spin lock is already held by the process. It is very difficult to
>> reproduce the problem because the code under the lock is very minimal.
>> Please let me know if I missed anything.
>
> I think you missed the following bit in postgres.c:
>
>     /*
>      * quickdie() occurs when signalled SIGQUIT by the postmaster.
>      *
>      * Some backend has bought the farm,
>      * so we need to stop what we're doing and exit.
>      */
>     void
>     quickdie(SIGNAL_ARGS)
>     {
>     ...
>         /*
>          * We DO NOT want to run proc_exit() callbacks -- we're here because
>          * shared memory may be corrupted, so we don't want to try to clean up our
>          * transaction. Just nail the windows shut and get out of town. Now that
>          * there's an atexit callback to prevent third-party code from breaking
>          * things by calling exit() directly, we have to reset the callbacks
>          * explicitly to make this work as intended.
>          */
>         on_exit_reset();

Thanks for the details. I am sorry, it is not proc_exit; it is the exit callback functions that can cause the problem. The following is the call stack where the problem can happen, if the signal handler is called after the spin lock is taken by the worker.
Breakpoint 1, 0x0072dd83 in shm_mq_detach ()
(gdb) bt
#0  0x0072dd83 in shm_mq_detach ()
#1  0x0072e7db in shm_mq_detach_callback ()
#2  0x00726d71 in dsm_detach ()
#3  0x00726c43 in dsm_backend_shutdown ()
#4  0x00727450 in shmem_exit ()
#5  0x007272fc in proc_exit_prepare ()
#6  0x00727501 in atexit_callback ()
#7  0x0030ff435da2 in exit () from /lib64/libc.so.6
#8  0x006ddaec in bgworker_quickdie ()
#9  signal handler called
#10 0x0072ce9a in shm_mq_sendv ()

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Possible problem with shm_mq spin lock
On Sun, Oct 26, 2014 at 12:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Haribabu Kommi kommi.harib...@gmail.com writes:
>> Thanks for the details. I am sorry, it is not proc_exit; it is the exit
>> callback functions that can cause the problem. The following is the
>> call stack where the problem can happen, if the signal handler is
>> called after the spin lock is taken by the worker.
>>
>> Breakpoint 1, 0x0072dd83 in shm_mq_detach ()
>> (gdb) bt
>> #0  0x0072dd83 in shm_mq_detach ()
>> #1  0x0072e7db in shm_mq_detach_callback ()
>> #2  0x00726d71 in dsm_detach ()
>> #3  0x00726c43 in dsm_backend_shutdown ()
>> #4  0x00727450 in shmem_exit ()
>> #5  0x007272fc in proc_exit_prepare ()
>> #6  0x00727501 in atexit_callback ()
>> #7  0x0030ff435da2 in exit () from /lib64/libc.so.6
>> #8  0x006ddaec in bgworker_quickdie ()
>
> Or in other words, Robert broke it. This control path should absolutely
> not occur: the entire point of the on_exit_reset call in quickdie() is
> to prevent any callbacks from being executed when we get to
> shmem_exit(). DSM-related functions DO NOT get an exemption.

The reset_on_dsm_detach function is called there to remove the DSM-related callbacks. It's my mistake, I am really sorry; the code I was using was wrong. Sorry for the noise.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Index scan optimization
On Tue, Sep 23, 2014 at 10:38 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote:
> On 22 September 2014 19:17, Heikki Linnakangas wrote:
>> On 09/22/2014 04:45 PM, Tom Lane wrote:
>>> Heikki Linnakangas hlinnakan...@vmware.com writes:
>>>> On 09/22/2014 07:47 AM, Rajeev rastogi wrote:
>>>>> So my proposal is to skip the condition check on the first scan key
>>>>> condition for every tuple.
>>>> The same happens in a single-column case. If you have a query like
>>>> SELECT * FROM tbl2 WHERE id2 > 'a', once you've found the start
>>>> position of the scan, you know that all the rows that follow match
>>>> too.
>>> ... unless you're doing a backwards scan.
>> Sure. And you still have to check for NULLs. Have to get the details
>> right..
>
> I have finished the implementation of the discussed optimization. I got
> a performance improvement of around 30% on the schema and data shared in
> the earlier mail. I also tested the index scan case, where our
> optimization does not apply, and observed that there is no effect on
> those queries because of this change.
>
> Change details:
> I have added a new flag SK_BT_MATCHED as part of sk_flags (ScanKey
> structure); the value used for it is 0x0004, which was unused. Inside
> the function _bt_first, once we finish finding the start scan position
> based on the first key, I append the flag SK_BT_MATCHED to the first
> key. Then in the function _bt_checkkeys, during the key comparison, I
> check whether the key has the SK_BT_MATCHED flag set; if yes, there is
> no need for further comparison. But if the tuple has a NULL value, then
> even if this flag is set, we continue with further comparison (this
> handles Heikki's point about checking NULLs).

Hi,

I reviewed the index scan optimization patch; the following are my observations:

- Patch applies cleanly.
- Compiles without warnings.
- All regression tests pass.

There is a good performance gain with the patch in almost all scenarios.

I have a question regarding the setting of key flags as matched.
Only the first key was set as matched, even if we have multiple index conditions. Is there any reason behind that?

If any volatile function is present in the index condition, the index scan itself is not chosen. Is there any need to handle that case, similar to NULLs?

Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Fri, Aug 29, 2014 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Haribabu Kommi kommi.harib...@gmail.com writes:
>> Thanks for your review. Please find the patch rebased against the
>> latest HEAD.
>
> Committed with minor (mostly cosmetic) alterations.

Thanks.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On Wed, Aug 27, 2014 at 8:27 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Alvaro Herrera wrote:
>> So my proposal is a bit more complicated. First we introduce the notion
>> of a single number, to enable sorting and computations: the delay
>> equivalent, which is the cost_limit divided by cost_delay.
>
> Here's a patch that implements this idea. As you see this is quite a bit
> more complicated than Haribabu's proposal.
>
> There are two holes in this:
>
> 1. If you ALTER DATABASE to change vacuum delay for a database, those
> values are not considered in the global equiv delay. I don't think this
> is very important and anyway we haven't considered this very much, so
> it's okay if we don't handle it.
>
> 2. If you have a fast worker that's only slightly faster than regular
> workers, it will become slower in some cases. This is explained in a
> FIXME comment in the patch.
>
> I don't really have any more time to invest in this, but I would like to
> see it in 9.4. Mark, would you test this? Haribabu, how open are you to
> fixing point (2) above?

Thanks Alvaro. I will check point (2).

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Parallel Sequence Scan doubts
On Sun, Aug 24, 2014 at 12:34 PM, Craig Ringer cr...@2ndquadrant.com wrote:
> On 08/24/2014 09:40 AM, Haribabu Kommi wrote:
>> Any suggestions?
>
> Another point I didn't raise first time around, but that's IMO quite
> significant, is that you haven't addressed why this approach to fully
> parallel seqscans is useful and solves real problems in effective ways.
>
> It might seem obvious - of course they're useful. But I see two things
> they'd address:
>
> - CPU-limited sequential scans, where expensive predicates are filtering
> the scan; and

Yes, we are mainly targeting CPU-limited sequential scans. That is why I want the workers to handle the predicates as well, not just read tuples from disk.

> - I/O-limited sequential scans, where the predicates already execute
> fast enough on one CPU, so most time is spent waiting for more disk I/O.
>
> The problem I see with your design is that it's not going to be useful
> for a large class of CPU-limited scans where the predicate isn't
> composed entirely of immutable functions and operators. Especially since
> immutable-only predicates are the best candidates for expression indexes
> anyway.
>
> While it'd likely be useful for I/O-limited scans, it's going to
> increase contention on shared_buffers locking and page management. More
> importantly, is it the most efficient way to solve the problem with
> I/O-limited scans? I would seriously suggest looking at first adding
> support for asynchronous I/O across ranges of extents during a
> sequential scan. You might not need multiple worker backends at all.
>
> I'm sure using async I/O to implement effective_io_concurrency for
> seqscans has been discussed and explored before, so again I think some
> time in the list archives might make sense.

Thanks for your inputs. I will check it.
> I don't know if it makes sense to do something as complex as parallel
> multi-process seqscans without having a path forward for supporting
> non-immutable functions - probably with fmgr API enhancements,
> additional function labels (PARALLEL), etc., depending on what you find
> is needed.

Thanks for your inputs. I will check for a solution to extend the support to non-immutable functions also.

>>> 3. In the executor Init phase, try to copy the necessary data required
>>> by the workers and start the workers.
>> Copy how? Back-ends can only communicate with each other over shared
>> memory, signals, and using sockets.

Sorry for not being clear: I meant copying those data structures into dynamic shared memory only. The workers can access them from there.

> That'll probably work with read-only data, but it's not viable for
> read/write data unless you use a big lock to protect it, in which case
> you lose the parallelism you want to achieve.

As of now I am thinking of sharing only read-only data with the workers. In case read/write data needs to be shared, we may need another approach to handle it.

> You'd have to classify what may be modified during scan execution
> carefully and determine if you need to feed any of the resulting
> modifications back to the original backend - and how to merge
> modifications by multiple workers, if it's even possible. That's going
> to involve a detailed structure-by-structure analysis and seems likely
> to be error-prone and buggy.

Thanks for your inputs. I will check it properly.

> I think you should probably talk to Robert Haas about what he's been
> doing over the last couple of years on parallel query.

Sure, I will check with him.

>>> 4. In the executor run phase, just get the tuples which are sent by
>>> the workers and process them further in the plan node execution.
>> Again, how do you propose to copy these back to the main bgworker?

With the help of message queues that are created in the dynamic shared memory, the workers can send the data to the queue.
On the other side, the main backend receives the tuples from the queue.

> OK, so you plan to implement shmem queues. That'd be a useful starting
> point, as it'd be something that would be useful in its own right.

shmem queues are already possible with dynamic shared memory; I just want to use them here.

> You'd have to be able to handle individual values that're larger than
> the ring buffer or whatever you're using for transfers, in case you're
> dealing with already-detoasted tuples or in-memory tuples. Again,
> chatting with Robert and others who've worked on dynamic shmem, parallel
> query, etc. would be wise here.

>> Yes, you are correct. For that reason I am thinking of supporting only
>> functions that depend solely on input variables and do not modify any
>> global data.
>
> You'll want to be careful with that. Nothing stops an immutable function
> referencing a cache in a C global that it initializes once and then
> treats as read-only, for example. I suspect you'll need a per-function
> whitelist. I'd love to be wrong.

Yes, we need per-function-level details. Once we have a better solution that also handles non-immutable functions, these may not be required.
Re: [HACKERS] Parallel Sequence Scan doubts
On Sun, Aug 24, 2014 at 1:11 AM, Craig Ringer cr...@2ndquadrant.com wrote:
> On 08/21/2014 02:47 PM, Haribabu Kommi wrote:
>> Implementation of Parallel Sequence Scan
>
> I think you mean Parallel Sequential Scan.

Sorry for not being clear; yes, it is parallel sequential scan.

>> 1. Parallel Sequence Scan can be achieved by using background workers
>> doing the job of the actual sequence scan, including the qualification
>> check also.
>
> Only if the qualifiers are stable/immutable, I think. Not even
> necessarily stable functions - consider use of the fmgr/executor state
> contexts to carry information over between calls, and what parallel
> execution would do to that.

Thanks for your input. As of now we are targeting only immutable functions, which can be executed in parallel without any side effects.

>> 3. In the executor Init phase, try to copy the necessary data required
>> by the workers and start the workers.
>
> Copy how? Back-ends can only communicate with each other over shared
> memory, signals, and using sockets.

Sorry for not being clear: I meant copying those data structures into dynamic shared memory only. The workers can access them from there.

>> 4. In the executor run phase, just get the tuples which are sent by the
>> workers and process them further in the plan node execution.
>
> Again, how do you propose to copy these back to the main bgworker?

With the help of message queues that are created in the dynamic shared memory, the workers can send the data to the queue. On the other side, the main backend receives the tuples from the queue.

>> 1. Data structures that are required to be copied from backend to
>> worker are currentTransactionState, Snapshot, GUC, ComboCID, Estate,
>> etc.
>
> That's a big etc. Huge, in fact. Any function can reference any global
> variable. Even an immutable function might use globals for cache etc. -
> and might, for example, set them up using an executor start hook. You
> cannot really make any assumptions about what functions access what
> memory.

Yes, you are correct.
For that reason I am thinking of supporting only functions that depend solely on input variables and do not modify any global data.

>> I see some problems in copying the Estate data structure into shared
>> memory because it contains so many pointers. There is a need for some
>> infrastructure to copy these data structures into shared memory.
>
> It's not just a matter of copying them into/via shmem. It's about their
> meaning. Does it even make sense to copy the executor state to another
> backend? If so, you can't copy it back, so what do you do at the end of
> the scans?

If we handle the locking of the relation in the backend and avoid doing the parallel sequential scan when any sub-query is involved, then there is no need for the full estate in the worker. In those cases, by sharing less information, I think we can execute the plan in the worker. Any suggestions?

> Before you try to design anything more on this, study the *large* amount
> of discussion that has happened on this topic on this mailing list over
> the last years. This is not a simple or easy task, and it's not one you
> should approach without studying what's already been worked on, built,
> contemplated, etc.

Thanks for your information.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Mon, Aug 18, 2014 at 8:12 PM, Asif Naeem anaeem...@gmail.com wrote:
> Thank you for sharing the updated patch. With the latest 9.5 source
> code, the patch build is failing with the following error message:
>
> /Applications/Xcode.app/Contents/Developer/usr/bin/make -C catalog schemapg.h
> cd ../../../src/include/catalog
> '/opt/local/bin/perl' ./duplicate_oids
> 3255
> make[3]: *** [postgres.bki] Error 1
> make[2]: *** [submake-schemapg] Error 2
> make[1]: *** [all-backend-recurse] Error 2
> make: *** [all-src-recurse] Error 2
>
> A new function is being added by the following commit:
>
> commit b34e37bfefbed1bf9396dde18f308d8b96fd176c
> Author: Robert Haas rh...@postgresql.org
> Date: Thu Aug 14 12:09:52 2014 -0400
>
>     Add sortsupport routines for text.
>
>     This provides a small but worthwhile speedup when sorting text, at
>     least in cases to which the sortsupport machinery applies.
>
>     Robert Haas and Peter Geoghegan
>
> It seems that you need to use another OID. Other than this, the patch
> looks good to me; please share the updated patch and feel free to assign
> it to a committer. Thanks.

Thanks for your review. Please find the patch rebased against the latest HEAD.

Regards,
Hari Babu
Fujitsu Australia

inet_agg_v8.patch
Description: Binary data
[HACKERS] Implementing parallel sequence doubts
Hi Hackers,

Implementation of Parallel Sequence Scan

Approach:

1. Parallel Sequence Scan can be achieved by using background workers doing the job of the actual sequence scan, including the qualification check also.
2. The planner generates the parallel scan plan by checking the possible criteria of the table to do a parallel scan and generates the tasks (ranges of blocks).
3. In the executor Init phase, try to copy the necessary data required by the workers and start the workers.
4. In the executor run phase, just get the tuples which are sent by the workers and process them further in the plan node execution.

Some of the problems I am thinking about:

1. Data structures that are required to be copied from backend to worker are currentTransactionState, Snapshot, GUC, ComboCID, Estate, etc.

I see some problems in copying the Estate data structure into shared memory because it contains so many pointers. There is a need for some infrastructure to copy these data structures into shared memory.

If the backend takes care of locking the relation and there are no sub-plans involved in the qual and targetlist, is the estate really required in the worker to achieve the parallel scan?

Is it possible to initialize the qual, targetlist and projinfo with a local Estate structure created in the worker, with the help of the plan structure received from the backend?

Or, in case the estate is required in the worker for processing, is there any better way to share backend data structures with the workers?

Any suggestions?

Regards,
Hari Babu
Fujitsu Australia
[HACKERS] Parallel Sequence Scan doubts
Corrected subject.

Hi Hackers,

Implementation of Parallel Sequence Scan

Approach:

1. Parallel Sequence Scan can be achieved by using background workers doing the job of the actual sequence scan, including the qualification check also.
2. The planner generates the parallel scan plan by checking the possible criteria of the table to do a parallel scan and generates the tasks (ranges of blocks).
3. In the executor Init phase, try to copy the necessary data required by the workers and start the workers.
4. In the executor run phase, just get the tuples which are sent by the workers and process them further in the plan node execution.

Some of the problems I am thinking about:

1. Data structures that are required to be copied from backend to worker are currentTransactionState, Snapshot, GUC, ComboCID, Estate, etc.

I see some problems in copying the Estate data structure into shared memory because it contains so many pointers. There is a need for some infrastructure to copy these data structures into shared memory.

If the backend takes care of locking the relation and there are no sub-plans involved in the qual and targetlist, is the estate really required in the worker to achieve the parallel scan?

Is it possible to initialize the qual, targetlist and projinfo with a local Estate structure created in the worker, with the help of the plan structure received from the backend?

Or, in case the estate is required in the worker for processing, is there any better way to share backend data structures with the workers?

Any suggestions?

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Mon, Aug 4, 2014 at 3:22 PM, Asif Naeem anaeem...@gmail.com wrote:
> Sorry for not being clear; the above-mentioned test is related to the
> following doc (sgml) changes that seem not to work as described:
>
> Table 9-35. cidr and inet Functions
>
> Function         | Return Type | Description               | Example                           | Result
> -----------------|-------------|---------------------------|-----------------------------------|------------
> max(inet, inet)  | inet        | larger of two inet types  | max('192.168.1.5', '192.168.1.4') | 192.168.1.5
> min(inet, inet)  | inet        | smaller of two inet types | min('192.168.1.5', '192.168.1.4') | 192.168.1.4
>
> Can you please elaborate on these sgml changes?

I thought of adding them for the newly added network functions, but mistakenly I kept the names as max and min. Anyway, with your suggestion in the earlier mail, these changes are not required; I removed them in the attached patch. Thanks for reviewing the patch.

Regards,
Hari Babu
Fujitsu Australia

inet_agg_v7.patch
Description: Binary data
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Thu, Jul 24, 2014 at 5:59 PM, Asif Naeem anaeem...@gmail.com wrote:
> Sorry for being late. Thank you for sharing the updated patch. The sgml
> changes seem not to be working:
>
> postgres=# select max('192.168.1.5', '192.168.1.4');
> ERROR:  function max(unknown, unknown) does not exist
> LINE 1: select max('192.168.1.5', '192.168.1.4');
>                ^
> HINT:  No function matches the given name and argument types. You might
> need to add explicit type casts.
> postgres=# select min('192.168.1.5', '192.168.1.4');
> ERROR:  function min(unknown, unknown) does not exist
> LINE 1: select min('192.168.1.5', '192.168.1.4');
>                ^
> HINT:  No function matches the given name and argument types. You might
> need to add explicit type casts.

I didn't get your point. I tested the patch, and the sgml changes are working fine.

> I would suggest the following or similar changes, e.g. in
> doc/src/sgml/func.sgml:
>
>     </indexterm>
>     <function>max(<replaceable class="parameter">expression</replaceable>)</function>
>    </entry>
> -  <entry>any array, numeric, string, or date/time type</entry>
> +  <entry>any inet, array, numeric, string, or date/time type</entry>
>    <entry>same as argument type</entry>
>    <entry>
>     maximum value of <replaceable
> @@ -12204,7 +12228,7 @@
>     </indexterm>
>     <function>min(<replaceable class="parameter">expression</replaceable>)</function>
>    </entry>
> -  <entry>any array, numeric, string, or date/time type</entry>
> +  <entry>any inet, array, numeric, string, or date/time type</entry>
>    <entry>same as argument type</entry>
>    <entry>
>     minimum value of <replaceable

Thanks for reviewing the patch. I missed the above change; corrected it in the attached patch.

Regards,
Hari Babu
Fujitsu Australia

inet_agg_v6.patch
Description: Binary data
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Mon, Jul 7, 2014 at 6:59 PM, Asif Naeem anaeem...@gmail.com wrote: Hi Haribabu, Thank you for sharing the patch. I have spent some time to review the changes. Overall patch looks good to me, make check and manual testing seems run fine with it. There seems no related doc/sgml changes ?. Patch added network_smaller() and network_greater() functions but in PG source code, general practice seems to be to use “smaller and “larger” as related function name postfix e.g. timestamp_smaller()/timestamp_larger(), interval_smaller/interval_larger(), cashsmaller()/cashlarger() etc. Thanks. Thanks for reviewing the patch. I corrected the function names as smaller and larger. and also added documentation changes. Updated patch attached in the mail. Regards, Hari Babu Fujitsu Australia *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 8647,8652 CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple --- 8647,8676 row entry indexterm + primarymax/primary + /indexterm + literalfunctionmax(typeinet/type, typeinet/type)/function/literal + /entry + entrytypeinet/type/entry + entrylarger of two inet types/entry + entryliteralmax('192.168.1.5', '192.168.1.4')/literal/entry + entryliteral192.168.1.5/literal/entry +/row +row + entry + indexterm + primarymin/primary + /indexterm + literalfunctionmin(typeinet/type, typeinet/type)/function/literal + /entry + entrytypeinet/type/entry + entrysmaller of two inet types/entry + entryliteralmin('192.168.1.5', '192.168.1.4')/literal/entry + entryliteral192.168.1.4/literal/entry +/row +row + entry + indexterm primarynetmask/primary /indexterm literalfunctionnetmask(typeinet/type)/function/literal *** a/src/backend/utils/adt/network.c --- b/src/backend/utils/adt/network.c *** *** 471,476 network_ne(PG_FUNCTION_ARGS) --- 471,499 PG_RETURN_BOOL(network_cmp_internal(a1, a2) != 0); } + Datum + network_smaller(PG_FUNCTION_ARGS) + { + inet *a1 = PG_GETARG_INET_PP(0); + inet *a2 = PG_GETARG_INET_PP(1); + + if 
(network_cmp_internal(a1, a2) < 0)
+ PG_RETURN_INET_P(a1);
+ else
+ PG_RETURN_INET_P(a2);
+ }
+
+ Datum
+ network_larger(PG_FUNCTION_ARGS)
+ {
+ inet *a1 = PG_GETARG_INET_PP(0);
+ inet *a2 = PG_GETARG_INET_PP(1);
+
+ if (network_cmp_internal(a1, a2) > 0)
+ PG_RETURN_INET_P(a1);
+ else
+ PG_RETURN_INET_P(a2);
+ }
/* * Support function for hash indexes on inet/cidr. */ *** a/src/include/catalog/pg_aggregate.h --- b/src/include/catalog/pg_aggregate.h *** *** 164,169 DATA(insert ( 2050 n 0 array_larger ----f f 1073 2277 0 0 0 _nu --- 164,170 DATA(insert ( 2244 n 0 bpchar_larger ----f f 1060 1042 0 0 0 _null_ _null_ )); DATA(insert ( 2797 n 0 tidlarger ----f f 2800 27 0 0 0 _null_ _null_ )); DATA(insert ( 3526 n 0 enum_larger ----f f 3519 3500 0 0 0 _null_ _null_ )); + DATA(insert ( 3255 n 0 network_larger ----f f 1205 869 0 0 0 _null_ _null_ )); /* min */ DATA(insert ( 2131 n 0 int8smaller ----f f 412 20 0 0 0 _null_ _null_ )); *** *** 186,191 DATA(insert ( 2051 n 0 array_smaller ----f f 1072 2277 0 0 0 _n --- 187,193 DATA(insert ( 2245 n 0 bpchar_smaller ----f f 1058 1042 0 0 0 _null_ _null_ )); DATA(insert ( 2798 n 0 tidsmaller ----f f 2799 27 0 0 0 _null_ _null_ )); DATA(insert ( 3527 n 0 enum_smaller ----f f 3518 3500 0 0 0 _null_ _null_ )); + DATA(insert ( 3256 n 0 network_smaller ----f f 1203 869 0 0 0 _null_ _null_ )); /* count */ DATA(insert ( 2147 n 0 int8inc_any -int8inc_any int8dec_any -f f 0 20 0 20 0 0 0 )); *** a/src/include/catalog/pg_proc.h --- b/src/include/catalog/pg_proc.h *** *** 2120,2125 DATA(insert OID = 922 ( network_le PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 1 --- 2120,2129 DATA(insert OID = 923 ( network_gt PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 869 869 _null_ _null_ _null_ _null_ network_gt _null_ _null_ _null_ )); DATA(insert OID = 924 ( network_ge PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 869 869 _null_ _null_ _null_ _null_ network_ge _null_ _null_ _null_ )); DATA(insert OID = 925 ( network_ne PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16
869 869 _null_ _null_ _null_ _null_ network_ne _null_ _null_ _null_ )); + DATA(insert OID = 3257 ( network_larger PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 869 869 869 _null_ _null_ _null_ _null_ network_larger _null_ _null_ _null_ )); + DESCR("larger of two network types"); + DATA(insert OID = 3258 (
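For reference, PostgreSQL's network_cmp_internal() orders inet values by their common network bits, then by masklen, then by the full address; for plain host addresses this coincides with ordinary IP address ordering. A quick Python sketch (illustrative only, not the C implementation) of what the patch's min/max aggregates are expected to return:

```python
import ipaddress

def network_smaller(a, b):
    # Mirrors the intent of the patch's network_smaller():
    # return the lower of two addresses under the inet ordering.
    return a if a < b else b

def network_larger(a, b):
    return a if a > b else b

addrs = [ipaddress.ip_address(s) for s in ("192.168.1.5", "192.168.1.4", "127.0.0.1")]
print(min(addrs), max(addrs))  # 127.0.0.1 192.168.1.5
```

This matches the behaviour shown in the earlier test session (min = 127.0.0.1, max = 192.168.1.2 there).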
Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN
On Thu, May 8, 2014 at 10:00 AM, Jaime Casanova ja...@2ndquadrant.com wrote:

Hi, This patch implements $subject only when ANALYZE and VERBOSE are on. I made it that way because for years nobody seemed interested in this info (at least no one did it), so I decided that maybe it is too much information for most people (actually btree indexes are normally very fast). But because we have GiST and GIN this became an interesting piece of data (there are other cases even when using btree). The current patch doesn't have docs yet; I will add them soon.

This patch provides the insertion time of an index operation. This information is useful to the administrator for reorganizing the indexes based on the insertion time.

Quick review: Applies to HEAD. Regression tests pass. Coding is fine.

Minor comments:
There is no need to print the index insertion time as zero in case of HOT update operations. Please correct the same.
Add the documentation changes.

Regards,
Hari Babu
Fujitsu Australia
[HACKERS] Re: [REVIEW] Re: Re: BUG #9578: Undocumented behaviour for temp tables created inside query language (SQL) functions
On Tue, Jun 17, 2014 at 12:30 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: (Cc: to pgsql-bugs dropped.) At 2014-03-17 18:24:55 +1100, kommi.harib...@gmail.com wrote: *** a/doc/src/sgml/xfunc.sgml --- b/doc/src/sgml/xfunc.sgml *** *** 153,159 SELECT clean_emp(); --- 153,186 (literal\/) (assuming escape string syntax) in the body of the function (see xref linkend=sql-syntax-strings). /para + +sect2 id=xfunc-sql-function-parsing-mechanism + titleParsing mechanism of a function/title +indexterm + primaryfunction/primary + secondaryparsing mechanism/secondary +/indexterm I suggest Catalog changes within functions instead of the above title. + para + The body of an SQL function is parsed as if it were a single - multi-part + statement and thus uses a constant snapshot of the system catalogs for + every sub-statement therein. Commands that alter the catalog will likely not + work as expected. + /para + + para + For example: Issuing CREATE TEMP TABLE within an SQL function will add + the table to the catalog but subsequent statements in the same function will + not see those additions and thus the temporary table will be invisible to them. + /para + + para + Thus it is generally advised that applicationPL/pgSQL/ be used, instead of + acronymSQL/acronym, when any catalog visibilities are required in the same function. + /para +/sect2 I don't think that much text is warranted. I suggest something like the following condensed wording: para The body of an SQL function is parsed as if it were a single multi-part statement, using a constant snapshot of the system catalogs. The effect of any commands that alter the catalogs (e.g. CREATE TEMP TABLE) will therefore not be visible to subsequent commands in the function body. /para para The recommended workaround is to use applicationPL/PgSQL/. /para Does that seem sensible to you? Looks good, Thanks for the review. Updated patch attached. 
Regards, Hari Babu Fujitsu Australia sql_functions_parsing_doc_v2.patch Description: Binary data
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Thu, Jun 5, 2014 at 9:12 AM, Andres Freund and...@2ndquadrant.com wrote:

Hi, On 2014-06-04 10:37:48 +1000, Haribabu Kommi wrote: Thanks for the test. Patch is re-based to the latest head. Did you look at the source of the conflict? Did you intentionally mark the functions as leakproof and review that that truly is the case? Or was that caused by copy-paste?

Yes, it was a copy-paste mistake. I didn't know much about that parameter. Thanks for the information. I changed it.

Regards,
Hari Babu
Fujitsu Australia

inet_agg_v4.patch Description: Binary data
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Wed, Jun 4, 2014 at 5:46 AM, Keith Fiske ke...@omniti.com wrote:

Andres's changes on June 3rd to https://github.com/postgres/postgres/commits/master/src/test/regress/expected/create_function_3.out are causing patch v2 to fail for that regression test file.

postgres $ patch -p1 -i ../inet_agg_v2.patch
patching file src/backend/utils/adt/network.c
patching file src/include/catalog/pg_aggregate.h
patching file src/include/catalog/pg_proc.h
patching file src/include/utils/builtins.h
patching file src/test/regress/expected/create_function_3.out
Hunk #1 FAILED at 153.
1 out of 1 hunk FAILED -- saving rejects to file src/test/regress/expected/create_function_3.out.rej
patching file src/test/regress/expected/inet.out
patching file src/test/regress/sql/inet.sql

Otherwise it applies without any issues to the latest HEAD. I built and started a new instance, and I was able to test at least the basic min/max functionality:

keith=# create table test_inet (id serial, ipaddress inet);
CREATE TABLE
Time: 25.546 ms
keith=# insert into test_inet (ipaddress) values ('192.168.1.1');
INSERT 0 1
Time: 3.143 ms
keith=# insert into test_inet (ipaddress) values ('192.168.1.2');
INSERT 0 1
Time: 2.932 ms
keith=# insert into test_inet (ipaddress) values ('127.0.0.1');
INSERT 0 1
Time: 1.786 ms
keith=# select min(ipaddress) from test_inet;
min
---
127.0.0.1
(1 row)
Time: 3.371 ms
keith=# select max(ipaddress) from test_inet;
max
-
192.168.1.2
(1 row)
Time: 1.104 ms

Thanks for the test. Patch is re-based to the latest head.

Regards,
Hari Babu
Fujitsu Australia

inet_agg_v3.patch Description: Binary data
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Thu, May 29, 2014 at 3:28 AM, Keith Fiske ke...@omniti.com wrote: On Sun, Mar 23, 2014 at 10:42 PM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Fri, Mar 21, 2014 at 5:17 PM, dar...@dons.net.au wrote: The following bug has been logged on the website: reclog= select * from foo; bar - 1.2.3.4 (1 row) reclog= select min(bar) from foo; ERROR: function min(inet) does not exist LINE 1: select min(bar) from foo; ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. This is surprising to me since the inet type is ordered, hence min/max are possible. In the code, some comparison logic for the inet datatypes already present. With those I thought it is easy to support the min and max aggregates also. So I modified the code to support the aggregate functions of min and max by using the already existing inet comparison logic. Is there anything I am missing? postgres=# create table tbl(f inet); CREATE TABLE postgres=# insert into tbl values('1.2.3.5'); INSERT 0 1 postgres=# insert into tbl values('1.2.3.4'); INSERT 0 1 postgres=# select min(f) from tbl; min - 1.2.3.4 (1 row) postgres=# select max(f) from tbl; max - 1.2.3.5 (1 row) Patch is attached. This is my first time reviewing a patch, so apologies if I've missed something in the process. I tried applying your patch to the current git HEAD as of 2014-05-27 and the portion against pg_aggregate.h fails postgres $ patch -Np1 -i ../inet_agg.patch patching file src/backend/utils/adt/network.c Hunk #1 succeeded at 471 (offset -49 lines). patching file src/include/catalog/pg_aggregate.h Hunk #1 FAILED at 140. Hunk #2 FAILED at 162. 2 out of 2 hunks FAILED -- saving rejects to file src/include/catalog/pg_aggregate.h.rej patching file src/include/catalog/pg_proc.h Hunk #1 succeeded at 2120 (offset 8 lines). Hunk #2 succeeded at 3163 (offset 47 lines). Hunk #3 succeeded at 3204 (offset 47 lines). 
patching file src/include/utils/builtins.h Hunk #1 succeeded at 907 (offset 10 lines). Looks like starting around April 12th some additional changes were made to the DATA lines in this file that have to be accounted for. https://github.com/postgres/postgres/commits/master/src/include/catalog/pg_aggregate.h Don't have the knowledge of how to fix this for the new format, but thought I'd at least try to apply the patch and see if it works. Thanks for verifying the patch. Please find the re-based patch attached in the mail. Regards, Hari Babu Fujitsu Australia inet_agg_v2.patch Description: Binary data
Re: [HACKERS] Compression of full-page-writes
On Tue, May 13, 2014 at 3:33 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, May 11, 2014 at 7:30 PM, Simon Riggs si...@2ndquadrant.com wrote: On 30 August 2013 04:55, Fujii Masao masao.fu...@gmail.com wrote: My idea is very simple, just compress FPW because FPW is a big part of WAL. I used pglz_compress() as a compression method, but you might think that other method is better. We can add something like FPW-compression-hook for that later. The patch adds new GUC parameter, but I'm thinking to merge it to full_page_writes parameter to avoid increasing the number of GUC. That is, I'm thinking to change full_page_writes so that it can accept new value 'compress'. * Result [tps] 1386.8 (compress_backup_block = off) 1627.7 (compress_backup_block = on) [the amount of WAL generated during running pgbench] 4302 MB (compress_backup_block = off) 1521 MB (compress_backup_block = on) Compressing FPWs definitely makes sense for bulk actions. I'm worried that the loss of performance occurs by greatly elongating transaction response times immediately after a checkpoint, which were already a problem. I'd be interested to look at the response time curves there. Yep, I agree that we should check how the compression of FPW affects the response time, especially just after checkpoint starts. I was thinking about this and about our previous thoughts about double buffering. FPWs are made in foreground, so will always slow down transaction rates. If we could move to double buffering we could avoid FPWs altogether. Thoughts? If I understand the double buffering correctly, it would eliminate the need for FPW. But I'm not sure how easy we can implement the double buffering. There is already a patch on the double buffer write to eliminate the FPW. But It has some performance problem because of CRC calculation for the entire page. 
http://www.postgresql.org/message-id/1962493974.656458.1327703514780.javamail.r...@zimbra-prod-mbox-4.vmware.com I think this patch can be further modified with the latest multi-core CRC calculation and can then be used for testing.

Regards,
Hari Babu
Fujitsu Australia
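As a rough illustration of the idea under discussion (compress each full-page image before writing it to WAL), here is a Python sketch. It uses zlib as a stand-in for pglz, and my understanding of the fallback behaviour (store the raw page when compression does not actually shrink it) is an assumption, not a reading of the patch:

```python
import zlib

BLCKSZ = 8192  # PostgreSQL block size

def backup_block_payload(page: bytes):
    """Compress a full-page image; keep the raw page if compression
    does not shrink it. zlib stands in for pglz here."""
    assert len(page) == BLCKSZ
    compressed = zlib.compress(page)
    if len(compressed) < len(page):
        return True, compressed
    return False, page

# A heap page full of similar tuples compresses well.
page = (b"tuple header | column data | padding " * 256)[:BLCKSZ]
is_compressed, payload = backup_block_payload(page)
print(is_compressed, len(payload), "of", BLCKSZ)
```

The benchmark figures quoted above (4302 MB down to 1521 MB of WAL) reflect exactly this kind of reduction, traded against the CPU cost of compressing in the foreground.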
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On Mon, May 5, 2014 at 1:09 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Feb 17, 2014 at 7:38 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: I modified the autovac_balance_cost function to balance the costs using the number of running workers, instead of default vacuum cost parameters. Lets assume there are 4 workers running currently with default cost values of limit 200 and delay 20ms. The cost will be distributed as 50 and 10ms each. Suppose if one worker is having a different cost limit value as 1000, which is 5 times more than default value. The cost will be distributed as 50 and 10ms each for other 3 workers and 250 and 10ms for the worker having cost limit value other than default. By this way also it still ensures the cost limit value is 5 times more than other workers. Won't this change break the basic idea of autovacuum_vacuum_cost_limit which is as follows: Note that the value is distributed proportionally among the running autovacuum workers, if there is more than one, so that the sum of the limits of each worker never exceeds the limit on this variable.. It is not breaking the behavior. This setting can be overridden for individual tables by changing storage parameters. Still the cost values for the default tables are under the guc limit. Basically with proposed change, the sum of limits of each worker will be more than autovacuum_vacuum_cost_limit and I think main reason for same is that the new calculation doesn't consider autovacuum_vacuum_cost_limit or other similar parameters. If user doesn't provide any table specific value then autovacuum_vacuum_cost_limit guc value is set to the table. So the same is used in the calculation. I think current calculation gives appropriate consideration for table level vacuum settings when autovacuum_vacuum_cost_limit is configured with more care (i.e it is more than table level settings). 
As an example consider the below case:

autovacuum_vacuum_cost_limit = 10000
t1 (autovacuum_vacuum_cost_limit = 1000), t2 (default), t3 (default), t4 (default)

Consider other settings as default. Now cost_limit after autovac_balance_cost is as follows:

Worker-1 for t1 = 322
Worker-2 for t2 = 3225
Worker-3 for t3 = 3225
Worker-4 for t4 = 3225

So in this way proper consideration has been given to table-level vacuum settings and the guc configured for autovacuum_vacuum_cost_limit with the current code.

It works for the case where the table-specific values are less than the default cost limit; the same logic doesn't work with higher values. Usually the table-specific values are set higher than the default for the tables where faster vacuuming is expected.

Now it might be the case that we want to improve the current calculation for cases where it doesn't work well, but I think it has to be better than the current behaviour, and it is better to consider both gucs and table-level settings with some better formula.

With the proposed change, it works fine whether the table-specific value is higher or lower than the default value. It works on the ratio of the table-specific value to the default value.

default autovacuum_vacuum_cost_limit = 10000
t1 - 1000, t2 - default, t3 - default, t4 - default -- balance costs: t1 - 250, t2 - 2500, t3 - 2500, t4 - 2500.
t1 - 20000, t2 - default, t3 - default, t4 - default -- balance costs: t1 - 5000, t2 - 2500, t3 - 2500, t4 - 2500.

Regards,
Hari Babu
Fujitsu Australia
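The proposed balancing rule can be sketched as follows. This is my reading of the proposal, not the actual autovac_balance_cost() code: split the default limit evenly across the running workers, then scale each worker's share by the ratio of its table-specific cost limit to the default.

```python
def balance_costs(default_limit, table_limits):
    """Distribute the vacuum cost budget across running workers.

    table_limits holds one entry per worker: None for a table with no
    per-table setting, else its autovacuum_vacuum_cost_limit. A
    table-specific limit keeps its ratio to the default after balancing.
    """
    base = default_limit // len(table_limits)
    return [base * (t if t is not None else default_limit) // default_limit
            for t in table_limits]

# Four workers, default limit 10000, t1 overridden to 20000:
print(balance_costs(10000, [20000, None, None, None]))  # [5000, 2500, 2500, 2500]
# t1 overridden to 1000 instead:
print(balance_costs(10000, [1000, None, None, None]))   # [250, 2500, 2500, 2500]
```

Note the trade-off discussed above: with an override higher than the default, the per-worker limits now sum to more than autovacuum_vacuum_cost_limit, which the current code never allows.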
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
On Fri, Apr 11, 2014 at 7:44 AM, Bruce Momjian br...@momjian.us wrote: Can someone with Windows expertise comment on whether this should be applied? I tested the same in windows and it is working as specified. The same background running server can be closed with ctrl+break command. --- On Tue, Jan 7, 2014 at 12:44:33PM +0100, Christian Ullrich wrote: Hello all, when pg_ctl start is used to run PostgreSQL in a console window on Windows, it runs in the background (it is terminated by closing the window, but that is probably inevitable). There is one problem, however: The first Ctrl-C in that window, no matter in which situation, will cause the background postmaster to exit. If you, say, ping something, and press Ctrl-C to stop ping, you probably don't want the database to go away, too. The reason is that Windows delivers the Ctrl-C event to all processes using that console, not just to the foreground one. Here's a patch to fix that. pg_ctl stop still works, and it has no effect when running as a service, so it should be safe. It starts the postmaster in a new process group (similar to calling setpgrp() after fork()) that does not receive Ctrl-C events from the console window. -- Christian diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c new file mode 100644 index 50d4586..89a9544 *** a/src/bin/pg_ctl/pg_ctl.c --- b/src/bin/pg_ctl/pg_ctl.c *** CreateRestrictedProcess(char *cmd, PROCE *** 1561,1566 --- 1561,1567 HANDLE restrictedToken; SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY}; SID_AND_ATTRIBUTES dropSids[2]; + DWORD flags; /* Functions loaded dynamically */ __CreateRestrictedToken _CreateRestrictedToken = NULL; *** CreateRestrictedProcess(char *cmd, PROCE *** 1636,1642 AddUserToTokenDacl(restrictedToken); #endif ! 
r = CreateProcessAsUser(restrictedToken, NULL, cmd, NULL, NULL, TRUE, CREATE_SUSPENDED, NULL, NULL, &si, &processInfo); Kernel32Handle = LoadLibrary("KERNEL32.DLL"); if (Kernel32Handle != NULL) --- 1637,1650 AddUserToTokenDacl(restrictedToken); #endif ! flags = CREATE_SUSPENDED; ! ! /* Protect console process from Ctrl-C */ ! if (!as_service) { ! flags |= CREATE_NEW_PROCESS_GROUP; ! } ! ! r = CreateProcessAsUser(restrictedToken, NULL, cmd, NULL, NULL, TRUE, flags, NULL, NULL, &si, &processInfo); Kernel32Handle = LoadLibrary("KERNEL32.DLL"); if (Kernel32Handle != NULL)

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
On Fri, Apr 11, 2014 at 12:12 PM, Bruce Momjian br...@momjian.us wrote: On Fri, Apr 11, 2014 at 11:58:58AM +1000, Haribabu Kommi wrote: On Fri, Apr 11, 2014 at 7:44 AM, Bruce Momjian br...@momjian.us wrote: Can someone with Windows expertise comment on whether this should be applied? I tested the same in Windows and it is working as specified. The same background running server can be closed with the Ctrl+Break command. OK. If I apply this, are you able to test to see if the problem is fixed?

I already tested on HEAD by applying the patch attached in the earlier mail. With the Ctrl+C command the background process is not closed, but with the Ctrl+Break command it can be closed. If this behavior is fine, then we can apply the patch.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Mon, Mar 17, 2014 at 11:45 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

Hello, the attached patches are revised ones according to the latest custom-plan interface patch (v11). The cache-scan module was re-implemented on the newer interface, and I also noticed the extension did not handle redirected tuples correctly, so I revised the logic in ccache_vacuum_page() totally. It now synchronizes the cached tuples per page, not per tuple, and also tries to merge t-tree chunks on a per-page basis. Also, I split the patches again because the *demonstration* part is much larger than the patches to the core backend. It will help reviewing.

* pgsql-v9.4-vacuum_page_hook.v11.patch - adds a hook for each page being vacuumed; that is needed to synchronize the status of the in-memory cache managed by the extension.
* pgsql-v9.4-mvcc_allows_cache.v11.patch - allows running HeapTupleSatisfiesVisibility() on the tuples in the in-memory cache, not on the heap.
* pgsql-v9.4-example-cache_scan.v11.patch - demonstrates the usage of the above two patches. It allows scanning a relation without storage access if possible.

All the patches are good. The cache scan extension patch may need further refinement in terms of performance improvement, but the same can be handled later also. So I am marking the patch as ready for committer. Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Wed, Mar 12, 2014 at 5:26 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

Thanks for your efforts!

               Head      Patched   Diff
Select - 500K  772ms     2659ms    -200%
Insert - 400K  3429ms    1948ms     43%  (I am not sure how it improved in this case)
Delete - 200K  2066ms    3978ms    -92%
Update - 200K  3915ms    5899ms    -50%

This patch has shown how the custom scan can be used very well, but the patch as it is has some performance problems which need to be investigated. I attached the test script file used for the performance test.

First of all, it seems to me your test case has too small a data set, which allows holding all the data in memory - briefly, 500K records of 200 bytes will consume about 100MB. Your configuration allocates 512MB of shared_buffers, and about 3GB of OS-level page cache is available. (Note that Linux uses free memory as disk cache adaptively.)

Thanks for the information, and a small correction: the total number of records is 5 million; the select operation is selecting 500K records. The total table size is around 1GB. Once I get your new patch re-based on the custom scan patch, I will test the performance again by increasing my database size beyond the RAM size. I will also make sure that the memory available for disk cache is less.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty
On Tue, Mar 11, 2014 at 2:59 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Mar 10, 2014 at 1:13 PM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Mon, Mar 10, 2014 at 4:24 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing ji...@fast.au.fujitsu.com wrote: Enclosed is the patch to implement the requirement that issue log message to suggest VACUUM FULL if a table is nearly empty. The requirement comes from the Postgresql TODO list. I think it would be better if we can use some existing stats to issue warning message rather than traversing the FSM for all pages. For example after vacuuming page in lazy_scan_heap(), we update the freespace for page. You can refer below line in lazy_scan_heap(). freespace = PageGetHeapFreeSpace(page); Now it might be possible that we might not get freespace info easily as it is not accumulated for previous vacuum's. Incase there is no viable way to get it through vacuum stats, we are already updating fsm after vacuum by FreeSpaceMapVacuum(), where I think it should be possible to get freespace. yes this way it works without extra penalty. But the problem is how to calculate the free space which is left in the skipped pages because of visibility bit. One way could be by extrapolating (vac_estimate_reltuples) like we do for some other stats, but not sure if we can get the correct estimates. The main reason is that if you observe that code path, all the decisions are mainly done on the basis of vacrelstats. I have not checked in detail if by using any other stats, this purpose can be achieved, may be once you can look into it. I checked the vac_estimate_reltuples() function, but not able to find a proper way to identify the free space. By the way have you checked if FreeSpaceMapVacuum() can serve your purpose, because this call already traverses FSM in depth-first order to update the freespace. 
So maybe by using this call, or a wrapper on it such that it returns the total freespace as well apart from updating the freespace, can serve the need. Thanks for the information. We can get the table free space by writing a wrapper or by slightly modifying the FreeSpaceMapVacuum() function. This way it will not add any extra overhead in identifying whether the table is almost empty or not.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Thu, Mar 6, 2014 at 10:15 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2014-03-06 18:17 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com: I will update you later regarding the performance test results.

I ran the performance test on the cache scan patch and below are the readings.

Configuration:
Shared_buffers - 512MB
cache_scan.num_blocks - 600
checkpoint_segments - 255

Machine:
OS - centos - 6.4
CPU - 4 core 2.5 GHZ
Memory - 4GB

               Head      Patched   Diff
Select - 500K  772ms     2659ms    -200%
Insert - 400K  3429ms    1948ms     43%  (I am not sure how it improved in this case)
Delete - 200K  2066ms    3978ms    -92%
Update - 200K  3915ms    5899ms    -50%

This patch has shown how the custom scan can be used very well, but the patch as it is has some performance problems which need to be investigated. I attached the test script file used for the performance test.

Regards,
Hari Babu
Fujitsu Australia

shared_buffers = 512MB
shared_preload_libraries = 'cache_scan'
cache_scan.num_blocks = 600
checkpoint_segments = 255
\timing

--cache scan select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
checkpoint;
CREATE TRIGGER test_cache_row_sync AFTER INSERT OR UPDATE OR DELETE ON test FOR ROW EXECUTE PROCEDURE cache_scan_synchronizer();
CREATE TRIGGER test_cache_stmt_sync AFTER TRUNCATE ON test FOR STATEMENT EXECUTE PROCEDURE cache_scan_synchronizer();
vacuum analyze test;
explain select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
drop table test cascade;

--sequence scan select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values
(generate_series(1,500), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
checkpoint;
vacuum analyze test;
set cache_scan.enabled = off;
explain select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
drop table test cascade;

--cache scan select 500K
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
checkpoint;
CREATE TRIGGER test_cache_row_sync AFTER INSERT OR UPDATE OR DELETE ON test FOR ROW EXECUTE PROCEDURE cache_scan_synchronizer();
CREATE TRIGGER test_cache_stmt_sync AFTER TRUNCATE ON test FOR STATEMENT EXECUTE PROCEDURE cache_scan_synchronizer();
vacuum analyze test;
explain select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
drop table test cascade;

--sequence scan select 500K
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
checkpoint;
vacuum analyze test;
set cache_scan.enabled = off;
explain select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
drop table test cascade;
Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty
On Mon, Mar 10, 2014 at 4:24 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing ji...@fast.au.fujitsu.com wrote: Enclosed is the patch to implement the requirement to issue a log message suggesting VACUUM FULL if a table is nearly empty. The requirement comes from the PostgreSQL TODO list.

[Solution details] A check function is added in the function 'lazy_vacuum_rel' to check whether the table is large enough and contains a large number of unused rows. If it is, a log message is issued suggesting the use of 'VACUUM FULL' on the table. The judgement policy is as follows: If relpages of the table > RELPAGES_VALUES_THRESHOLD (default 1000), the table is considered to be large enough. If free_space/total_space > FREESPACE_PERCENTAGE_THRESHOLD (default 0.5), the table is considered to have a large number of unused rows. The free_space is calculated by reading the details from the FSM pages. This may increase the I/O, but very few FSM pages are expected, so it shouldn't cause much overhead.

I think it would be better if we can use some existing stats to issue the warning message rather than traversing the FSM for all pages. For example, after vacuuming a page in lazy_scan_heap(), we update the freespace for the page. You can refer to the below line in lazy_scan_heap(): freespace = PageGetHeapFreeSpace(page); Now it might be possible that we cannot get the freespace info easily, as it is not accumulated for previous vacuums. In case there is no viable way to get it through vacuum stats, we are already updating the fsm after vacuum via FreeSpaceMapVacuum(), where I think it should be possible to get the freespace.

Yes, this way it works without extra penalty. But the problem is how to calculate the free space left in the pages that were skipped because of the visibility bit. In a normal scenario, the pages which get skipped during the vacuum process are few in number, so this approach is a good choice.
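The proposed check could be sketched as below. The threshold names come from the patch description in the mail; the function name and signature are my own assumption, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Thresholds as described in the mail. */
#define RELPAGES_VALUES_THRESHOLD       1000
#define FREESPACE_PERCENTAGE_THRESHOLD  0.5

/*
 * Decide whether to log a suggestion to run VACUUM FULL: the table
 * must be large enough (relpages) and be mostly free space, where
 * free_space is accumulated from the FSM (or vacuum stats).
 */
bool
suggest_vacuum_full(uint32_t relpages, double free_space, double total_space)
{
    if (relpages <= RELPAGES_VALUES_THRESHOLD)
        return false;               /* table too small to bother */
    if (total_space <= 0)
        return false;               /* avoid division by zero */
    return (free_space / total_space) > FREESPACE_PERCENTAGE_THRESHOLD;
}
```

The two thresholds are independent: a huge but well-packed table, or a tiny but empty one, both stay quiet.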
Regards, Hari Babu Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Tue, Mar 4, 2014 at 3:07 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, ctid);
   + if (pchunk != NULL && pchunk != cchunk)
   + ccache_merge_chunk(ccache, pchunk);
   + pchunk = cchunk;

The merge_chunk is called only when the heap tuples are spread across two cache chunks. Actually one cache chunk can accommodate one or more heap pages. It needs some other way of handling.

I adjusted the logic to merge the chunks as follows: Once a tuple is vacuumed from a chunk, it also checks whether it can be merged with its child leafs. A chunk has up to two child leafs; the left one has a lesser ctid than the parent, and the right one has a greater ctid. It means a chunk without a right child in the left sub-tree, or a chunk without a left child in the right sub-tree, is a neighbor of the chunk being vacuumed. In addition, if the vacuumed chunk does not have either (or both) of its children, it can be merged with the parent node. I modified ccache_vacuum_tuple() to merge chunks during the t-tree walk-down, if the vacuumed chunk has enough free space.

The patch looks good. Regarding merging of the nodes, instead of checking whether a merge is possible for every tuple that is vacuumed, can we use some kind of threshold: whenever a node becomes 50% free, try to merge it from the leaf nodes until it is 90% full? The remaining 10% will be left for the next inserts on the node. What do you say?

I will update you later regarding the performance test results.

Regards, Hari Babu Fujitsu Australia
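One possible reading of the threshold idea above, as a sketch (all names hypothetical; the entry size is fixed at 16 bytes purely for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 1024

typedef struct Chunk
{
    int used;               /* bytes currently used in this chunk */
    struct Chunk *left;
    struct Chunk *right;
} Chunk;

/* A merge is attempted only once a chunk is at least 50% free... */
bool
chunk_wants_merge(const Chunk *c)
{
    return c->used * 2 <= CHUNK_SIZE;
}

/* ...and child entries are pulled up only while we stay under 90% full,
 * leaving the last 10% for subsequent inserts on the node. */
int
merge_from_child(Chunk *parent, Chunk *child)
{
    int moved = 0;
    int limit = (CHUNK_SIZE * 9) / 10;

    while (child->used > 0 && parent->used < limit)
    {
        parent->used += 16;     /* move one fixed-size entry up */
        child->used -= 16;
        moved++;
    }
    return moved;
}
```

The point of the threshold is to amortize the merge check: it runs once per half-empty chunk rather than once per vacuumed tuple.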
Re: [HACKERS] The behavior of CheckRequiredParameterValues()
On Wed, Mar 5, 2014 at 4:09 AM, Sawada Masahiko sawada.m...@gmail.com wrote: Hi all, I had doubts regarding the behavior of the CheckRequiredParameterValues() function. I could not start a standby server which was created by pg_basebackup with the following scenario.

1. Start the master server with 'wal_level = archive', 'hot_standby = on' and the other replication settings.
2. Create the standby server from the master server by using pg_basebackup.
3. Change the wal_level value of both master and standby server to 'hot_standby'.
4. Restart the master server.
5. Start the standby server.

In #5, I got the following error even though I set wal_level to 'hot_standby'. FATAL: hot standby is not possible because wal_level was not set to hot_standby or higher on the master server

I tried to investigate this behaviour. Currently the CheckRequiredParameterValues() function uses the wal_level value taken from ControlFile when comparing wal_level against WAL_LEVEL_HOT_STANDBY, as in the following code:

xlog.c:6177
if (ControlFile->wal_level < WAL_LEVEL_HOT_STANDBY)
    ereport(ERROR,
            (errmsg("hot standby is not possible because wal_level was not ...

So we have to start and stop the standby server with the changed wal_level (i.e., hot_standby) if we want to enable hot standby. In this case, I think the standby server doesn't need to check the wal_level value from ControlFile. I think it should check the value which is written in postgresql.conf.

The snapshot of running transaction information is written to WAL only when wal_level is set to 'hot_standby'. This information is required on the standby side to recreate the running transactions.

Regards, Hari Babu Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Tue, Feb 25, 2014 at 11:13 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: Thanks for the information, I will apply the other patches also and start testing.

When trying to run the pgbench test, by default the cache-scan plan is not chosen because of its higher cost. So I increased cpu_index_tuple_cost to a maximum value, or turned off index scans, so that the planner chooses the cache_scan as the least-cost plan. The configuration parameters changed during the test are:

shared_buffers - 2GB, cache_scan.num_blocks - 1024
wal_buffers - 16MB, checkpoint_segments - 255
checkpoint_timeout - 15 min, cpu_index_tuple_cost - 10 or enable_indexscan = off

Test procedure:
1. Initialize the database with pgbench with a 75 scale factor.
2. Create the triggers on pgbench_accounts.
3. Use a select query to load all the data into cache.
4. Run a simple update pgbench test.

Plan details of the pgbench simple update queries:

postgres=# explain update pgbench_accounts set abalance = abalance where aid = 10;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------------
 Update on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
         Index Cond: (aid = 10)
 Planning time: 0.045 ms
(4 rows)

postgres=# explain select abalance from pgbench_accounts where aid = 10;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------------
 Custom Scan (cache scan) on pgbench_accounts  (cost=0.00..99899.99 rows=1 width=4)
   Filter: (aid = 10)
 Planning time: 0.042 ms
(3 rows)

I am observing too much delay in the performance results. The performance test script is attached in the mail. Please let me know if you find any problem in the test.

Regards, Hari Babu Fujitsu Australia

run_reading.sh Description: Bourne shell script
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Tue, Feb 25, 2014 at 10:44 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Thanks for your testing.

I am getting some compilation warnings while compiling the extension, and I am also not able to load the extension because of the undefined symbol get_restriction_qual_cost.

It seems to me you applied only the part-1 portion of the custom-scan patches. The get_restriction_qual_cost() is redefined as an extern function, from a static one, in the part-2 portion (ctidscan) to estimate the cost value of the qualifiers in an extension. Also, bms_to_string() and bms_from_string() are added in the part-3 portion (postgres_fdw) to carry a bitmap-set according to the copyObject manner. It may make sense to include the above supplemental changes in the cache-scan feature also, until either of them gets upstreamed.

Thanks for the information, I will apply the other patches also and start testing.

Regards, Hari Babu Fujitsu Australia
[HACKERS] Bit data type header reduction in some cases
Hi, This is regarding the Todo item of Bit data type header reduction in some cases. The header contains two parts: 1) The varlena header, which is automatically converted from 4 bytes to a 1-byte header in case of small data. 2) The bit length header, called bit_len, which stores the actual bit length and is 4 bytes in size.

I want to reduce this bit_len size to 1 byte in some cases, similar to the varlena header. With this change the size of the column is reduced by 3 bytes, which shows a very good decrease in disk usage.

I am planning to change it based on the total length of the bits data. If the number of bits is less than 0x7F, then a one-byte header can be chosen instead of a 4-byte header. During reading of the data, it can be decoded in a similar way.

The problem I am thinking of is that it can cause problems for old databases having bit columns when they upgrade to a newer version without using pg_dumpall. Is there any way to handle this problem? Or is there any better way to handle the problem itself? Please let me know your suggestions.

Regards, Hari Babu Fujitsu Australia
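A sketch of the proposed encoding (an assumed format for illustration, not the actual varbit on-disk layout): the high bit of the first byte flags a 1-byte header when the bit count fits in 7 bits; otherwise a 4-byte big-endian header is used, which limits the long form to lengths below 2^31.

```c
#include <assert.h>
#include <stdint.h>

/* Write bit_len into buf; returns the header size used (1 or 4). */
int
write_bitlen(uint8_t *buf, uint32_t nbits)
{
    if (nbits <= 0x7F)
    {
        buf[0] = (uint8_t) (0x80 | nbits);  /* high bit marks short header */
        return 1;
    }
    /* long form: 4 bytes, big-endian, top bit clear (nbits < 2^31) */
    buf[0] = (uint8_t) ((nbits >> 24) & 0x7F);
    buf[1] = (uint8_t) (nbits >> 16);
    buf[2] = (uint8_t) (nbits >> 8);
    buf[3] = (uint8_t) nbits;
    return 4;
}

/* Decode bit_len from buf; returns the header size consumed. */
int
read_bitlen(const uint8_t *buf, uint32_t *nbits)
{
    if (buf[0] & 0x80)
    {
        *nbits = buf[0] & 0x7F;
        return 1;
    }
    *nbits = ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16) |
             ((uint32_t) buf[2] << 8) | buf[3];
    return 4;
}
```

As the mail notes, the hard part is not the encoding itself but that existing on-disk data written with the old fixed 4-byte header would be misread after an in-place upgrade.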
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Fri, Feb 21, 2014 at 2:19 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: Hello, The attached patch is a revised one for the cache-only scan module on top of the custom-scan interface. Please check it.

Thanks for the revised patch. Please find some minor comments.

1. memcpy(dest, tuple, HEAPTUPLESIZE);
   + memcpy((char *) dest + HEAPTUPLESIZE,
   + tuple->t_data, tuple->t_len);

For a normal tuple these two addresses are different, but in the case of ccache it is continuous memory. Better to write a comment saying that even though it is continuous memory, it is treated as separate.

2. + uint32 required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);

t_len is already maxaligned. There is no problem with aligning it again, but the required-length calculation differs from function to function. For example, in the below part of the same function, the same t_len is used directly. It doesn't generate any problem, but it may cause some confusion.

4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, ctid);
   + if (pchunk != NULL && pchunk != cchunk)
   + ccache_merge_chunk(ccache, pchunk);
   + pchunk = cchunk;

The merge_chunk is called only when the heap tuples are spread across two cache chunks. Actually one cache chunk can accommodate one or more heap pages. It needs some other way of handling.

4. for (i=0; i < 20; i++)

Better to replace this magic number with a meaningful macro.

5. 'columner' is present in the sgml file. Correct it.

6. The max_cached_attnum value in the document is stated as 128 by default, but in the code it is set to 256.

I will start regression and performance tests. I will inform you once I finish.

Regards, Hari Babu Fujitsu Australia
[HACKERS] Priority table or Cache table
Hi, I want to propose a new feature called priority table or cache table. This is the same as a regular table, except that the pages of these tables have a higher priority than those of normal tables. These tables are very useful where faster query processing on some particular tables is expected.

The same faster query processing can be achieved by placing the tables on a tablespace of a ram disk. In that case there is a problem of data loss on system shutdown; to avoid it, a continuous backup of this tablespace and the WAL files is required. The priority table feature will solve these problems by providing similar functionality.

The user needs to decide carefully which tables require faster access; those can be declared as priority tables, and these tables should be small in both number of columns and size.

New syntax:

create [priority] Table ...;

or

Create Table .. [ buffer_pool = priority | default ];

by adding a new storage parameter buffer_pool to specify the type of buffer pool this table can use. The same can be extended for indexes also.

Solution 1: This solution may not be a proper one, but it is simple. While placing these table pages into the buffer pool, the usage count is set to double the max buffer usage count, instead of 1 as for normal tables. For this reason there is less chance of these pages being moved out of the buffer pool. The queries which operate on these tables will be faster because of less I/O. If the tables are not used for a long time, then only the first query on the table will be slower and the rest of the queries are faster. Just for testing, a new bool member can be added to the RELFILENODE structure to indicate whether the table type is priority or not. Using this, the usage count can be modified while loading the page.
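Solution 1 can be sketched as below. BM_MAX_USAGE_COUNT is the real PostgreSQL constant (value 5); the helper function is hypothetical. Doubling it matches the usagecount of 10 shown in the pg_buffercache output below:

```c
#include <assert.h>
#include <stdbool.h>

#define BM_MAX_USAGE_COUNT 5    /* PostgreSQL's clock-sweep ceiling */

/*
 * Solution-1 sketch: pages of a priority table enter the buffer pool
 * with double the maximum usage count, so the clock sweep must pass
 * them many more times before they become eviction candidates.
 * Normal pages start at 1, as usual.
 */
int
initial_usage_count(bool is_priority)
{
    return is_priority ? 2 * BM_MAX_USAGE_COUNT : 1;
}
```

Note that because the clock sweep normally caps usage counts at BM_MAX_USAGE_COUNT, starting above the cap is what gives these pages their staying power; nothing else in the replacement algorithm changes.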
The pg_buffercache output of a priority table:

postgres=# select * from pg_buffercache where relfilenode=16385;
 bufferid | relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | isdirty | usagecount
----------+-------------+---------------+-------------+---------------+----------------+---------+------------
      270 |       16385 |          1663 |       12831 |             0 |              0 | t       |         10

Solution 2: Keep an extra flag in the buffer to indicate whether the buffer is used for a priority table or not. Using this flag, replacing a buffer used for a priority table needs some extra steps, such as: 1. Only another page of a priority table can replace this priority page. 2. Only after at least two complete cycles of the clock sweep can a normal table page replace it. In this case the priority buffers stay in memory for a long time, similar to solution 1, but it is not always guaranteed.

Solution 3: Create another buffer pool, called the priority buffer pool, similar to the shared buffer pool, to place the priority table pages. A new guc parameter called priority_buffers can be added to get the priority buffer pool size from the user. The maximum limit of these buffers can be kept at a smaller value to make proper use of it. As extra care, whenever any page needs to move out of the priority buffer pool a warning is issued, so that the user can check whether the configured priority_buffers size is too small or the priority tables have grown more than expected. In this case all the pages are always loaded into memory, and thus queries get faster processing.

IBM DB2 has the facility of creating more buffer pools and fixing specific tables and indexes into them. Oracle also has a facility to specify the buffer pool option as keep or recycle.

I prefer syntax 2 and solution 3. Please provide your suggestions/improvements.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Priority table or Cache table
On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Haribabu Kommi kommi.harib...@gmail.com writes: I want to propose a new feature called priority table or cache table. This is the same as a regular table except that the pages of these tables have a higher priority than normal tables. These tables are very useful where faster query processing on some particular tables is expected.

Why exactly does the existing LRU behavior of shared buffers not do what you need?

Let's assume a database having 3 tables which are accessed regularly. The user expects faster query results on one table. Because of the LRU behavior, this sometimes does not happen. So if we just separate those table pages into another buffer pool, then all the pages of that table reside in memory and queries get faster processing.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Priority table or Cache table
On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: I want to propose a new feature called priority table or cache table. This is the same as a regular table except that the pages of these tables have a higher priority than normal tables. These tables are very useful where faster query processing on some particular tables is expected.

Why exactly does the existing LRU behavior of shared buffers not do what you need?

Let's assume a database having 3 tables which are accessed regularly. The user expects faster query results on one table. Because of the LRU behavior, this sometimes does not happen.

I think this will not be a problem for regularly accessed tables (pages), as per the current algorithm they will get more priority before getting flushed out of the shared buffer cache. Have you come across any such case where regularly accessed pages get lower priority than non-regularly accessed pages?

Because of other regularly accessed tables, the table which expects faster results sometimes gets delayed.

However, it might be required for cases where the user wants to control such behaviour and pass such hints through a table-level option or some other way, to indicate that he wants more priority for certain tables irrespective of their usage w.r.t. other tables. Now I think the important thing to find out here is how helpful it is for users, or why they would want to control such behaviour even when the database already takes care of it based on the access pattern.

Yes, it is useful in cases where the application always expects faster results, whether the table is used regularly or not.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On Feb 15, 2014 9:19 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Haribabu Kommi wrote: I changed the balance cost calculations a little bit to give priority to the user-provided per-table autovacuum parameters. If any user-specified per-table vacuum parameters exist, and they differ from the GUC vacuum parameters, then the balance cost calculations will not include that worker. The cost is distributed only between the other workers with the GUC-specified vacuum cost parameters. The problem in this calculation is that if the user provides the same values as the GUCs for the per-table settings, they are not considered in the calculation either.

I think this is a strange approach to the problem, because if you configure the backends just so, they are completely ignored instead of being adjusted. And this would have action-at-a-distance consequences, because if you change the defaults in postgresql.conf you end up with completely different behavior on the tables for which you have carefully tuned the delay so that they are ignored in rebalance calculations.

I think that rather than ignoring some backends completely, we should be looking at how to weight the balancing calculations among all the backends in some smart way that doesn't mean they end up with the default values of limit, which AFAIU is what happens now -- which is stupid. I am not really sure how to do that; perhaps base it on the globally-configured ratio of delay/limit vs. the table-specific ratio.

What I mean is that perhaps the current approach is all wrong and we need to find a better algorithm to suit this case and more generally. Of course, I don't mean to say that it should behave completely differently than now in normal cases, only that it shouldn't give completely stupid results in corner cases such as this one.

As an example, suppose that global limit=200 and global delay=20 (the defaults). Then we have a global ratio of 5.
If all three tables being vacuumed currently are using the default values, then they all have ratio=5 and therefore all should have the same limit and delay settings applied after rebalance. Now, if two tables have ratio=5 and one table has been configured to have a very fast vacuum, that is limit=10000, then the ratio for that table is 10000/20=500. Therefore that table should be configured, after rebalance, to have a limit and delay that are 100 times faster than the settings for the other two tables. (And there is a further constraint that the total delay per limit unit across workers should still add up to the configured global delay per limit unit.)

I haven't thought about how to code that, but I don't think it should be too difficult. Want to give it a try? I think it makes sense to modify both the running delay and the running limit to achieve whatever ratio we come up with, except that the delay should probably not go below 10ms because, apparently, some platforms have that much sleep granularity and it wouldn't really work to have a smaller delay. Am I making sense?

Yes, that makes sense, and not leaving the delay parameter as-is is also a good approach. Thanks, I will give it a try.

Regards, Hari Babu Fujitsu Australia
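The ratio arithmetic in the example above can be written out as a minimal sketch (function names hypothetical; this is only the weighting math, not the rebalancing code itself):

```c
#include <assert.h>

/* Ratio as defined in the mail: cost limit per unit of cost delay. */
double
vac_ratio(double cost_limit, double cost_delay)
{
    return cost_limit / cost_delay;
}

/*
 * After rebalance, how much faster one table should be vacuumed
 * relative to another: the quotient of their ratios.
 */
double
relative_speed(double ratio_a, double ratio_b)
{
    return ratio_a / ratio_b;
}
```

With the mail's numbers: a table tuned to limit=10000 at delay=20 has ratio 10000/20 = 500, and against the global ratio of 5 it should end up 500/5 = 100 times faster after rebalance.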
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:

7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups) can go wrong when only one tuple is present in the block, with an item pointer equal to the one we are searching for in the forward scan direction.

It shouldn't happen, because the first or second ItemPointerCompare will handle the condition. Please assume cchunk->ntups == 1. In this case, any given ctid shall match either branch, because any ctid is less than, equal to, or larger than the single cached tuple; thus it moves to the right or left node according to the scan direction.

Yes, you are correct. Sorry for the noise.

8. I am not able to find a protection mechanism in insert/delete etc. of a tuple in the T-tree. As this is shared memory, it can cause problems.

For design simplification, I put a giant lock per columnar cache. So, routines in cscan.c acquire an exclusive lwlock prior to invocation of ccache_insert_tuple / ccache_delete_tuple.

Correct. But this lock can be a bottleneck for concurrency. Better to analyze it once we have the performance report.

Regards, Hari Babu Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Sat, Feb 8, 2014 at 1:09 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: Hello, Because of time pressure in the commit-fest:Jan, I tried to simplify the patch for cache-only scan into three portions: (1) add a hook on heap_page_prune for cache invalidation on vacuuming a particular page, (2) add a check to accept InvalidBuffer in SetHintBits, and (3) a proof-of-concept module for cache-only scan.

(1) pgsql-v9.4-heap_page_prune_hook.v1.patch
Once the on-memory columnar cache is constructed, it needs to be invalidated if a heap page behind the cache is modified. In usual DML cases, the extension can get control using row-level trigger functions for invalidation; however, we right now have no way to get control when a page is vacuumed, which is usually handled by the autovacuum process. This patch adds a callback on heap_page_prune(), to allow extensions to prune dead entries on their caches, not only heap pages. I'd also like to hear of any other scenario where we need to invalidate columnar cache entries, if any exist. It seems to me object_access_hook makes sense to cover the DDL and VACUUM FULL scenarios...

(2) pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patch
When we want to check visibility of the tuples on cache entries (thus no particular shared buffer is associated) using HeapTupleSatisfiesVisibility, it internally tries to update the hint bits of tuples. However, that does not make sense for tuples not associated with a particular shared buffer. By definition, tuple entries on the cache are not connected with a particular shared buffer. If we needed to load the whole buffer page just to set hint bits, it would be totally nonsensical, because the purpose of the on-memory cache is to reduce disk accesses. This patch adds an exceptional condition to SetHintBits() to skip everything if the given buffer is InvalidBuffer. It allows checking tuple visibility using the regular visibility check functions, without reinventing the wheel.
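The behavior patch (2) describes can be reduced to a stand-in like the one below (the types and signature are simplified stand-ins, not the real PostgreSQL SetHintBits): hint bits are only written when a real shared buffer backs the tuple, and cached copies are silently skipped.

```c
#include <assert.h>
#include <stdbool.h>

typedef int Buffer;
#define InvalidBuffer 0

/*
 * Stand-in for the patched SetHintBits(): a tuple that lives only in
 * the columnar cache passes InvalidBuffer, and the hint-bit update is
 * skipped because there is no buffer page to dirty.  Returns whether
 * the bits were actually set.
 */
bool
set_hint_bits(Buffer buffer, unsigned *t_infomask, unsigned bits)
{
    if (buffer == InvalidBuffer)
        return false;           /* cached copy: skip, as the patch does */
    *t_infomask |= bits;
    return true;
}
```

The visibility-check callers need no change at all: they call the same function, and the InvalidBuffer guard makes the cached-tuple case a harmless no-op.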
(3) pgsql-v9.4-contrib-cache-scan.v1.patch
Unlike (1) and (2), this patch is just a proof of concept to implement cache-only scan on top of the custom-scan interface. It tries to offer an alternative scan path on tables with row-level triggers for cache invalidation, if the total width of the referenced columns is less than 30% of the total width of the table definition. Thus, it can keep a larger number of records, with a meaningful portion, in main memory. This cache shall be invalidated according to main heap updates: one way is the row-level trigger, the second is object_access_hook on DDL, and the third is the heap_page_prune hook. Once a column-reduced tuple gets cached, it is copied into the cache memory from the shared buffer, so it needs the feature of ignoring InvalidBuffer in the visibility check functions.

I reviewed all three patches. The first two core PostgreSQL patches are fine. I have the following comments on the third patch, related to the cache scan.

1. +# contrib/dbcache/Makefile

The Makefile header comment does not match the file name location.

2. + /*
   + * Estimation of average width of cached tuples - it does not make
   + * sense to construct a new cache if its average width is more than
   + * 30% of the raw data.
   + */

Move the estimation of the average width of cached tuples into the case where the cache is not found; otherwise it is an overhead for the cache-hit scenario.

3. + if (old_cache)
   + attrs_used = bms_union(attrs_used, old_cache->attrs_used);

Don't we need a check here to see whether the average width is more than 30%? During estimation it doesn't include the existing other attributes.

4. + lchunk->right = cchunk;
   + lchunk->l_depth = TTREE_DEPTH(lchunk->right);

I think it should be lchunk->r_depth that needs to be set in a clockwise rotation.

5. Can you add some comments in the code describing how the block is used?

6. In the do_insert_tuple function, I felt that moving the tuples and rearranging their addresses is a little bit costly. How about the following way?
Always insert the tuple from the bottom of the block, where the empty space starts, and store the corresponding reference pointers at the start of the block in an array. As new tuples are inserted, this array grows from the block start and the tuples grow from the block end. We just need to sort this array based on item pointers; there is no need to update the reference pointers themselves. In this case, movement is required only when a tuple is moved from one block to another, and also when no continuous free space is available to insert the new tuple. You can decide based on how frequently the sorting will happen in general.

7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups) can go wrong when only one tuple is present in the block, with an item pointer equal to the one we are searching for in the forward scan direction.

8. I am not able to find a protection mechanism in insert/delete etc. of a tuple in the T-tree. As this is shared memory, it can cause problems.
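The layout suggested in item 6 — a small offset array growing from the block start while tuple bodies grow from the block end, much like PostgreSQL's own heap-page line pointers — can be sketched like this (structure and names are illustrative only, not the patch's code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8192

/*
 * Offsets (the "reference pointers" of the mail) live in an array at
 * the start of data[]; tuple bodies are packed from the end.  Sorting
 * by item pointer only has to permute the small offset array.
 */
typedef struct
{
    uint16_t ntups;
    uint16_t lower;                 /* end of the offset array */
    uint16_t upper;                 /* start of the packed tuple data */
    uint8_t  data[BLOCK_SIZE - 6];
} Block;

void
block_init(Block *b)
{
    b->ntups = 0;
    b->lower = 0;
    b->upper = sizeof(b->data);
}

/* Returns the slot index of the new tuple, or -1 if the block is full. */
int
block_insert(Block *b, const void *tup, uint16_t len)
{
    uint16_t *offsets = (uint16_t *) b->data;

    if ((uint32_t) b->lower + sizeof(uint16_t) + len > b->upper)
        return -1;                  /* no contiguous free space left */

    b->upper -= len;                /* body grows downward from the end */
    memcpy(b->data + b->upper, tup, len);
    offsets[b->ntups++] = b->upper; /* offset array grows upward */
    b->lower += sizeof(uint16_t);
    return b->ntups - 1;
}
```

Reordering after a batch of inserts then costs a sort of `ntups` 2-byte offsets instead of moving tuple bodies around, which is the saving the mail argues for.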
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Sat, Feb 8, 2014 at 12:10 PM, Peter Eisentraut pete...@gmx.net wrote: On 1/29/14, 7:37 PM, Haribabu Kommi wrote: On Tue, Jan 28, 2014 at 1:17 PM, Peter Eisentraut wrote: On 11/30/13, 6:59 AM, Haribabu kommi wrote: To detect whether the provided data and xlog directories are the same or not, I reused the existing make_absolute_path() code as follows. I note that initdb does not detect whether the data and xlog directories are the same. I think there is no point in addressing this only in pg_basebackup. If we want to forbid it, it should be done in initdb foremost. Thanks for pointing this out. If the following approach is fine for identifying identical directories, then I will do the same for initdb also. I wouldn't bother. It's a lot of work for little benefit. Any mistake a user would make can easily be fixed. I also felt it is a lot of work for little benefit, but as of now I am not able to find an easier solution to handle this problem. Can we handle it later if it is really required? -- Regards, Hari Babu Fujitsu Australia Software Technology
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Tue, Jan 28, 2014 at 1:17 PM, Peter Eisentraut wrote: On 11/30/13, 6:59 AM, Haribabu kommi wrote: To detect whether the provided data and xlog directories are the same or not, I reused the existing make_absolute_path() code as follows. I note that initdb does not detect whether the data and xlog directories are the same. I think there is no point in addressing this only in pg_basebackup. If we want to forbid it, it should be done in initdb foremost. Thanks for pointing this out. If the following approach is fine for identifying identical directories, then I will do the same for initdb also. I'm not sure it's worth the trouble, but if I were to do it, I'd just stat() the two directories and compare their inodes. That seems much easier and more robust than comparing path strings. stat() has problems on Windows; for that reason the patch was written to identify the directories with a string comparison. Regards, Hari Babu
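The inode check Peter suggests could look like the sketch below (the function name is assumed). The Windows caveat from the reply is real: st_ino is not reliably populated there, which is why the patch fell back to comparing canonicalized path strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Two paths name the same directory iff they resolve to the same
 * device and inode.  This sees through symlinks, "..", and trailing
 * slashes, which plain string comparison does not.
 */
bool
same_directory(const char *a, const char *b)
{
    struct stat sa;
    struct stat sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return false;           /* unreadable/missing: treat as distinct */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

On POSIX systems this is both simpler and safer than path-string comparison; the portability gap on Windows is the whole trade-off being discussed in the thread.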
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 20 December 2013 02:02 Bruce Momjian wrote: On Thu, Dec 19, 2013 at 05:14:50AM +, Haribabu kommi wrote: On 19 December 2013 05:31 Bruce Momjian wrote: On Wed, Dec 11, 2013 at 10:22:32AM +, Haribabu kommi wrote: The make_absolute_path() function moved to port is changed in a similar way to Bruce Momjian's approach. psprintf is used to store the error string that occurred in the function. But psprintf is not used for storing the absolute path, because it causes problems in freeing the allocated memory in SelectConfigFiles, since the same memory is allocated in a different code branch from guc_malloc. After adding the make_absolute_path() function with the psprintf stuff in the path.c file, it gives a linking problem in the compilation of ecpg. I am not able to find the cause, so I added another file, abspath.c, in port, which contains these two functions.

What errors are you seeing?

If I move the make_absolute_path function from abspath.c to path.c, I get the following linking errors while compiling ecpg:

../../../../src/port/libpgport.a(path.o): In function `make_absolute_path':
/home/hari/postgres/src/port/path.c:795: undefined reference to `psprintf'
/home/hari/postgres/src/port/path.c:809: undefined reference to `psprintf'
/home/hari/postgres/src/port/path.c:818: undefined reference to `psprintf'
/home/hari/postgres/src/port/path.c:830: undefined reference to `psprintf'
collect2: ld returned 1 exit status
make[4]: *** [ecpg] Error 1
make[3]: *** [all-preproc-recurse] Error 2
make[2]: *** [all-ecpg-recurse] Error 2
make[1]: *** [all-interfaces-recurse] Error 2
make: *** [all-src-recurse] Error 2

You didn't show the actual command that is generating the error, but I assume it is linking ecpg, not creating libecpg. I think the issue is that path.c is specially handled when it is included in libecpg.
Here is a comment from the libecpg Makefile:

# We use some port modules verbatim, but since we need to
# compile with appropriate options to build a shared lib, we can't
# necessarily use the same object files as the backend uses. Instead,
# symlink the source files in here and build our own object file.

My guess is that libecpg isn't marked as linking to libpgcommon, and when you called psprintf in path.c, it added a libpgcommon link requirement. My guess is that if you compiled common/psprintf.c like port/path.c in libecpg's Makefile, it would link fine.

Sorry for not mentioning the command; yes, it gives the problem while linking ecpg. From the compilation I observed that libpgcommon is linked while building ecpg. I tested the same by using psprintf directly in ecpg code. I modified libecpg's Makefile as suggested by you, which is attached in the mail, but the errors still occur. Please let me know if anything is missed.

Regards, Hari babu.

make_abs_path_v3.patch Description: make_abs_path_v3.patch