[HACKERS] ExecuteTruncate quirk: expects a unique list of relations

2008-06-05 Thread Nikhils
Hi,

Consider this simple case:

postgres=# TRUNCATE foo, foo;
ERROR:  cannot TRUNCATE "foo" because it is being used by active queries in
this session

The above occurs because the ExecuteTruncate() function invokes
truncate_check_rel() in a loop. Since the same table name appears twice, the
rd_refcnt for table foo is bumped up to 2, causing the above failure.

We might want to add a step to ExecuteTruncate(), or whatever calls it, to
make the list unique.

Regards,
Nikhils
-- 
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] plpgsql: penalty due to double evaluation of parameters

2008-05-21 Thread Nikhils
Hi,

I don't buy the performance argument unless I see some test results
 demonstrating it; exec_prepare_plan is only called on the first invocation
 of a statement. What kind of side-effects could exec_eval_datum call have?



Note that I have avoided using the "performance" word for this very reason.
But consider, for example, when the datum type is PLPGSQL_DTYPE_REC. I don't
think it's justified to have the overhead of heap_copytuple_with_tuple when
all we need is just the typeid! Similar arguments apply to other datums,
e.g. PLPGSQL_DTYPE_ROW and PLPGSQL_DTYPE_TRIGARG.

Regards,
Nikhils
-- 
EnterpriseDB http://www.enterprisedb.com


[HACKERS] plpgsql: penalty due to double evaluation of parameters

2008-05-20 Thread Nikhils
Hi,

Within exec_prepare_plan there are calls to exec_eval_datum to identify the
argtypes of the involved parameters. However, exec_eval_datum actually fills
in the value and isnull entries in these cases, causing unnecessary additional
work when all we need is the datum type. Such unnecessary evaluation of values
might prove to be very costly later, since this quirk of exec_eval_datum
usage is not so visible. Worse still, it could cause bugs if some evaluations
have side-effects across multiple evals. It might make sense to introduce a
new function exec_eval_datum_type to address this, or exec_eval_datum could
itself be modified for cases where we just need the datum type. Should I
cook up a patch for this? I am inclined towards introducing a new function
(though that means that any new datum-related changes would need to be carried
out in two functions instead of one).

Regards,
Nikhils
-- 
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] [PATCHES] [EMAIL PROTECTED]: Re: [BUGS] Problem identifying constraints which should not be inherited]

2008-05-11 Thread Nikhils
Hi,
On Sat, May 10, 2008 at 6:11 AM, Alex Hunsaker [EMAIL PROTECTED] wrote:

 On Fri, May 9, 2008 at 5:37 PM, Tom Lane [EMAIL PROTECTED] wrote:
  Alex Hunsaker [EMAIL PROTECTED] writes:
  [ patch to change inherited-check-constraint behavior ]
 
  Applied after rather heavy editorializations.  You didn't do very well on
  getting it to work in multiple-inheritance scenarios, such as
 
 create table p (f1 int check (f1 > 0));
 create table c1 (f2 int) inherits (p);
 create table c2 (f3 int) inherits (p);
 create table cc () inherits (c1,c2);
 
  Here the same constraint is multiply inherited.  The base case as above
  worked okay, but adding the constraint to an existing inheritance tree
  via ALTER TABLE, not so much.

 Ouch. Ok Ill (obviously) review what you committed so I can do a lot
 better next time.
 Thanks for muddling through it!



Ouchie indeed!

 I'm not sure if we ought to try to back-patch that --- it'd be a
 behavioral change with non-obvious implications.  In the back branches,
 ADD CHECK followed by DROP CONSTRAINT will end up not deleting the
 child-table constraints, which is probably a bug but I wouldn't be
 surprised if applications were depending on the behavior.

Given the lack of complaints it does not seem worth a back patch IMHO.

Yeah, same IMHO. I do hope we have covered things properly for inherited
check constraints by now. One minor thing that Alex and I discussed was
the usage of "child tables" in tablecmds.c, especially in error messages.
Again, English is not my native language, but shouldn't that be worded as
"children tables"? Admittedly even that does not sound any better than
"child tables" though :). It is nit-picking really, but I can submit a
cleanup patch to reword this if the list thinks it is worthwhile.

Regards,
Nikhils
-- 
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Lessons from commit fest

2008-04-16 Thread NikhilS
Hi,

  The idea that we fix stylistic issues on the fly is not sustainable.
   We should offer help and mentorship to new patch submitters in all
   areas (including stylistic) but they should do the work. It is the only
   way we will mold them to submit patches in the proper way.
 

 I agree.  As a submitter I would much rather get an email saying e.g.
 "Hey, your patch is nice but the code style sticks out like a sore
 thumb.  Please adopt surrounding naming convention and fix your
 indentation per the rules at [link]." than have it fixed silently on
 its way to being committed.

 With the former I learn something and get to improve my own work.
 With the latter, my next patch is probably going to have the exact
 same problem, which is in the long term just making extra work for the
 reviewers.


I think we patch submitters should be asked to do a run of pgindent on the
files that we have modified. That should take care of at least the
indentation-related issues. I looked at the README of src/tools/pgindent,
and it seems easy enough to run (or is it not?). Only one thing
caught my eye:

1) Build the source tree with _debug_ symbols and all possible configure
options

Can the above point be elaborated further? Which typical and possible
configure options should be used to get a clean and complete pgindent run?

And I think adopting the surrounding naming, commenting and coding conventions
should come naturally, as it can aid in copy-pasting too :)

Regards,
Nikhils
-- 
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] COPY Transform support

2008-04-03 Thread NikhilS
Hi,


On Thu, Apr 3, 2008 at 6:47 PM, Dimitri Fontaine [EMAIL PROTECTED]
wrote:

 Here's a proposal for COPY to support the T part of an ETL, that is adding
 the
 capability for COPY FROM to Transform the data it gets.

 The idea is quite simple: adding to COPY FROM the option to run a function
 on
 the data before to call datatype_in functions. This needs some syntax
 addition to be worked out at the COPY side, then the COPY code will have
 to
 run the given function on the read data and consider giving the output of
 it
 to current COPY code (datatype input function).

 The function could either get the data as text or bytea, and would have to
 return either text or bytea. bytea seems the more sensible choice, as long
 as
 we don't lose encoding information there, which I'm not sure about.

 The syntax could be something like:
  COPY mytable FROM '/my/file.txt' WITH COLUMN x CONVERT USING myfunc;

 I tried to only add keywords already present in [1], while getting
 something
 meaningfull... and x is intended to be the column number, counting from 1.
  [1] http://www.postgresql.org/docs/8.3/static/sql-keywords-appendix.html

 Comments?
 --
 dim


+1

Data transformation while doing a data load is a requirement now and then.
The fact that users otherwise have to do mass updates *after* the load
completes to mend the data to their liking should be reason enough to do
this while the loading is happening. I think to go about it the right way we
should support the following:


* The ability to provide per-column transformation expressions

COPY mytable (col1 transform to col1 + 10, col2 transform to 'Post' ||
'greSQL', col3...) FROM ..

* The ability to use any kind of expression while doing the transformation
The transformation expression should be any expression (basically anything
ExecEvalExpr can handle) that can be evaluated to give a resulting value and,
obviously, a corresponding is_null value too. It could use system built-in
functions (e.g. UPPER, TRIM, TO_CHAR, TO_NUMBER etc.) as well as user-defined
functions.

* The transformation expression can refer to other columns involved in the
load, so that when the current row is extracted from the input file, the
current values are used to generate the new resulting values before
doing a heap_form_tuple. E.g.:

(col1 transform col1 + 10, col2 transform col1 * col2, col3 transform
UPPER(col1 || col3),...)

I have given some thought to how to do this and will be happy to share it
if the list is interested. Personally, I think data transformation
using such expressions is a pretty powerful and important activity while
doing the data load itself.
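
For comparison, today this kind of thing typically needs a staging table and
an INSERT ... SELECT; a rough sketch of the equivalent (table and column
names are made up):

CREATE TEMP TABLE staging (col1 int, col2 text, col3 text);
COPY staging FROM '/my/file.txt';

INSERT INTO mytable (col1, col2, col3)
SELECT col1 + 10, 'Post' || 'greSQL', UPPER(col3)
FROM staging;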

Regards,
Nikhils
-- 
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] [BUGS] Problem identifying constraints which should not be inherited

2008-04-02 Thread NikhilS
Hi Alvaro

On Fri, Mar 28, 2008 at 6:05 PM, Alvaro Herrera [EMAIL PROTECTED]
wrote:

 NikhilS escribió:

  I will take a look at the pg_dump related changes if you want. We will
 need
  changes in flagInhAttrs() and in getTableAttrs() to query the backend
 for
  these 2 attributes for post 80300 versions.

 Oh, BTW, I have not added this patch to any Commitfest page on the wiki,
 since it has obvious things that need more work.  If you do work on
 them, please post the improved patch later and add it to the
 corresponding Commitfest, as appropriate.


I submitted the combined latest patch to the patches list yesterday. How can
I add it to the commitfest (presumably to the May one?)? Please let me know.


Regards,
Nikhils
-- 
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] [PATCHES] [EMAIL PROTECTED]: Re: [BUGS] Problem identifying constraints which should not be inherited]

2008-03-31 Thread NikhilS
Hi Alex,

On Sun, Mar 30, 2008 at 7:10 AM, Alex Hunsaker [EMAIL PROTECTED] wrote:

 (trimmed cc's)

 Find attached inherited_constraint_v2.patch

 Changes since v1:
 -rebased against latest HEAD
 -changed enum { Anum_pg_constraint_... } back into #define
 Anum_pg_constraint_...
 -remove whitespace damage I added
 -fixed regression tests I added to be more robust
 -fixed
   create table ac (a int constraint check_a check (a > 0));
   create table bc (a int constraint check_a check (a > 0)) inherits (ac);
   so it properly works (removed crud I put into
 AddRelationRawConstraints and created a proper fix in DefineRelation)


I was taking a look at this patch to add the pg_dump related changes. Just
wanted to give you a heads-up, as this patch crashes if we run make
installcheck. Seems there is an issue introduced in the CREATE TABLE ...
REFERENCES code path by your patch (this is without my pg_dump changes,
just to be sure). Looks like a memory overwrite issue. The trace is as
follows:

Core was generated by `postgres: nikhils regression [local] CREATE
TABLE '.
Program terminated with signal 11, Segmentation fault.
#0  0x08378024 in AllocSetCheck (context=0xa060368) at aset.c:1112
1112                    if (dsize > 0 && dsize < chsize &&
                            *chdata_end != 0x7E)
(gdb) bt
#0  0x08378024 in AllocSetCheck (context=0xa060368) at aset.c:1112
#1  0x0837704f in AllocSetDelete (context=0xa060368) at aset.c:487
#2  0x083783c2 in MemoryContextDelete (context=0xa060368) at mcxt.c:196
#3  0x083797fb in PortalDrop (portal=0xa0845bc, isTopCommit=0 '\0') at
portalmem.c:448
#4  0x08281939 in exec_simple_query (
query_string=0xa07e564 "CREATE TABLE enumtest_child (parent rainbow
REFERENCES enumtest_parent);") at postgres.c:992
#5  0x082857d4 in PostgresMain (argc=4, argv=0x9ffbe28, username=0x9ffbcc4
"nikhils") at postgres.c:3550
#6  0x0824917b in BackendRun (port=0xa003180) at postmaster.c:3204
#7  0x082486a2 in BackendStartup (port=0xa003180) at postmaster.c:2827
#8  0x08245e9c in ServerLoop () at postmaster.c:1271
#9  0x082457fd in PostmasterMain (argc=3, argv=0x9ff9c60) at postmaster.c
:1019
#10 0x081e1c03 in main (argc=3, argv=0x9ff9c60) at main.c:188


Regards,
Nikhils
-- 
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] [BUGS] Problem identifying constraints which should not be inherited

2008-03-28 Thread NikhilS
Hi Alex,

On Fri, Mar 28, 2008 at 4:58 AM, Alex Hunsaker [EMAIL PROTECTED] wrote:

 On Thu, Mar 27, 2008 at 5:14 AM, NikhilS [EMAIL PROTECTED] wrote:
  * Add logic to disallow ADD CONSTRAINT ONLY to parent of an inheritance
  hierarchy
 
  * Add logic to mark inherited constraints in the children:
  This can be achieved by introducing a new bool coninherited attribute
 in
  pg_constraint. This will be set to true on only those check constraints
 that
  are added to children via the inheritance mechanism
 
  * Add logic to disallow dropping inherited constraints directly on
 children
  Obviously they will get dropped if a DROP CONSTRAINT is fired on the
 parent.
  with recurse set to true (this is the default behaviour)
 
   * Modify the pg_dump logic to use the new pg_constraint based attribute
  logic for version 80300 (or should it be PG_VERSION_NUM 80400?) onwards.
  Infact the current logic to determine if a check constraint is inherited
 is
  to compare its name with the parent constraint name and the comment
 already
  mentions that we should make changes in pg_constraint to avoid this
  rudimentary way of determing the inheritance :).
 
 
 Attached is a WIP patch I have been playing with in my spare time.  It
 should take care of the first 2.  It does nothing for pg_dump or set
 (not) null/set default.

 Note it has some gross points (see comment in
 src/backend/catalog/heap.c AddRelationRawConstraints) and the
 regression tests I added are not quite up to par (and probably a bit
 redundant).  But in the interest of saving work I thought i would post
 it.


I took a quick look and it seems to be along the lines of attislocal and
attinhcount, which is a good thing. I am not sure about your syscache-related
changes though, and also about using enums for the pg_constraint attributes,
which deviates from the other catalog specifications. It's good that you
posted your WIP here immediately, or it would have caused duplication of
effort on my side :)

I will take a look at the pg_dump related changes if you want. We will need
changes in flagInhAttrs() and in getTableAttrs() to query the backend for
these 2 attributes for post 80300 versions.

P.S. Alvaro, I think this patch did not reach the mailing list and was
stalled due to size restrictions or something.

Regards,
Nikhils
-- 
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] [BUGS] Problem identifying constraints which should not be inherited

2008-03-27 Thread NikhilS
Hi,

On Thu, Mar 20, 2008 at 7:36 PM, Tom Lane [EMAIL PROTECTED] wrote:


 More to the point, it takes a capability away from the user without
 actually solving the problem we need to solve, namely to guarantee
 consistency between parent and child constraints.  You can be sure
 that there is someone out there who will complain that we've broken
 his application when we disallow this, and we need to be able to
 point to some positive benefit we got from it.


Agreed. So I think we need to implement the whole enchilada or nothing at
all. We need to do the following for this:

* Add logic to disallow ADD CONSTRAINT ONLY to parent of an inheritance
hierarchy

* Add logic to mark inherited constraints in the children:
This can be achieved by introducing a new bool coninherited attribute in
pg_constraint. This will be set to true on only those check constraints that
are added to children via the inheritance mechanism

* Add logic to disallow dropping inherited constraints directly on children
Obviously they will get dropped if a DROP CONSTRAINT is fired on the parent
with recurse set to true (this is the default behaviour).

* Modify the pg_dump logic to use the new pg_constraint-based attribute
logic for version 80300 (or should it be PG_VERSION_NUM 80400?) onwards.
In fact, the current logic to determine if a check constraint is inherited is
to compare its name with the parent constraint name, and the comment already
mentions that we should make changes in pg_constraint to avoid this
rudimentary way of determining the inheritance :).
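
For reference, the name-matching heuristic amounts to a catalog query along
these lines (just a sketch of the current, rudimentary approach):

SELECT con.conname,
       con.conrelid::regclass AS child,
       par.conrelid::regclass AS parent
FROM pg_constraint con
JOIN pg_inherits inh ON inh.inhrelid = con.conrelid
JOIN pg_constraint par ON par.conrelid = inh.inhparent
                      AND par.conname = con.conname
WHERE con.contype = 'c';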

An important decision here is about adding a new attribute to pg_constraint,
as it is the only sane way of determining inherited constraints, but that
will require an initdb. Comments?

Regards,
Nikhils
-- 
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] MemoryContextSwitchTo() confusion

2008-03-20 Thread NikhilS
src/backend/utils/mmgr/README contains more information about the same too.

Regards,
Nikhils

On Thu, Mar 20, 2008 at 2:41 PM, Pavan Deolasee [EMAIL PROTECTED]
wrote:

 On Thu, Mar 20, 2008 at 12:27 AM, Dan Searle [EMAIL PROTECTED] wrote:

 
   I had to fiddle about with switching memory contexts rather a lot to
   make it work this far, but I'm only guessing as to when it's
 appropriate
   to call MemoryContextSwitchTo(), and to which context to switch to

 Here is what I know. Not sure whether it would answer your question
 though.

 A memory context is a memory enclosure with a life associated to it.
 Whenever
 the context is freed, all objects allocated in that context are also
 automatically
 freed. If there are any references to these objects, they will turn
 into dangling
 pointers, causing segfaults and/or memory corruption. So you need to be
 careful while choosing a memory context to allocate memory from. You
 don't want to allocate something in a long-lived context if you don't need
 it for that much time, because failure to explicitly free the allocation
 will result in memory consumption when it's not required, or even a memory
 leak. OTOH you don't want to allocate something in a very short-lived
 context if you may require that object outside the scope of that context.

 Certain memory contexts are well known. For example, a TopMemoryContext
 has life of the session. Any object allocated in this context would remain
 valid for the entire session. Of course, failure to free them would result
 in memory leaks.

 TopTransactionContext, as the name suggests, is valid in the current
 top transaction. As soon as the transaction commits/aborts, the context
 is freed.

 CurrentTransactionContext, which may be same as TopTransactionContext
 remains valid for the current transaction or subtransaction.

 Apart from that, there are contexts attached to different objects during
 execution and their lifespan is usually attached to the lifespan of the
 object
 itself. You may need to choose one of them if you know that what you
 are allocating can not or should not outlive that object.


 Thanks,
 Pavan

 --
 Pavan Deolasee
 EnterpriseDB http://www.enterprisedb.com

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS]COPY issue(gsoc project)

2008-03-14 Thread NikhilS
Hi Longlong,


  i think this is a better idea.
 from *NikhilS *
 http://archives.postgresql.org/pgsql-hackers/2007-12/msg00584.php
 But instead of using a per insert or a batch insert substraction, I am
 thinking that we can start off a subtraction and continue it till we
 encounter a failure. The moment an error is encountered, since we have the
 offending (already in heap) tuple around, we can call a simple_heap_delete
 on the same and commit (instead of aborting) this subtransaction after doing
 some minor cleanup. This current input data row can also be logged into a
 bad file. Recall that we need to only handle those errors in which the
 simple_heap_insert is successful, but the index insertion or the after row
 insert trigger causes an error. The rest of the load then can go ahead with
 the start of a new subtransaction.
 the simplest thing are often the best.
 i think it's hard to implement or some other deficiency since you want
 subtransaction or every n rows.



Yeah, simpler things are often the best, but as folks are mentioning, we need
a carefully thought out approach here. The reply from Tom to my posting
there raises issues which need to be taken care of. Still, I think
that if we carry out *sanity* checks before starting the load for the presence
of triggers, constraints, fkey constraints etc., and if others do not have any
issues with the approach, the simple_heap_delete idea should work in some
cases. Although the term I used, "after some minor cleanup", might need some
thought too now that I think more about it..

Also if fkey checks or complex triggers are around, maybe we can fall back
to a subtransaction-per-row insert too as a worst-case measure..
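
For instance, the kind of sanity check I have in mind could be as simple as a
catalog lookup on the target table before the load starts (a sketch, with
'mytable' being a made-up name):

SELECT (SELECT count(*) FROM pg_trigger
        WHERE tgrelid = 'mytable'::regclass) AS triggers,
       (SELECT count(*) FROM pg_constraint
        WHERE conrelid = 'mytable'::regclass AND contype = 'f') AS fkeys;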

Regards,
Nikhils
-- 
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Declarative partitioning grammar

2008-01-12 Thread NikhilS
Hi,

 I've proposed an alternative approach, which we've called declarative
 partitioning which is grammar based. This grammar was developed by Jeff
 Cohen at Greenplum with some assistance from myself. It is to be
 completely open source.


..
FWIW, I had done some very initial work on declarative partitioning (nowhere
near as exhaustive as this proposal) and submitted a WIP patch here:

http://momjian.us/mhonarc/patches_hold/msg6.html

Kindly take a look at the patch to see if it would be useful to you folks in
any way.
..

Range
-

Range has the most expressive grammar. I'll introduce it in steps:

...PARTITION BY RANGE (b)
(
PARTITION aa start (date '2007-01-01') end (date '2008-01-01'),
PARTITION bb start (date '2008-01-01') end (date '2009-01-01')
);

It is common that these partitions follow a pattern, such as following
every week, month or year. So, we support the following specification:

...   PARTITION BY RANGE(order_date)
 (
   START (date '2005-12-01') end (date '2007-12-01')
  EVERY(interval '2 months')
 );
..
It will be interesting to see how this start, end, interval usage accommodates
data types other than dates. I hope this specification is not
influenced overly just by date-like partitions.
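For instance, is something along these lines the intent for an integer key
(purely illustrative, guessing at the proposed syntax)?

...   PARTITION BY RANGE(order_id)
 (
   START (0) END (1000000)
   EVERY (100000)
 );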
..

ADD
---

For range and list partitioning, it's important to be able to add
partitions for data not covered by the existing specification. So, we
propose:

...  ADD PARTITION q1_2008 end (date '2008-04-01')
..
What about data that does not match any existing partition specification? It
might make sense to have a dummy partition which handles all these cases.
..

DROP


For list and range partitions, drop a specified partition from the set
of partitions.

... DROP PARTITION minny;

This drops a named partition. Often, it will be difficult for users to
know partition names, and they might be unnamed. So, we allow this
syntax:

... DROP PARTITION FOR(date '2007-01-01');

for range partitions; and:

... DROP PARTITION FOR(VALUES('CA'));

for list partitions.

We've also discussed something like:

... DROP PARTITION FOR(POSITION(1));

so that users can easily drop a specific partition in an array of range
partitions. It seems to me, though, that the use case is generally to
drop the oldest partition so perhaps we should have a more explicit
syntax. Thoughts?
..
Surely the partitions will get (default, parent-inferred) names when they
get created? Do we expect the users to remember FOR() specifications like
the ones mentioned above? It might make sense to have a \d variant in psql,
e.g. to present a parent with all its named partitions along with the
partition clauses, to facilitate DROP PARTITION using partition names.
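Today, even just finding the children of a parent means going to the
catalogs, e.g. something like:

SELECT c.relname
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'parent'::regclass;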
..

EXCHANGE


This sub-clause allows us to make a table a partition in a set of
partitions or take a partition out of a set but keep it as a table. IBM
uses ATTACH and DETACH, which is explicit but Oracle uses EXCHANGE. I'll
explain the latter:

... EXCHANGE partition identifier WITH TABLE table name

partition identifier is one of PARTITION name or PARTITION FOR(...).
The partition in the partition set 'becomes' the table table name and
vice-versa. Essentially, we'd swap the relfilenodes. This means that we
have to first ADD PARTITION then swap the table and the partition.
Thoughts?
..
Surely this won't be instantaneous?
..

Regards,
Nikhils

-- 
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Declarative partitioning grammar

2008-01-12 Thread NikhilS
Hi,

 We did look at allowing general functions for partitioning and this
 was one concern.  The other is that we want to enforce that a row
 only gets inserted into a single partition, so we wanted a
 declarative syntax where it was relatively easy to check that range
 and list specifications don't overlap.


Detection of mutually exclusive ranges might not turn out to be so easy
after all. I think there is some code in the constraint_exclusion area which
might help out with this.

Regards,
Nikhils
-- 
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Declarative partitioning grammar

2008-01-12 Thread NikhilS
Hi,



On Jan 12, 2008 6:29 AM, Gavin Sherry [EMAIL PROTECTED] wrote:



 The syntax is half the problem, performance is the other. I will bring
 the performance issues up in another thread. Yes, we are confident that
 we can address the performance issues that rule out the existing
 partitioning for many applications. We need it for our own stuff! :P



Agreed, syntax is just the sugar.
Also, other than performance, how are updates handled here when a change to
the partition key causes the resultant tuple to end up in a new partition?

Regards,
Nikhils
-- 
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] duplicate columns with COPY

2007-12-20 Thread NikhilS
Hi,

 
  pei=# copy test1 (a, a) to stdout;
  ERROR:  42701: column "a" specified more than once
 
  Or is this just an overly extensive check that is actually intended for
 COPY
  FROM STDIN?
 
 


This seems to be a common check in both the COPY TO and COPY FROM cases, the
source/destination being STDIN/STDOUT or otherwise. While it definitely makes
sense for the FROM case, maybe we could relax it for the COPY TO case.
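
In the meantime, a workaround for the TO case (this works from 8.2 onwards,
IIRC) is to go through a query, e.g.:

COPY (SELECT a, a FROM test1) TO STDOUT;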

Regards,
Nikhils
-- 
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] VLDB Features

2007-12-16 Thread NikhilS
Hi,

On Dec 15, 2007 1:14 PM, Tom Lane [EMAIL PROTECTED] wrote:

 NikhilS [EMAIL PROTECTED] writes:
  Any errors which occur before doing the heap_insert should not require
  any recovery according to me.

 A sufficient (though far from all-encompassing) rejoinder to that is
 triggers and CHECK constraints can do anything.

  The overhead of having a subtransaction per row is a very valid concern.
 But
  instead of using a per insert or a batch insert substraction, I am
  thinking that we can start off a subtraction and continue it till we
  encounter a failure.The moment an error is encountered, since we have
 the offending (already in heap) tuple around, we can call a
 simple_heap_delete on the same and commit (instead of aborting) this
 subtransaction

 What of failures that occur only at (sub)transaction commit, such as
 foreign key checks?


What if we identify and define a subset where we could do subtransaction-based
COPY? The following could be supported:

* A subset of triggers and CHECK constraints which do not move the tuple
around. (Identifying this subset might be an issue though?)
* Primary/unique key indexes

As Hannu mentioned elsewhere in this thread, there should not be very many
instances of complex triggers/CHECKs around. And maybe in those instances
(and also the foreign key checks case), the behaviour could default to a
subtransaction-per-row or even the existing single-transaction model?

Regards,
Nikhils
-- 
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] VLDB Features

2007-12-14 Thread NikhilS
Hi,


 Another approach would be to distinguish between errors that require a
 subtransaction to recover to a consistent state, and less serious errors
 that don't have this requirement (e.g. invalid input to a data type
 input function). If all the errors that we want to tolerate during a
 bulk load fall into the latter category, we can do without
 subtransactions.


I think errors which occur after we have done a fast_heap_insert of the
tuple generated from the current input row are the ones which would require
the subtransaction to recover. Examples could be unique/primary key
violation errors or FKey/triggers related errors. Any errors which occur
before doing the heap_insert should not require any recovery according to
me.

The overhead of having a subtransaction per row is a very valid concern. But
instead of using a per-insert or a batch-insert subtransaction, I am
thinking that we can start off a subtransaction and continue it till we
encounter a failure. The moment an error is encountered, since we have the
offending (already in heap) tuple around, we can call a simple_heap_delete
on the same and commit (instead of aborting) this subtransaction after doing
some minor cleanup. This current input data row can also be logged into a
"bad" file. Recall that we need to only handle those errors in which the
simple_heap_insert is successful, but the index insertion or the after-row
insert trigger causes an error. The rest of the load then can go ahead with
the start of a new subtransaction.

Regards,
Nikhils
-- 
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] [BUGS] BUG #3811: Getting multiple values from a sequence generator

2007-12-10 Thread NikhilS
Hi,

  Right, I want to use it with a bulk operation, say importing a million
 records
  with COPY. Calling nextval one million times looks to me like an
 enormous waste
  of resources. Suppose, you are on an ADSL line: it will cost one million
 times
  the ping time of the ADSL line (say 10 milliseconds per call). Well OK,
 one
  could write a server function that does this, but then the one million
 result
  values must be transported back to the client, because they are not
 guaranteed
  to be contiguous. Unneeded complexity compared to a simple nextval
 increment
  parameter.

 The usual way to use nextval() is to use it on the server as an expression
 in
 an INSERT or DEFAULT. If you're using COPY and don't have a column default
 set
 up then, hm, I guess you're kind of stuck. That would make a good use case
 for
 a one-time nextval(increment) or something like that.


Coincidentally, I very briefly discussed (offline) with Heikki a while back
supporting expressions while doing loads using COPY FROM. From
the above mail exchanges, it does appear that adding this kind of
functionality will be useful while doing bulk imports into tables using
COPY.
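
For the record, the usual way to get COPY to fill in sequence values today is
via a column default, roughly like this (mytable/id are made-up names):

CREATE SEQUENCE mytable_id_seq;
ALTER TABLE mytable ALTER COLUMN id SET DEFAULT nextval('mytable_id_seq');
COPY mytable (data_col) FROM STDIN;  -- id is filled from the sequence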

Heikki's initial suggestion was as follows:

COPY table FROM file USING query

Where query could be any SELECT query, executed once per row using the
values from the input data file. For example:

COPY footable (strcol, strcollen, moredata) FROM file USING SELECT $1,
length($1), $2;

The SQL expressions could refer to the columns being read, or could use
user-defined procedures, built-in functions etc. too. These expressions would
need to be executed for each row read from the input data file to form a new
set of values[], nulls[] entries before forming the corresponding tuple entry.

I think the above will be a very useful enhancement to COPY. The syntax and
other details mentioned above are of course subject to discussion and
approval on the list.

Regards,
Nikhils
-- 
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] [BUGS] BUG #3774: create table like including index doesn't update pg_constraints with primary key

2007-12-02 Thread NikhilS
Hi,


  The fundamental question though is should we allow primary, unique
  CONSTRAINTS which use the index mechanism just as an implementation to
 be
  created using the INCLUDING INDEXES mechanism.

 Yeah, this bizarreness was foreseen and agreed to back when we set up
 LIKE INCLUDING CONSTRAINTS the way it was defined (ie, copying only
 CHECK constraints and not other things called constraints).  I was never
 very thrilled with that definition myself, but it's a bit too late to
 revisit it.


Yeah this is all confusing. I believe we should remove the following TODO
now that the above has been checked in:

 CREATE

   - Have WITH CONSTRAINTS also create constraint indexes

   http://archives.postgresql.org/pgsql-patches/2007-04/msg00149.php


Regards,
Nikhils
-- 
EnterpriseDB   http://www.enterprisedb.com


Re: [pgsql-www] [HACKERS] Time to update list of contributors

2007-11-29 Thread NikhilS
Apologies if some of you receive duplicates of this email. I am not
subscribed to -www, so sending this to -hackers again.

 Hi,
  
   On Wed, 2007-11-28 at 11:44 +0530, Pavan Deolasee wrote:
  
   Nikhil S
  
   Nikhil is from India, EnterpriseDB.
  
   What is his surname? I think we need that for adding to web page.
 
  Yes, agreed. We had someone else who wanted to be listed by alias some
  time back (year+, don't remember whom it was) and that was turned down.
 

 Agreed :), for the record, my name is Nikhil Sontakke.

 Regards,
 Nikhils
 --

 EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [BUGS] BUG #3774: create table like including index doesn't update pg_constraints with primary key

2007-11-29 Thread NikhilS
Hi,



 The following bug has been logged online:

 Bug reference:  3774
 Logged by:  guillaume (ioguix) de Rorthais
 Email address:  [EMAIL PROTECTED]
 PostgreSQL version: 8.3 beta3
 Operating system:   mac os x 10.4.10
 Description:create table like including index doesn't update
 pg_constraints with primary key
 Details:

 When creating a table using the create table ... (like ... inluding
 indexes...) syntaxe, pg_catalog.pg_constraint is not updated with the PK
 constraints which actually is setted in pg_index.

 I'm not sure if this issue is actually a bug or if there a logic behind
 this, but as the primary key is a constraint, I would expect it to be
 setted
 in pg_constraint, shouldn't it ?


This can be handled by setting index->isconstraint appropriately inside
generateClonedIndexStmt().
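
For context, the reported case boils down to something like this (per the
report, the cloned index shows up in pg_index but no matching row is created
in pg_constraint):

CREATE TABLE src (id int PRIMARY KEY);
CREATE TABLE dst (LIKE src INCLUDING INDEXES);
SELECT conname FROM pg_constraint WHERE conrelid = 'dst'::regclass;  -- empty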

The fundamental question though is should we allow primary, unique
CONSTRAINTS which use the index mechanism just as an implementation to be
created using the INCLUDING INDEXES mechanism.

As per the discussion here:

http://www.nabble.com/Re%3A-CREATE-TABLE-LIKE-INCLUDING-INDEXES-support-p10683716.html

maybe we should not?

In other words INCLUDING INDEXES should only create those indexes which do
not have isconstraint set to TRUE.

Comments?

Regards,
Nikhils
-- 
EnterpriseDB   http://www.enterprisedb.com


[HACKERS] Assertion failure due to ColumnRefStar

2007-09-27 Thread NikhilS
Hi,

One of our qmg folks reported an assertion failure:

create table x(y char(1));
insert into x  values (*);

The above causes the following assertion to be hit:

/*
 * Target item is a bare '*', expand all tables
 *
 * (e.g., SELECT * FROM emp, dept)
 *
 * Since the grammar only accepts bare '*' at top level of SELECT,
we
 * need not handle the targetlist==false case here.
 */
Assert(targetlist);

in ExpandColumnRefStar() function.

I was wondering if we should fix this by checking for
list_length(cref->fields) being greater than 1 before calling this in
transformExpressionList?

Regards,
Nikhils
-- 
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] StringInfo misc. issues

2007-08-30 Thread NikhilS
Apologies! As Alvaro correctly guessed, I was working with 8.2 sources.
Sorry for the noise.

Regards,
Nikhils

On 8/29/07, Tom Lane [EMAIL PROTECTED] wrote:

 NikhilS [EMAIL PROTECTED] writes:
  The attached patch should fix this.

 And break other things, no doubt.  needed = 0 is a perfectly valid
 edge case and mustn't be rejected here.  (In fact, I doubt you'd
 even get through the regression tests with this patch ... how much
 did you test it?)

 The real problem with what you describe is that you should have used
 makeStringInfo().

  I also found the absence of a function like resetStringInfo() a bit
  puzzling.

 CVS HEAD is way ahead of you.

 regards, tom lane




-- 
EnterpriseDB   http://www.enterprisedb.com


[HACKERS] StringInfo misc. issues

2007-08-29 Thread NikhilS
Hi,

I palloc0'ed a variable of type StringInfo and, without doing an
initStringInfo() (i.e. I forgot to do it), tried to append some stuff to it
using appendStringInfo(). It went into a tight loop within the function
enlargeStringInfo() at:

while (needed > newlen)

Must be a common enough case for a palloc0'ed field, right?

The attached patch should fix this.

*** 226,232 
!   if (needed < 0 ||
((Size) needed) >= (MaxAllocSize - (Size) str->len))
elog(ERROR, "invalid string enlargement request size %d",
 needed);
--- 226,232 
!   if (needed <= 0 ||
((Size) needed) >= (MaxAllocSize - (Size) str->len))
elog(ERROR, "invalid string enlargement request size %d",
 needed);

I also found the absence of a function like resetStringInfo() a bit
puzzling. I found a lot of places where the code was resetting the len
field to 0 and assigning '\0' to the data field to reset the variable. This
seems to be the only missing API which will be needed while working with the
StringInfo type.

Regards,
Nikhils

-- 
EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/lib/stringinfo.c
===
RCS file: /repositories/edbhome/cvs/EDBAS82/edb/edb-postgres/src/backend/lib/stringinfo.c,v
retrieving revision 1.3
diff -c -r1.3 stringinfo.c
*** src/backend/lib/stringinfo.c	9 Nov 2006 11:09:09 -	1.3
--- src/backend/lib/stringinfo.c	29 Aug 2007 14:37:58 -
***
*** 226,232 
  	 * bogus data.	Without this, we can get an overflow or infinite loop in
  	 * the following.
  	 */
! 	if (needed < 0 ||
  		((Size) needed) >= (MaxAllocSize - (Size) str->len))
  		elog(ERROR, "invalid string enlargement request size %d",
  			 needed);
--- 226,232 
  	 * bogus data.	Without this, we can get an overflow or infinite loop in
  	 * the following.
  	 */
! 	if (needed <= 0 ||
  		((Size) needed) >= (MaxAllocSize - (Size) str->len))
  		elog(ERROR, "invalid string enlargement request size %d",
  			 needed);
***
*** 259,261 
--- 259,272 
  
  	str->maxlen = newlen;
  }
+ 
+ /*
+  * resetStringInfo
+  * Reset the len field and the data field contents for a fresh start
+  */
+ void 
+ resetStringInfo(StringInfo str)
+ {
+ 	str->len = 0;
+ 	str->data[0] = '\0';
+ }
Index: src/include/lib/stringinfo.h
===
RCS file: /repositories/edbhome/cvs/EDBAS82/edb/edb-postgres/src/include/lib/stringinfo.h,v
retrieving revision 1.3
diff -c -r1.3 stringinfo.h
*** src/include/lib/stringinfo.h	9 Nov 2006 11:09:17 -	1.3
--- src/include/lib/stringinfo.h	29 Aug 2007 14:37:58 -
***
*** 138,141 
--- 138,146 
   */
  extern void enlargeStringInfo(StringInfo str, int needed);
  
+ /*
+  * resetStringInfo
+  * Reset the len field and the data field contents for a fresh start
+  */
+ extern void resetStringInfo(StringInfo str);
  #endif   /* STRINGINFO_H */



Re: [HACKERS] Memory leak in vac_update_relstats ?

2007-07-20 Thread NikhilS

Hi,



It's palloc'd in the current memory context, so it's not serious. It'll
be freed at the end of the transaction, if not before that. That's the
beauty of memory contexts; no need to worry about small allocations like
that.



That's the beauty of memory contexts for small allocations. But because of
the 'convenience' of memory contexts we sometimes tend to not pay attention
to doing explicit pfrees. As a general rule I think allocations in
TopMemoryContext should be critically examined. I was bitten by this undue
bloat recently while developing some code and valgrind is not of much help
in such cases because of this very beauty of memory contexts :).

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Memory leak in vac_update_relstats ?

2007-07-20 Thread NikhilS

Hi,

That's the beauty of memory contexts for small allocations. But because of

the 'convenience' of memory contexts we sometimes tend to not pay attention
to doing explicit pfrees. As a general rule I think allocations in
TopMemoryContext should be critically examined. I was bitten by this undue
bloat recently while developing some code and valgrind is not of much help
in such cases because of this very beauty of memory contexts :).




One specific case I want to mention here is hash_create(). For local hash
tables if HASH_CONTEXT is not specified, they get created in a context which
becomes a direct child of TopMemoryContext. Wouldn't it be a better idea to
create the table in CurrentMemoryContext?

If hash_destroy() is not explicitly invoked, this can cause a lot of bloat
especially if the intention was to use the hash table only for a while.

Regards,
Nikhils



--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] schema creation during initdb

2007-04-18 Thread NikhilS

Hi,

On 4/18/07, sharath kumar [EMAIL PROTECTED] wrote:


Sorry i have put the question wrongly. I wanted to ask if we can create a
schema during createdb time so that i have to hide the following from the
user.
 psql -c 'create schema foo' mytemplate
 psql -c 'create table foo.bar ...' mytemplate
Whenever a user runs createdb command, the above schema and table should
be created automatically.
Thanks,
Sharat.



One way that I can think of doing this would be by adding the new schema
and the tables that lie within it to the
src/backend/catalog/information_schema.sql file (I don't know if this is the
recommended way though). These will end up becoming a part of template1, and
any subsequent databases that are created will contain them.
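
Alternatively, if editing the catalog scripts is not desirable, the same
effect can be had by simply creating the objects in template1 itself, since
createdb copies template1 by default. A sketch, using foo/bar from your
example:

\c template1
CREATE SCHEMA foo;
CREATE TABLE foo.bar (id int);
-- every database created from template1 after this point will contain foo.bar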

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] where to write small reusable functions ?

2007-04-13 Thread NikhilS

Hi,

char *
pg_strcat(char *dest, char *src)
{
    /* pg_realloc is a safer function than realloc */
    dest = pg_realloc(dest, strlen(dest) + strlen(src) + 1);
    strcat(dest, src);
    return dest;
}



Postgres already has something for the above functionality.

See makeStringInfo, appendStringInfo.

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


[HACKERS] Bug about column references within subqueries used in selects

2007-04-12 Thread NikhilS

Hi,

Shouldn't the final command below cause a 'column b does not exist' error?

create table update_test (a int, b int);
create table supdate_test(x int, y int);
insert into update_test values (20, 30);
insert into supdate_test values (40, 50);
select a, (select b from supdate_test) from update_test;

 a  | ?column?
----+----------
 20 |       30

Is the problem with the code in colNameToVar or maybe we should add checks
in transformSubLink?

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Bug about column references within subqueries used in selects

2007-04-12 Thread NikhilS

Hi,

On 4/12/07, Merlin Moncure [EMAIL PROTECTED] wrote:


On 4/12/07, NikhilS [EMAIL PROTECTED] wrote:
 Hi,

 Shouldn't the final command below cause a 'column b does not exist
error'?

 create table update_test (a int, b int);
 create table supdate_test(x int, y int);
 insert into update_test values (20, 30);
 insert into supdate_test values (40, 50);
 select a, (select b from supdate_test) from update_test;

  a  | ?column?
 ----+----------
  20 |       30

 Is the problem with the code in colNameToVar or maybe we should add
checks
 in transformSubLink?

I don't think so...the columns of update_test are visible to the
scalar subquery...that way you can use fields from 'a' to filter the
subquery...
select a, (select y from supdate_test where x = a) from update_test;



Yes, this is fine, but what I am objecting to is a "select columnname from
tablename" subquery using column references of the other involved table.

Regards,
Nikhils


--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] UPDATE using sub selects

2007-04-11 Thread NikhilS

Hi,


Hmm.  That sounds like it would be a horrid mess.  You need to decouple
 the execution of the subplan from the use of its outputs, apparently.
 There is some precedent for this in the way that InitPlans are handled:
 the result of the subplan is stored into a ParamList array entry that's
 later referenced by a Param node in the parent's expression tree.  That
 would generalize easily enough to setting more than one Param, but I'm
 not clear on where you'd want to stick the subplan itself in the plan
 tree, nor on what controls how often it needs to get evaluated.






Ended up using something similar to the above suggestion. I have posted the
patch to -patches based on this.

An important concern was where to stick the evaluation of the subqueries so
that they end up becoming subplans which are used in the execution. For this
I have added a new field in the Query structure. This entry gets
preprocessed similar to other fields of the Query from within
subquery_planner.

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Idle idea for a feature

2007-04-10 Thread NikhilS

Hi,

On 4/11/07, Guillaume Smet [EMAIL PROTECTED] wrote:


On 4/10/07, Tom Lane [EMAIL PROTECTED] wrote:
 ISTM it'd be a good idea if it did, as are there any incoming
 foreign keys seems to be a question we constantly ask when solving
 update-performance problems, and there isn't any easy way to check for
 such.

On similar lines, maybe \d can also show the list of inheritors when
invoked on a parent, e.g.:

postgres=# \d parent
     Table "public.parent"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
Indexes:
    "parent_a_key" UNIQUE, btree (a)
Inherited by: child

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto Partitioning

2007-04-05 Thread NikhilS

Hi,


 The only problem I have with this is that the shops I know with big
 partitioned tables favor triggers over rules for both performance reason
and
 a cleaner implementation.  Even with automated rule creation this isnt
going
 to change afaics... not to mention we already create our rules 
triggers
 automatically, so really this just isn't exciting to me (though it may
make
 it easier for people getting in on the ground floor)

I second this. The trigger route is much more maintainable than the rule
route. IMO what really needs to happen is something more low level where
there are no DBA visible changes. Triggers also have overhead, it would
be nice to get a little more bare metal with this.



I had raised this issue about rules/triggers back then and the responses
seemed to be evenly split as to which ones to use.

I think the broad question really is how well we want to support the current
inheritance-based partitioning mechanism. If we want to stick with it for a
while (and we will stick with it unless something concrete/better/more
bare-metal comes up), IMHO we should try to make things simpler (by automating
things where possible) to make it easier for people getting in.
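
For reference, the trigger route being referred to looks roughly like this
for a range-partitioned parent (a sketch; test1/child_1 are just example
names):

CREATE OR REPLACE FUNCTION test1_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.a < 10 THEN
        INSERT INTO child_1 (a, b) VALUES (NEW.a, NEW.b);
    ELSE
        RAISE EXCEPTION 'no partition for a = %', NEW.a;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test1_partition_insert
    BEFORE INSERT ON test1
    FOR EACH ROW EXECUTE PROCEDURE test1_insert_trigger();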

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto Partitioning

2007-04-05 Thread NikhilS

Hi,



 I had raised this issue about rules/triggers back then and the
 responses seemed to be evenly split as to which ones to use.

Presumably your implementation already uses Triggers for INSERTs though,
so why not use triggers for everything?



No, I am using rules for all 3 cases. With some help I am now done with the
UPDATE stuff too, on which I was stuck, so here is what the patch will do:

postgres=# create table test1 (a int unique, b int check (b > 0)) partition
by range(a) (partition child_1 check (a < 10));
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "test1_a_key" for
table "test1"
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "child_1_a_key"
for table "child_1"
CREATE TABLE

A describe of the parent shows the rules added to it:
postgres=# \d test1
     Table "public.test1"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer |
Indexes:
    "test1_a_key" UNIQUE, btree (a)
Check constraints:
    "test1_b_check" CHECK (b > 0)
Rules:
    test1_child_1_delete AS
    ON DELETE TO test1
   WHERE old.a < 10 DO INSTEAD  DELETE FROM child_1
  WHERE child_1.a = old.a
    test1_child_1_insert AS
    ON INSERT TO test1
   WHERE new.a < 10 DO INSTEAD  INSERT INTO child_1 (a, b)
  VALUES (new.a, new.b)
    test1_child_1_update AS
    ON UPDATE TO test1
   WHERE old.a < 10 DO INSTEAD  UPDATE child_1 SET a = new.a, b = new.b
  WHERE child_1.a = old.a

Whereas a describe on the child shows the following:

postgres=# \d child_1
    Table "public.child_1"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer |
Indexes:
    "child_1_a_key" UNIQUE, btree (a)
Check constraints:
    "child_1_a_check" CHECK (a < 10)
    "test1_b_check" CHECK (b > 0)
Inherits: test1

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto Partitioning

2007-04-05 Thread NikhilS



So we are unable to load any of the tables using COPY.



Aww, I guess I should have stuck with triggers as the first choice. Mea culpa,
since I should have investigated some more before deciding on rules, or should
have prodded you more earlier :)

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto Partitioning

2007-04-04 Thread NikhilS

Hi,



I appreciate you efforts, but I'm not sure if this has been discussed



Thanks Markus.

enough. There seem to be two ideas floating around:


  - you are heading for automating the current kludge, which involves
creating partitions and constraints by hand. AFAICT, you want to
support list and range partitioning.

  - Simon Riggs has proposed partitioning functions, which could easily
handle any type of partitioning (hash, list, range and any mix of
those).



When I submitted the proposal, AFAIR there was no objection to going with
the first approach. Yes, there was a lot of forward-looking discussion, but
since what I had proposed (at least syntax-wise) was similar/closer to MySQL
and Oracle, I did not see anyone objecting to it. I think SQL Server provides
partitioning functions similar to Simon's proposal. And all along, I had
maintained that I wanted to automate, as far as possible, the existing
mechanism for partitioning. To this too, I do not remember anyone objecting.

Our current partitioning solution is based on inheritance. With that in
mind, for 8.3 I thought an implementation based on automatic rule creation
would be the way to go.

Having said that, obviously I would want to go with the consensus on this
list as to what we think is the *best* way to go forward with partitioning.

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto Partitioning Patch - WIP version 1

2007-04-03 Thread NikhilS

Hi,



The following things are TODOs:

iv) Auto generate rules using the checks mentioned for the partitions, to
handle INSERTs/DELETEs/UPDATEs to navigate them to the appropriate child.
Note that checks specified directly on the master table will get inherited
automatically.



I am planning to do the above by using the check constraint specified for each
partition. This constraint's raw_expr field ends up becoming the whereClause
for the rule specific to that partition.

One question is whether we should allow auto-creation of UPDATE rules,
given that updates can end up spanning multiple partitions if the column on
which the partitioning is specified gets updated.

Also, if we decide to auto-add rules for UPDATE, the raw_expr will need to
be modified to refer to OLD.col, which can be quite a headache. We do not
have parsetree walker/mutator functions as far as I could see in the code.

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] UPDATE using sub selects

2007-03-30 Thread NikhilS

Hi,

No.  Offhand I think you'd either need to relax EXPR_SUBLINK to allow

 multiple output columns, or invent a ROW_SUBLINK SubLinkType that is
 just like EXPR_SUBLINK except for allowing multiple output columns.
 The latter would probably be less likely to break other things...

Yeah, was looking at EXPR_SUBLINK and its single column use case and drove
to the same conclusion that inventing a new sublink type would be better
too. It is indeed becoming a not so simple and narrow fix as you had
mentioned earlier in your first response :)




I have invented a ROWEXPR_SUBLINK type that handles multiple output columns.
The trouble is that since eventually columns of the parents have to be part
of the query's targetList, I am sending the entire subquery as one of the
entries in that list and the targetList gets populated with entries
dependent on the subquery much later via make_subplan.

This breaks code in rewriteTargetList (which expects every list entry to be
of type TargetEntry), and expand_targetlist (which expects the targets to be
present in attrno order, the entries added because of the subquery will not
be in order as compared to normal SET colname = expr targets).

Is there a simpler way of doing things? Should I try generating a resjunk
TargetEntry in transformUpdateStmt and have its expr point to the subquery
and see if that works?

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] UPDATE using sub selects

2007-03-30 Thread NikhilS

Hi,

On 3/31/07, Tom Lane [EMAIL PROTECTED] wrote:


NikhilS [EMAIL PROTECTED] writes:
 I have invented a ROWEXPR_SUBLINK type that handles multiple output
columns.
 The trouble is that since eventually columns of the parents have to be
part
 of the query's targetList, I am sending the entire subquery as one of
the
 entries in that list and the targetList gets populated with entries
 dependent on the subquery much later via make_subplan.

 This breaks code in rewriteTargetList (which expects every list entry to
be
 of type TargetEntry), and expand_targetlist (which expects the targets
to be
 present in attrno order, the entries added because of the subquery will
not
 be in order as compared to normal SET colname = expr targets).

Hmm.  That sounds like it would be a horrid mess.  You need to decouple
the execution of the subplan from the use of its outputs, apparently.
There is some precedent for this in the way that InitPlans are handled:
the result of the subplan is stored into a ParamList array entry that's
later referenced by a Param node in the parent's expression tree.  That
would generalize easily enough to setting more than one Param, but I'm
not clear on where you'd want to stick the subplan itself in the plan
tree, nor on what controls how often it needs to get evaluated.




Yes, I have tried this already. As you suspected, it seems that the subplan
does not get evaluated if it's not part of the targetList at all.

An alternative approach is to put the subplan into the rangetable and

use Vars to reference its outputs.  Again it's not quite clear what
drives re-execution of the subplan.  It strikes me though that an
approach like this might also serve for SQL2003's LATERAL construct,
which'd be a nice thing to support.



Ok, I will try this out.

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] UPDATE using sub selects

2007-03-16 Thread NikhilS

Hi,



 What's the expected result if the tuple from subselect is more than 1?

Error, per SQL99 section 7.14:

 1) If the cardinality of a row subquery is greater than 1
(one),
then an exception condition is raised: cardinality violation.



I expect no update at all in case of void result set, is this the case ?

No, you get nulls; it's a subquery not a join.  Per SQL99 7.1:

c) If the row value constructor is a row subquery, then:

  i) Let R be the result of the row subquery and let D be
the
 degree of R.

 ii) If the cardinality of R is 0 (zero), then the result of
the
 row value constructor is D null values.

iii) If the cardinality of R is 1 (one), then the result of
the
 row value constructor is R.

regards, tom lane



To allow both of the above to hold, I think the subselect will have to be
treated like an EXPR_SUBLINK subquery. I was wondering if we have a similar
mechanism for plain selects/subselects to check and restrict their output to
a single row.

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] UPDATE using sub selects

2007-03-16 Thread NikhilS

Hi,

On 3/16/07, Tom Lane [EMAIL PROTECTED] wrote:


NikhilS [EMAIL PROTECTED] writes:
 To allow both of the above to hold, I think the subselect will have to
be
 treated like a EXPR_SUBLINK subquery. I was wondering if we have a
similar
 mechanism for plain selects/subselects to check and restrict their
output to
 a single row.

No.  Offhand I think you'd either need to relax EXPR_SUBLINK to allow
multiple output columns, or invent a ROW_SUBLINK SubLinkType that is
just like EXPR_SUBLINK except for allowing multiple output columns.
The latter would probably be less likely to break other things...



Yeah, I was looking at EXPR_SUBLINK and its single-column use case and came
to the same conclusion, that inventing a new sublink type would be better
too. It is indeed becoming not such a simple and narrow fix as you had
mentioned earlier in your first response :)

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] where to add/change commands

2007-03-15 Thread NikhilS

Hi,



psql? There's some code to detect commands that can't be run in a
transaction block in src/bin/psql/common.c, maybe that's what you're
looking for.

Or did you mean something else? How doesn't it pick it up?




I think he probably meant that he was getting a syntax error, even after
making all the changes.

Grzegorz, I would have suggested to make an entry for VERBOSE in
parser/keywords.c, but it already seems to contain an entry for VERBOSE. I
hope you are using the opt_verbose rule in your gram.y in the CLUSTER
[VERBOSE] case.

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


[HACKERS] UPDATE using sub selects

2007-03-14 Thread NikhilS

Hi,

I have coded up a patch which addresses the following TODO, and will submit
it soon:


  - UPDATE
 - Allow UPDATE tab SET ROW (col, ...) = (SELECT...)


 http://archives.postgresql.org/pgsql-hackers/2006-07/msg01306.php


The question is: since this enhances the UPDATE syntax, what changes need to
be made to the documentation, and where?

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] UPDATE using sub selects

2007-03-14 Thread NikhilS

Hi,




 The question is: since this enhances the UPDATE syntax, what changes need
 to be made to the documentation, and where?

Documentation is the very least of your worries.  What exactly is your
implementation plan?  If this were a simple or narrow fix it would
have been done already.




The implementation that I have planned is pretty similar to the way INSERT
INTO ... SELECT has been implemented.

Along with the grammar changes in gram.y, the changes are localized in the
transformUpdateStmt code path. The SELECT clause ends up becoming a subquery
to the update query with the target column expressions transformed properly
to include the subquery expressions. Does this sound ok?

I have tried some update-subselect variations and they seem to work. For
example, the case in src/test/regress/sql/update.sql, which used to fail
till now, seems to work:

UPDATE update_test SET (a,b) = (select a,b FROM update_test where c = 'foo')
 WHERE a = 10;

Will try testing out some other variations too.

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-09 Thread NikhilS

Hi,



Why would we support HASH partitions?
If you did, the full syntax for hash clusters should be supported.



In MySQL and Oracle, the syntax for HASH partitions seems to be similar to
the one mentioned. I do not know much about hash clusters though.

If we do the CHECK clauses like that then we still have don't have a

guaranteed non-overlap between partitions. It would be easier to use
Oracle syntax and then construct the CHECK clauses from that.



Again Oracle, MySQL use VALUES LESS THAN  (expr) format for RANGE
partitions. So you mean that they end up creating ranges like MININT -
Range1, Range1+1 - Range2 etc for each of the partitions?

I think Postgres users are used to the CHECK clauses and I still feel that
the onus of distinct partitions lies on the partition creator.

Also, the syntax needs to be fairly complex to allow for a mixture of

modes, e.g. range and list partitioning. That is currently possible
today and the syntax for doing that is IMHO much simpler than the Oracle
simple way of specifying it.



Subpartitioning is not being targeted right now, but could be put on the
TODO list for further enhancements.

An alternative is to provide a partitioning function which decides which

partition each values goes into.

PARTITION FUNCTION which_partition(date_col)

The partition function must return an unsigned integer > 0, which would
correspond to particular partitions. Partitions would be numbered 1..N,
and named tablename_partM where 1 <= M <= N.

The input and contents of the partition function would be up to the
user. e.g.

CREATE FUNCTION range_partition(date date_col)
{
if (date_col < D1)
return 1;
else if (date_col < D2)
return 2;
else if (date_col < D3)
return 3;

return 4;
}

Doing it this way would allow us to easily join two tables based upon a
common partition function.

In time, I would suggest we support both ways: declarative and
functional.



Till now, we are going the declarative way.

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-09 Thread NikhilS

Hi,

On 3/10/07, Hannu Krosing [EMAIL PROTECTED] wrote:


On Fri, 2007-03-09 at 15:41, Jim Nasby wrote:
 On Mar 9, 2007, at 3:31 AM, Zeugswetter Andreas ADI SD wrote:
  Since partition is inheritance-based, a simple DROP or  NO
  INHERIT
  will do the job to deal with the partition. Do we want to reinvent
  additional syntax when these are around and are documented?
 
  Well, if the syntax for adding a new partition eventually
  ends up as ALTER TABLE ADD PARTITION, then it would make more
  sense that you remove a partition via ALTER TABLE DROP PARTITION.
 
  But DROP PARTITION usually moves the data from this partition to other
  partitions,
  so it is something different.

 It does? IIRC every partitioning system I've seen DROP PARTITION
 drops the data as well. It's up to you to move it somewhere else if
 you want to keep it.

Will this proposed DROP PARTITION just disassociate the table from the
master, or will it actually drop the partitions table from the whole
database ?



That's why I would prefer the existing mechanism: there, a DROP on the child
removes it and a NO INHERIT disassociates it. There might be situations
where we would want to just disassociate and not drop.

Regards,
Nikhils

--


Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com






--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-09 Thread NikhilS

Hi,


 Given that Simon wants to do away with having the master table APPENDed
in
 the planning phase, this would be better.


ISTM you're trading appending the master table for appending the DUMP
partition, which afaict would give you no gain.




If there are entries in the master table, I think it would get appended for
all queries regardless of whether we need to examine its contents or not.
Segregating dump data into a partition will avoid that.

I have seen examples in some other databases wherein a partition specifies a
range of someval - MAXINT for instance, to catch such cases.

That again means that the onus is on the partition creator most of the
time.
Regards,
Nikhils




--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL





--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-08 Thread NikhilS

Hi,

Maybe the dummy entry could be extended to contain the bounds (max/min) for
each of the other involved partitions, and they could be updated each time a
DML operation happens across the partitions. That way, an update to a
particular partition needs to lock out the others, examine the dummy entries
in its own index and follow it up with dummy entry updates in the other
partitions if need be.

Of course, as you have mentioned, all of this needs to be done after careful
thought about the locking/deadlocking etc. issues.

Regards,
Nikhils

On 3/7/07, Alvaro Herrera [EMAIL PROTECTED] wrote:


I am wondering if we can implement unique indexes across several tables
(inheritance hierarchy) not by using a single, big index covering all
the tables, but rather by inserting a dummy entry into each partition's
unique index.  This dummy entry would have an expanded CTID which would
include the tableoid, so it's possible to check it (albeit there is a
problem in that we may require the opening of another heap to do the
actual checking).  These dummy entries could be removed by bulkcleanup
as soon as the inserting transaction is no longer running, to avoid
bloating the index too much.  All said dummy index entries would be
located at either the rightmost or the leftmost leaf, or close to it, so
another idea is to have future inserters reuse the entry for a different
key.

The obvious problem with this is, naturally, the excess I/O that extra
index traversing causes.  The not so obvious ones are locking,
deadlocking and the opening of other heaps and indexes while you do the
insertion, which may be too expensive.  On the other hand, maybe this
idea is easier to implement than full-fledged cross-table indexes, so we
could have richer partitioning earlier than when somebody finally bites
the bullet and implements cross-table indexes.

Or maybe this is just a dumb idea, but I had to let it out anyway :-)

--
Alvaro Herrera
http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.





--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-08 Thread NikhilS

Hi,



There are 2 other reasons to favor triggers though:

1) People (Josh Drake comes to mind) have found that if you get over
a tiny number of partitions, the performance of rules is abysmal.

2) I believe it should be possible to construct an update trigger
that allows you to perform updates that will place the row in
question into a new partition. While I can see cases for simply
disallowing updates to the partitioning key, I think there are also
times when being able to do that would be very useful.



The consensus seems to be veering towards triggers.



I think it'd be great to make adding and removing partitions as
simple as ALTER TABLE. I don't think that DELETE should be the
mechanism to drop a partition, though. Again, DML statements
shouldn't be performing DDL.



Since partitioning is inheritance-based, a simple DROP or NO INHERIT will do
the job to deal with the partition. Do we want to reinvent additional syntax
when these are around and are documented?

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-08 Thread NikhilS

Hi,

On 3/9/07, Shane Ambler [EMAIL PROTECTED] wrote:



 Note to Nikhil: Make sure the new syntax doesn't prevent partitions from
 being placed upon multiple tablespaces in some manner, at CREATE TABLE
 time.

What if the syntax was something like -

CREATE TABLE tabname (
 ...
 ...
  ) PARTITION BY
  HASH(expr)
| RANGE(expr)
| LIST(expr)
[PARTITIONS num_partitions] /* will apply to HASH only for now*/
[PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname],
  PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname]
  ...
];


And (if we use the ALTER TABLE to add partitions)

ALTER TABLE tabname
ADD PARTITION partition_name CHECK(...)
[USING TABLESPACE tblspcname];



We could as well drop the USING part.

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-08 Thread NikhilS

Hi,



This follows on from the suggestion I made - taken along the lines of
the subject auto creation of partitions where I suggested the syntax
of partition check(month of mydatecol) and have a new partition created
as data was entered. With this scenario dropping the partition when it
was empty would complement the creation of a new partition as needed.

Given that there seems to be no real support of going with auto
maintenance were new partitions are added as needed, then the auto
dropping of empty partitions would also not apply.

Leaving us with only specific add partition / drop partition commands.
And have the parent table pick up rows not matching any partition check
criteria.



I was thinking along the lines of what Jim had suggested earlier regarding
overflow partition. Instead of dumping unmatched rows to the master table,
we could put them into a default DUMP/DUMB partition.

Given that Simon wants to do away with having the master table APPENDed in
the planning phase, this would be better.

Regards,
Nikhils


--


Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz





--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread NikhilS

Hi,

If you know that the constraints on each of the tables is distinct, then

building a UNIQUE index on each of the partitions is sufficient to prove
that all rows in the combined partitioned table are distinct also.

The hard part there is checking that the partition constraints are
distinct. If the partition constraints are added one at a time, you can
use the predicate testing logic to compare the to-be-added partition's
constraint against each of the already added constraints. That becomes
an O(N) problem.

What is really needed is a data structure that allows range partitions
to be accessed more efficiently. This could make adding partitions and
deciding in which partition a specific value goes an O(logN) operation.



If the unique constraint is supposed to be on a column which is NOT being
used for the partitioning, then all the above becomes much more difficult.
While partitioning, the additional onus on the user is to specify
non-conflicting CHECKs for the range/list partitions.

Regards,
Nikhils

--

  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com






--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread NikhilS

Hi,

On 3/7/07, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote:



  iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
  pass it on to the children tables.

 How will you maintain a primary key in such a table,
 considering that indexes can't span multiple tables?

Many partitioning schemes have (or can be made to have) only one
possible target partition for the primary key. Thus if you create
separate unique indexes on each partition the problem is solved.
For a first version I opt, that it is sufficient to disallow creation of
a unique index on the master, when the constraints (on columns of this
index) do not nail down a specific partition for each row (e.g. a hash
or a range on one of the index columns that does not overlap).
Later, global index, or indexes with separate partitioning rules can be
implemented, that cover the other cases.

Andreas



Yes, I agree. For version 1, UNIQUE/PRIMARY indexes will cascade down to the
child table, only if the indexed column is present as part of the
partitioning rule.

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


[HACKERS] Auto creation of Partitions

2007-03-06 Thread NikhilS

Hi,

This is to get feedback to meet the following TODO:

  - Simplify ability to create partitioned tables

  This would allow creation of partitioned tables without requiring
  creation of rules for INSERT/UPDATE/DELETE, and constraints for rapid
  partition selection. Options could include range and hash partition
  selection.


There was some discussion on the pgsql mailing lists, which lead to the
above TODO:
http://archives.postgresql.org/pgsql-hackers/2006-09/msg00189.php
http://archives.postgresql.org/pgsql-hackers/2006-08/msg01874.php

We can have the following syntax to support auto creation of partitions in
Postgresql:

CREATE TABLE tabname (
...
...
 ) PARTITION BY
 HASH(expr)
| RANGE(expr)
| LIST(expr)
[PARTITIONS num_partitions] /* will apply to HASH only for now*/
[PARTITION partition_name CHECK(...),
 PARTITION partition_name CHECK(...)
 ...
];

Here expr will be one of the column names as specified for the master
table. Once we finalize the syntax, the above statement would end up
carrying out the following activities (disclaimer: we might add or remove
some activities based on the discussion here).

i ) Create master table.
ii) Create children tables based on the number of partitions specified and
make them inherit from the master table.
iii) Auto generate rules (or triggers?) using the checks mentioned for the
partitions, to handle INSERTs/DELETEs/UPDATEs to navigate them to the
appropriate child. Note that checks specified directly on the master table
will get inherited automatically.
iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass it
on to the children tables.
v) If possible add CHECK (false) to the master table to avoid any activity
on it.

Some questions remain as to:

1) Whether we should use triggers/rules for step number (iii) above. Maybe
rules is the way to go.
2) What other attributes (access permissions e.g.) of the master along with
the ones specified in (iv) should be passed on to the children.
3) Some implementation specific issue e.g. whether SPI_execute would be a
good way of creating these rules.

Comments appreciated,
Regards,
Nikhils
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-06 Thread NikhilS

Hi,

On 3/6/07, Peter Eisentraut [EMAIL PROTECTED] wrote:


NikhilS wrote:
 iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
 pass it on to the children tables.

How will you maintain a primary key in such a table, considering that
indexes can't span multiple tables?



We will not (I know it's a hard thing to do :) ); the intention is to use
this information from the parent and make it a property of the child table.
This will avoid the user having to manually specify CREATE
INDEX and the like on all the children tables one-by-one.


1) Whether we should use triggers/rules for step number (iii) above.
 Maybe rules is the way to go.

Since this would basically be a case of the updatable rules problem, you
should review those discussions in the past to check whether the issues
mentioned there don't interfere with that plan.



The rules mentioned here will be to specify that all the
inserts/updates/deletes should go into proper children tables instead of the
parent. I do not see the updateable rules problem with regards to this, but
will check out the archives for discussion on this related to partitioning.


2) What other attributes (access permissions e.g.) of the master
 along with the ones specified in (iv) should be passed on to the
 children.

Moreover, how are later changes of those attributes propagated?



Once created, this will be a normal inheritance relationship between the
tables and all the existing commands will apply to both the parent and the
child.

The basic idea here is to automate as many things as possible at partition
creation time. The user is free to make additional changes to the involved
tables later too.

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-06 Thread NikhilS

On 3/6/07, NikhilS [EMAIL PROTECTED] wrote:


Hi,

On 3/6/07, Gregory Stark [EMAIL PROTECTED] wrote:

 NikhilS [EMAIL PROTECTED] writes:

 the intention is to use this information from the parent and make it a
 property of the child table. This will avoid the step for the user
 having to
 manually specify CREATE INDEX and the likes on all the children tables
 one-by-one.

 Missed the start of this thread. A while back I had intended to add WITH
 INDEXES to CREATE TABLE LIKE. That would let you create a table LIKE
 parent
 WITH CONSTRAINTS WITH INDEXES and get pretty much a perfect table ready
 for
 adding to the inheritance structure.


Yeah, this one aims to do pretty much the above as part of the auto
creation of the inheritance-based partitions.



And to add, maybe if there is consensus/demand for the WITH INDEXES idea
mentioned above too, I could work on it as well.

Regards,
Nikhils

Regards,

Nikhils


--
   Gregory Stark
   EnterpriseDB   http://www.enterprisedb.com




--
EnterpriseDB   http://www.enterprisedb.com





--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-06 Thread NikhilS

Hi,

On 3/6/07, Gregory Stark [EMAIL PROTECTED] wrote:


NikhilS [EMAIL PROTECTED] writes:

the intention is to use this information from the parent and make it a
property of the child table. This will avoid the step for the user having
to
manually specify CREATE INDEX and the likes on all the children tables
one-by-one.

Missed the start of this thread. A while back I had intended to add WITH
INDEXES to CREATE TABLE LIKE. That would let you create a table LIKE parent
WITH CONSTRAINTS WITH INDEXES and get pretty much a perfect table ready
for
adding to the inheritance structure.



Yeah, this one aims to do pretty much the above as part of the auto creation
of the inheritance-based partitions.

Regards,
Nikhils


--

  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com





--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-06 Thread NikhilS

Hi Shane,

Maybe I'm looking at auto-maintenance which is beyond any current planning?


Many of your suggestions are useful, but auto-maintenance will be beyond the
current plan.

Regards,
Nikhils

EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Auto creation of Partitions

2007-03-06 Thread NikhilS

Hi,

On 3/7/07, Tom Lane [EMAIL PROTECTED] wrote:


Peter Eisentraut [EMAIL PROTECTED] writes:
 But when I say
 CREATE TABLE ( a int PRIMARY KEY, ... ) PARTITION blah ...
 then I expect that the primary key will be enforced across all
 partitions.  We currently sidestep that issue by not offering seemingly
 transparent partitioning.  But if you are planning to offer that, the
 unique index issue needs to be solved, and I see nothing in your plan
 about that.

Agreed, it needs to Just Work.  I think it'd still be useful though
if we only support auto-partitioning on the primary key, and that
restriction avoids the indexing problem.

regards, tom lane



Sure, but as Chris mentioned earlier, wouldn't it be useful to maintain
uniqueness on a
partition-by-partition basis too?

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] PrivateRefCount (for 8.3)

2007-03-05 Thread NikhilS

Hi,

What is the opinion of the list as to the best way of measuring if the
following implementation is ok?

http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php

As mentioned in earlier mails, this will reduce the per-backend memory usage
to a small fraction (a single-digit percentage) of the current
(NBuffers * sizeof(int)) allocation. I have done pgbench/dbt2 runs and I do
not see any negative
impact because of this. Are there any other suggestions for measuring the
backend memory footprint?

Regards,
Nikhils

On 2/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:



Added to TODO:

* Consider decreasing the amount of memory used by PrivateRefCount

  http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php
  http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php



---

Simon Riggs wrote:
 On Mon, 2006-11-27 at 14:42 -0500, Bruce Momjian wrote:
  Simon Riggs wrote:
   int8 still seems like overkill. When will the ref counts go above 2 on a
    regular basis? Surely refcount=2 is just chance at the best of times.
   
    Refcount - 2 bits per value, plus a simple overflow list? That would
    allow 0,1,2 ref counts plus 3 means look in hashtable to find real
    refcount.
 
  At two bits, would we run into contention for the byte by multiple
  backends?

 No contention, its a private per-backend data structure. That's why we
 want to reduce the size of it so badly.

 --
   Simon Riggs
   EnterpriseDB   http://www.enterprisedb.com




--
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +






--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] PrivateRefCount (for 8.3)

2007-03-05 Thread NikhilS

Hi,

Yeah, with 500K shared buffers and multiple backends, we could achieve
noticeable savings with this. And that is why it will be difficult to show
the performance gains by running just pgbench/dbt2 on medium-scale machines.

One way of looking at this could be that memory saved here could lead to
more critical usage elsewhere... But agreed, it is hard to show with just
some performance runs.
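
(For scale, using the numbers mentioned elsewhere in this thread: 500,000
buffers x 4 bytes is roughly 2MB of PrivateRefCount per backend, so a few
hundred backends together hold on the order of several hundred MB in these
arrays, whereas a 64-entry hash table plus a small overflow area is only on
the order of a kilobyte per backend.)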

Regards,
Nikhils

On 3/5/07, Stefan Kaltenbrunner [EMAIL PROTECTED] wrote:


Tom Lane wrote:
 NikhilS [EMAIL PROTECTED] writes:
 What is the opinion of the list as to the best way of measuring if the
 following implementation is ok?
 http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php
 As mentioned in earlier mails, this will reduce the per-backend usage
of
 memory by an amount which will be a fraction (single digit percentage)
 of (NBuffers
 * int) size. I have done pgbench/dbt2 runs and I do not see any
negative
 impact because of this.

 I find it extremely telling that you don't claim to have seen any
 positive impact either.

 I think that the original argument
 http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php
  is basically bogus.  At 500000 buffers (4GB in shared memory) the
 per-backend space for PrivateRefCount is still only 2MB, which is
 simply not as significant as Simon claims; a backend needs at least
 that much for catalog caches etc.  There is, furthermore, no evidence
 that running shared_buffers that high is a good idea in the first
 place, or that there aren't other performance bottlenecks that will
 manifest before this one becomes interesting.

hmm - we are continually running into people with dedicated servers that
have 16GB RAM or even more available, and most tuning docs recommend some
20-30% of system RAM be dedicated to shared_buffers. So having some
500k buffers allocated does not sound so unrealistic in practise, and
combined with the fact that people often have a few hundred backends
that could add up to some noticeable overhead.
Whether that is actually a problem, given that those people tend to have
heaps of memory, is another story - but if we can preserve some memory ...


Stefan





--
EnterpriseDB   http://www.enterprisedb.com


[HACKERS] --enable-debug does not work with gcc

2007-02-02 Thread NikhilS

Hi,

configure with --enable-debug does not seem to add -g to CFLAGS while
compiling with gcc. Guess we will need to change configure.in as below:

***

 # supply -g if --enable-debug
! if test $enable_debug = yes && test $ac_cv_prog_cc_g = yes; then
   CFLAGS=$CFLAGS -g
 fi
--- 300,315 

 # supply -g if --enable-debug
! if test $enable_debug = yes && (test $ac_cv_prog_cc_g = yes ||
!   test $ac_cv_prog_gcc_g = yes); then
   CFLAGS=$CFLAGS -g
 fi


Should I submit a patch for this?

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] --enable-debug does not work with gcc

2007-02-02 Thread NikhilS

Hi,

Indeed it does, apologies for not doing the entire groundwork. But what it
also does is add -O2 by default for gcc even when --enable-debug is
specified. gdb is not able to navigate the stack traces properly with this
optimization in place; especially tracing of static functions becomes
difficult. Has this issue been faced by anybody else? If so, I can try out a
patch to avoid using -O2 with --enable-debug.

Regards,
Nikhils

On 2/2/07, Gavin Sherry [EMAIL PROTECTED] wrote:


On Fri, 2 Feb 2007, NikhilS wrote:

 Hi,

 configure with --enable-debug does not seem to add -g to CFLAGS while
 compiling with gcc. Guess we will need to change configure.in as below:

Erm... works for me and everyone else... AFAIK.

Thanks,

Gavin





--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] --enable-debug does not work with gcc

2007-02-02 Thread NikhilS

Hi,

On 2/2/07, Gavin Sherry [EMAIL PROTECTED] wrote:


On Fri, 2 Feb 2007, NikhilS wrote:

 Hi,

 Indeed it does, apologies for not doing the entire groundwork. But what
it
 also does is that it adds -O2 by default for gcc even when
--enable-debug is
 specified. gdb is not able to navigate the stack traces properly with
this
 optimization in place. Especially tracing of static functions becomes
 difficult. Has this issue been faced by anybody else? If so can try out
a
 patch to avoid using O2 with enable-debug.

Yes, this is known. The thing with gcc is, it only emits some warnings at
-O2. I'm not sure that this is why we do not set optimisation to 0, but have
long assumed it to be the case. I imagine that it's fairly standard
practice for people doing debugging to pass CFLAGS=-O0 as an argument to
configure.

True, this is how I myself circumvent this problem too. But IMHO, having to
explicitly pass CFLAGS when we are invoking --enable-debug (which does
add -g, but leaves an optimization flag around which deters debugging)
does not seem correct?

Regards,
Nikhils


--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] PrivateRefCount (for 8.3)

2007-01-16 Thread NikhilS

Hi,

Most likely a waste of development effort --- have you got any evidence

of a real effect here?  With 200 max_connections the size of the arrays
is still less than 10% of the space occupied by the buffers themselves,
ergo there isn't going to be all that much cache-thrashing compared to
what happens in the buffers themselves.  You're going to be hard pressed
to buy back the overhead of the hashing.

It might be interesting to see whether we could shrink the refcount
entries to int16 or int8.  We'd need some scheme to deal with overflow,
but given that the counts are now backed by ResourceOwner entries, maybe
extra state could be kept in those entries to handle it.



I did some instrumentation coupled with pgbench/dbt2/views/join query runs
to find out the following:

(a) Maximum number of buffers pinned simultaneously by a backend:  6-9

(b) Maximum value of simultaneous pins on a given buffer by a backend: 4-6

(a) indicates that for large shared_buffers value we will end up with space
wastage due to a big PrivateRefCount array per backend (current allocation
is (int32 * shared_buffers)).

(b) indicates that the refcount to be tracked per buffer is a small enough
value. And Tom's suggestion of exploring int16 or int8 might be worthwhile.

Following is the Hash Table based proposal based on the above readings:

- Do away with allocating the NBuffers-sized PrivateRefCount array, which is
 an allocation of (NBuffers * sizeof(int)).
- Define Pvt_RefCnt_Size to be 64 (128?) or some such value, so as to be
 several multiples of the ranges observed above. Define Overflow_Size to be 8
 or some similar small value to handle collisions.

- Define the following Hash Table entry to keep track of reference counts

    struct HashRefCntEnt
    {
        int32    BufferId;
        int32    RefCnt;
        int32    NextEnt;    /* To handle collisions */
    };

-  Define a similar Overflow Table entry as above to handle collisions.

An array HashRefCntTable of such HashRefCntEnt entries, of size
Pvt_RefCnt_Size, will get initialized in the InitBufferPoolAccess function.

An OverflowTable of size Overflow_Size will be allocated. This array will be
resized dynamically (to 2 * the current Overflow_Size) to accommodate more
entries if it cannot accommodate further collisions in the main table.

We do not want the overhead of a costly hashing function, so we will use
(% Pvt_RefCnt_Size, i.e. modulo Pvt_RefCnt_Size) to get the index where the
buffer needs to go. In short, our hash function is (bufid % Pvt_RefCnt_Size),
which should be a cheap enough operation.

Considering that only 9-10 buffers will be needed, the probability of
collisions will be low. Collisions will arise only if buffers with ids (x, x +
Pvt_RefCnt_Size, x + 2*Pvt_RefCnt_Size etc.) get used in the same operation.
This should be pretty rare.
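
To make the lookup path concrete, here is a small standalone sketch of this
scheme (illustration only, not code from the PostgreSQL tree; the names,
sizes and error-handling shortcuts are invented):

#include <stdio.h>
#include <stdlib.h>

#define PVT_REFCNT_SIZE 64
#define INVALID_BUFFER  (-1)

typedef struct RefCntEnt
{
    int bufid;      /* buffer id, or INVALID_BUFFER if the slot is unused */
    int refcnt;     /* local pin count for this buffer */
    int next;       /* index of the next entry in overflow[], or -1 */
} RefCntEnt;

static RefCntEnt table[PVT_REFCNT_SIZE];
static RefCntEnt *overflow;
static int overflow_size = 8;
static int overflow_used = 0;

static void
init_refcnt_table(void)
{
    for (int i = 0; i < PVT_REFCNT_SIZE; i++)
    {
        table[i].bufid = INVALID_BUFFER;
        table[i].refcnt = 0;
        table[i].next = -1;
    }
    /* (malloc/realloc error checks omitted for brevity) */
    overflow = malloc(overflow_size * sizeof(RefCntEnt));
}

/* Find (or create) the entry tracking bufid and bump its local pin count. */
static void
pin_local(int bufid)
{
    RefCntEnt *ent = &table[bufid % PVT_REFCNT_SIZE];
    int prev = -1;              /* last overflow entry visited, -1 = none */
    int cur;

    if (ent->bufid == INVALID_BUFFER || ent->bufid == bufid)
    {
        ent->bufid = bufid;
        ent->refcnt++;
        return;
    }

    /* collision: walk this slot's overflow chain */
    for (cur = ent->next; cur != -1; prev = cur, cur = overflow[cur].next)
    {
        if (overflow[cur].bufid == bufid)
        {
            overflow[cur].refcnt++;
            return;
        }
    }

    /* not found: append a new overflow entry, doubling the area if full */
    if (overflow_used == overflow_size)
    {
        overflow_size *= 2;
        overflow = realloc(overflow, overflow_size * sizeof(RefCntEnt));
    }
    overflow[overflow_used].bufid = bufid;
    overflow[overflow_used].refcnt = 1;
    overflow[overflow_used].next = -1;
    if (prev == -1)
        ent->next = overflow_used;
    else
        overflow[prev].next = overflow_used;
    overflow_used++;
}

int
main(void)
{
    init_refcnt_table();
    pin_local(3);
    pin_local(3);
    pin_local(3 + PVT_REFCNT_SIZE);     /* hashes to the same slot */
    printf("refcnt(3) = %d\n", table[3].refcnt);
    printf("refcnt(%d) = %d\n", 3 + PVT_REFCNT_SIZE, overflow[0].refcnt);
    return 0;
}

The common (no-collision) case costs one modulo and one array access, which
is the cheapness being argued for above.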

Functions PinBuffer, PinBuffer_Locked, IncrBufferRefCount, UnpinBuffer etc.
will be modified to consider the above mechanism properly. The changes will
be localized in the buf_init.c and bufmgr.c files only.

Comments please.

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread NikhilS
Hi, 
  This allows the length of a typical tuple chain to be extremely short in
  practice. For a single connection issuing a stream of UPDATEs the chain
  length will be no more than 1 at any time.

 Only if there are no other transactions being held open, which makes this
 claim a lot weaker.

True, but Nikhil has run tests that clearly show HOT outperforming the
current situation in the case of long running transactions. The need to
optimise HeapTupleSatisfiesVacuum() and avoid long chains does still
remain a difficulty for both HOT and the current situation.

Yes, I carried out some pgbench runs comparing our current HOT update
patch with PG82BETA2 sources for the long running transaction case. For
an apples to apples comparison we got roughly 170% improvement with the
HOT update patch over BETA2.

In case of BETA2, since all versions are in the main heap, we end up
doing multiple index scans for them. In case of HOT updates, we have a
single index entry with the chains getting traversed from the overflow
relation. So as Simon has mentioned the need to avoid long chains remains
a difficulty for both the situations.

Regards,
Nikhils


--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread NikhilS
Hi,

I think the vision is that the overflow table would never be very large
because it can be vacuumed very aggressively. It has only tuples that are busy
and will need vacuuming as soon as a transaction ends. Unlike the main table,
which is mostly tuples that don't need vacuuming.

That's right. Vacuum, if it gets a chance to work on the overflow relation,
seems to be doing a decent job in our runs. If autovacuum/vacuum gets to run
optimally, the FSM information generated for the overflow relations will be
able to serve a lot of new tuple requests, avoiding undue/large bloat in the
overflow relations.

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread NikhilS
Hi,

On 11/10/06, Pavan Deolasee [EMAIL PROTECTED] wrote:

On 11/10/06, Tom Lane [EMAIL PROTECTED] wrote:

Pavan Deolasee [EMAIL PROTECTED] writes:
 On 11/10/06, Tom Lane [EMAIL PROTECTED] wrote:
 (2) Isn't this full of race conditions?

 I agree, there could be race conditions. But IMO we can handle those.

Doubtless you can prevent races by introducing a bunch of additional
locking. The question was really directed to how much concurrent
performance is left, once you get done locking it down.

I understand your point and I can clearly see a chance to improve upon the
current locking implementation in the prototype, even though we are seeing a
good performance boost for 50 clients and 50 scaling factor with pgbench
runs, as mentioned by Nikhil.

Regards,
Pavan

Yes, we have done a number of runs with and without autovacuum with
parameters like 50 clients, 50 scaling factor and 25000 transactions per
client. 50 clients should introduce a decent amount of concurrency. The tps
values observed with the HOT update patch (850 tps) were approximately 200+%
better than PG82 sources (270).

Runs with 25 clients, 25 scaling factor and 25000 transactions produce
similar percentage increases with the HOT update patch.

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread NikhilS
Hi,

On 11/10/06, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote:

  True, but Nikhil has run tests that clearly show HOT outperforming the
  current situation in the case of long running transactions. The need
  to optimise HeapTupleSatisfiesVacuum() and avoid long chains does
  still remain a difficulty for both HOT and the current situation.

 Yes, I carried out some pgbench runs comparing our current HOT update
 patch with PG82BETA2 sources for the long running transaction case.
 For an apples to apples comparison we got

Vacuums every 5 minutes, or no vacuums?

We tried with both. Vacuum seems to do little to help in a long running
transaction case. Generally, in most of the pgbench runs that we carried out,
autovacuum did not seem to be of much help even to PG82.

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] [SPAM?] Re: Asynchronous I/O Support

2006-10-25 Thread NikhilS
Hi, 

While we are at async i/o, I think direct i/o and concurrent i/o also deserve
a look. The archives suggest that Bruce had some misgivings about dio because
of no kernel caching, but almost all databases seem to (carefully) use dio
(Solaris, Linux, ?) and cio (AIX) extensively nowadays.

Since these can be turned on a per file basis, perf testing them out should be simpler too. 
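
As a small illustration of the "per file basis" point (a generic sketch, not
PostgreSQL code, assuming Linux-style O_DIRECT): the flag is requested per
file descriptor at open time, and direct I/O then requires suitably aligned
buffers and transfer sizes:

#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGNMENT 4096
#define XFER_SIZE 8192

int
main(int argc, char **argv)
{
    void   *buf;
    int     fd;
    ssize_t n;

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    /* direct i/o is requested per file descriptor, not globally */
    fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0)
    {
        perror("open(O_DIRECT)");
        return 1;
    }

    /* O_DIRECT needs block-aligned buffers, offsets and lengths */
    if (posix_memalign(&buf, ALIGNMENT, XFER_SIZE) != 0)
    {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    n = read(fd, buf, XFER_SIZE);   /* bypasses the kernel page cache */
    printf("read %zd bytes with O_DIRECT\n", n);

    free(buf);
    close(fd);
    return 0;
}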

Regards,
Nikhils
On 10/25/06, Martijn van Oosterhout kleptog@svana.org wrote:
On Tue, Oct 24, 2006 at 12:53:23PM -0700, Ron Mayer wrote:
 Anyway, for those who want to see what they do in Linux,
 http://www.gelato.unsw.edu.au/lxr/source/mm/fadvise.c
 Pretty scary that Bruce said it could make older linuxes dump core -
 there isn't a lot of code there.

The bug was probably in the glibc interface to the kernel. Google found
this:
http://sourceware.org/ml/libc-hacker/2004-03/msg0.html

i.e. posix_fadvise appears to have been broken on all 64-bit
architectures prior to March 2004 due to a silly linking error. And then
things like this:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=313219

Which suggest that prior to glibc 2.3.5, posix_fadvise crashed on 2.4
kernels. That's a fairly recent version, so the bug would still be
fairly widespread.

Have a nice day,
--
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to litigate.
-- All the world's a stage, and most of us are desperately unrehearsed.


[HACKERS] Microseconds granularity SIGALRM interrupt support

2006-10-20 Thread NikhilS
Hi, 

Currently we have enable_sig_alarm(), which provides millisecond-level
granularity in specifying the delay. I tried using it by just specifying a
nonzero value for the timeval.it_value.tv_usec field before calling
setitimer, but it didn't seem to work well.

Do we have any function in the postgres codebase which provides a microsecond level of delay coupled with SIGALRM support?
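
For reference, a minimal standalone program (illustration only, not taken
from the PostgreSQL sources; the handler and the 500-microsecond value are
invented) that arms a one-shot SIGALRM through it_value.tv_usec with
setitimer; whether such fine granularity is actually honoured depends on the
platform's timer resolution:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t fired = 0;

static void
handle_alarm(int signo)
{
    (void) signo;
    fired = 1;
}

int
main(void)
{
    struct sigaction sa;
    struct itimerval timer;
    sigset_t block, oldmask;

    /* install the SIGALRM handler */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_alarm;
    sigaction(SIGALRM, &sa, NULL);

    /* block SIGALRM so we can wait for it race-free with sigsuspend */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &oldmask);

    /* one-shot timer ~500 microseconds from now; it_interval = 0 => no repeat */
    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_sec = 0;
    timer.it_value.tv_usec = 500;
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
    {
        perror("setitimer");
        return 1;
    }

    while (!fired)
        sigsuspend(&oldmask);   /* atomically unblock SIGALRM and wait */

    printf("SIGALRM delivered\n");
    return 0;
}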

Regards,
Nikhils
EnterpriseDB   http://www.enterprisedb.com

--
All the world's a stage, and most of us are desperately unrehearsed.



Re: [HACKERS] Additional stats for Relations

2006-10-19 Thread NikhilS
Hi Jim,

On 10/18/06, Jim C. Nasby [EMAIL PROTECTED] wrote:

Also how many times a relation has been vacuumed (which puts all the
other numbers in more perspective... good catch Simon). And I think
number of pages that could not be added to the FSM would also be
extremely valuable.

By the above, do you mean the number of pages that could not be added
to the FSM because they had freespace which was less than the threshold
for this particular relation?

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Asynchronous I/O Support

2006-10-19 Thread NikhilS
Hi,

On 10/18/06, Martijn van Oosterhout kleptog@svana.org wrote:

On Wed, Oct 18, 2006 at 08:04:29PM +1300, Mark Kirkwood wrote:
 bgwriter doing asynchronous I/O for the dirty buffers that it is
 supposed to sync
 Another decent use-case?

Good idea, but async i/o is generally poorly supported.

Async i/o is stably supported on most *nix (apart from Linux 2.6.*) plus
Windows. Guess it would still be worth it, since one fine day 2.6.* will
start supporting it properly too.

Regards,
Nikhils

 Is it worth considering using readv(2) instead?

Err, readv allows you to split a single consecutive read into multiple
buffers. Doesn't help at all for reads on widely separated areas of a file.

Have a nice day,
--
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to litigate.

--
All the world's a stage, and most of us are desperately unrehearsed.


Re: [HACKERS] Asynchronous I/O Support

2006-10-18 Thread NikhilS
Hi, 

bgwriter doing asynchronous I/O for the dirty buffers that it is supposed to sync
Another decent use-case?
Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com
On 10/15/06, Luke Lonergan [EMAIL PROTECTED] wrote:
Martijn,

On 10/15/06 10:56 AM, Martijn van Oosterhout kleptog@svana.org wrote:
 Have enough systems actually got to the point of actually supporting
 async I/O that it's worth implementing?

I think there are enough high end applications / systems that need it at
this point. The killer use-case we've identified is for the scattered I/O
associated with index + heap scans in Postgres. If we can issue ~5-15 I/Os
in advance when the TIDs are widely separated, it has the potential to
increase the I/O speed by the number of disks in the tablespace being
scanned. At this point, that pattern will only use one disk.

- Luke
-- All the world's a stage, and most of us are desperately unrehearsed.
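
One way to approximate the "issue I/Os in advance" idea from the quoted mail
without full async I/O support is to hint the upcoming scattered reads with
posix_fadvise(POSIX_FADV_WILLNEED). The following is a standalone sketch only
(block size, block numbers and file handling are invented), not anything from
the PostgreSQL tree:

#define _XOPEN_SOURCE 600       /* for posix_fadvise and pread */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192

/* Hint the kernel that the given blocks will be read soon. */
static void
prefetch_blocks(int fd, const long *blocks, int nblocks)
{
    for (int i = 0; i < nblocks; i++)
    {
        int rc = posix_fadvise(fd, (off_t) blocks[i] * BLCKSZ, BLCKSZ,
                               POSIX_FADV_WILLNEED);
        if (rc != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
    }
}

int
main(int argc, char **argv)
{
    /* a handful of widely separated block numbers, purely illustrative */
    long    blocks[] = {12, 4093, 77310, 190042, 512345};
    char    buf[BLCKSZ];

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* issue the hints up front ... */
    prefetch_blocks(fd, blocks, 5);

    /* ... then do the actual scattered reads, hopefully overlapping the I/O */
    for (int i = 0; i < 5; i++)
        if (pread(fd, buf, BLCKSZ, (off_t) blocks[i] * BLCKSZ) < 0)
            perror("pread");

    close(fd);
    return 0;
}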


Re: [HACKERS] Additional stats for Relations

2006-10-17 Thread NikhilS
Hi, 
So: 
heap_blks_reused (with Jim's semantics), heap_blks_extend,
heap_blks_truncate are the interesting stats? Will try to work up a
patch for this. 

Regards,
Nikhils
 EnterpriseDB http://www.enterprisedb.com
On 10/15/06, Simon Riggs [EMAIL PROTECTED] wrote:
On Sat, 2006-10-14 at 11:32 +0530, NikhilS wrote:
 On 10/13/06, Jim C. Nasby [EMAIL PROTECTED] wrote:
 I'm also not sure if this metric is what you actually want, since a
 single page can be returned many times from the FSM even between
 vacuums. Tracking how many pages for a relation have been put into the
 FSM might be more useful...

 Nikhils
 Pages might be put into the FSM, but by this metric don't we get the
 actual usage of the pages from the FSM? Agreed a single page can be
 returned multiple times, but since it serves a new tuple, shouldn't we
 track it?
 Nikhils

This makes sense for indexes, but only makes sense for heaps when we
know that the backend will keep re-accessing the block until it is full
- so only of interest in steady-state workloads.

IMHO Jim's proposal makes more sense for general use.

  heap_blks_extend: The number of times file extend was invoked on the
  relation

Sounds good

  heap_blks_truncate: The total number of blocks that have been truncated due
  to vacuum activity e.g.

Sounds good

  As an addendum to the truncate stats above, we can also have the additional
  following stats:
  heap_blks_maxtruncate: The max block of buffers truncated in one go
  heap_blks_ntruncate: The number of times truncate was called on this
  relation

Those last 2 sound too complex for normal use and ntruncate is most
likely the same as number of vacuums anyway. Hmmm... Perhaps nvacuums is
a more interesting metric? We've got last vacuum date, but no indication
of how frequently a vacuum has run.

 Do you have a use-case for this info? I can see where it might be neat
 to know, but I'm not sure how you'd actually use it in the real world.

 Nikhils
 The use-case according to me is that these stats help prove the
 effectiveness of autovacuum/vacuum operations. By varying some autovac
 guc variables, and doing subsequent (pgbench e.g.) runs, one can find
 out the optimum values for these variables using these stats.
 Nikhils

This should be useful for tuning space allocation/deallocation. If we
get this patch in early it should help get feedback on this area.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

--
All the world's a stage, and most of us are desperately unrehearsed.


Re: [HACKERS] Additional stats for Relations

2006-10-14 Thread NikhilS
Hi Jim,

On 10/13/06, Jim C. Nasby [EMAIL PROTECTED] wrote:

On Fri, Oct 13, 2006 at 06:17:47PM +0530, NikhilS wrote:
 Currently a select * from pg_statio_user_tables; displays only
 heap_blks_read, heap_blks_hit stats amongst others for the main relation.
 It would be good to have the following stats collected too. I think these
 stats can be used to better statistically analyze/understand the block I/O
 activity on the relation:

 heap_blks_reused: The number of buffers returned by the FSM for use to
 store a new tuple in

The description on this is misleading... FSM doesn't return buffers, it
returns pages that have free space on them.

Nikhils
FSM returns the block number from which we fetch the buffer. This is
similar to the way we track buffer_read stats in ReadBuffer.
Nikhils

I'm also not sure if this metric is what you actually want, since a
single page can be returned many times from the FSM even between
vacuums. Tracking how many pages for a relation have been put into the
FSM might be more useful...

Nikhils
Pages might be put into the FSM, but by this metric don't we get the
actual usage of the pages from the FSM? Agreed a single page can be
returned multiple times, but since it serves a new tuple, shouldn't we
track it?
Nikhils

 heap_blks_extend: The number of times file extend was invoked on the
 relation

 heap_blks_truncate: The total number of blocks that have been truncated due
 to vacuum activity e.g.

 As an addendum to the truncate stats above, we can also have the additional
 following stats:

 heap_blks_maxtruncate: The max block of buffers truncated in one go

 heap_blks_ntruncate: The number of times truncate was called on this
 relation

 I can come up with a patch (already have one) for the above. Any
 thoughts/comments?

Do you have a use-case for this info? I can see where it might be neat
to know, but I'm not sure how you'd actually use it in the real world.

Nikhils
The use-case according to me is that these stats help prove the
effectiveness of autovacuum/vacuum operations. By varying some autovac
guc variables, and doing subsequent (pgbench e.g.) runs, one can find
out the optimum values for these variables using these stats.
Nikhils

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com

--
All the world's a stage, and most of us are desperately unrehearsed.


[HACKERS] Additional stats for Relations

2006-10-13 Thread NikhilS
Hi, 

Currently a select * from pg_statio_user_tables; displays only
heap_blks_read, heap_blks_hit stats amongst others for the main
relation. It would be good to have the following stats collected too. I
think these stats can be used to better statistically
analyze/understand the block I/O activity on the relation:

heap_blks_reused: The number of buffers returned by the FSM for use to store a new tuple in

heap_blks_extend: The number of times file extend was invoked on the relation

heap_blks_truncate: The total number of blocks that have been truncated due to vacuum activity e.g.

As an addendum to the truncate stats above, we can also have the additional following stats:

heap_blks_maxtruncate: The max block of buffers truncated in one go

heap_blks_ntruncate: The number of times truncate was called on this relation
I can come up with a patch (already have one) for the above. Any thoughts/comments?

Regards,
Nikhils 
(www.enterprisedb.com)

--
All the world's a stage, and most of us are desperately unrehearsed.