Re: [HACKERS] Commit fest queue

2008-04-11 Thread Joshua D. Drake
On Thu, 10 Apr 2008 23:26:28 -0400
Aidan Van Dyk [EMAIL PROTECTED] wrote:

 
 And then anybody asking a question about the status of something gives
 you a pedestal to show how nicely your tracker works.
 

If you think that is what I am after you obviously have no idea who are
you replying to. I suggest you take a step back and take a teaspoon of
reality.

Sincerely,

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Joshua D. Drake
On Thu, 10 Apr 2008 23:23:20 -0400 (EDT)
Bruce Momjian [EMAIL PROTECTED] wrote:

 Brendan Jurd wrote:
  I'm not saying Bruce is doing a bad job, far from it.  I'm saying
  the job is impossible.
  
  I just wanted to correct the apparent impression that patches don't
  get ignored here.  Patches get ignored.  The difference between us
  and Apache is we pretend it doesn't happen and don't suggest to
  submitters what action to take when it does.  Which puts Apache
  ahead of us IMO.
 
 The apache group seems to say the patches are indeed ignored, rather
 then just delayed --- for us, every patch does get a reply, however
 delayed.
 

Bruce, I think that this comes back to the perception versus reality
discussion you and I have had on more than one occasion :). You are
correct that we always, eventually reply but, until we do (especially
when it takes a long time) it appears as if people are being ignored.

I think a FAQ entry may actually be appropriate in this case.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Magnus Hagander
Joshua D. Drake wrote:
 On Thu, 10 Apr 2008 23:23:20 -0400 (EDT)
 Bruce Momjian [EMAIL PROTECTED] wrote:
 
  Brendan Jurd wrote:
   I'm not saying Bruce is doing a bad job, far from it.  I'm saying
   the job is impossible.
   
   I just wanted to correct the apparent impression that patches
   don't get ignored here.  Patches get ignored.  The difference
   between us and Apache is we pretend it doesn't happen and don't
   suggest to submitters what action to take when it does.  Which
   puts Apache ahead of us IMO.
  
  The apache group seems to say the patches are indeed ignored, rather
  then just delayed --- for us, every patch does get a reply, however
  delayed.
  
 
 Bruce, I think that this comes back to the perception versus reality
 discussion you and I have had on more than one occasion :). You are
 correct that we always, eventually reply but, until we do (especially
 when it takes a long time) it appears as if people are being ignored.

I will continue to claim that no, we don't always do that. The vast
majority of the time we do, but there is no way that we can claim to
respond to them all. No, I cannot point you to an example where this
has happened. I *know* it has happened, because I do recall it, but I
don't recall the specific case. But more important, with the say things
are set up now, there is no way we can prove that we *do* respond to
them all.

I'm not saying we don't respond to *enough* of them. We're close enough
to all that I think that part is ok (though that still comes back to
the inability for the outside party to know if something is missed or
just delayed), but we can't honestly claim 100%.

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] question on how to correctly communicate with external library functions which return malloc()'ed strings

2008-04-11 Thread Vladimir Volovich
Hi!

i'm using a module for postgresql (8.2 and 8.3) which links with a
3rd-party library and calls a function from there which returns a
malloc()'ed string.

[it's a libunac, and unac_string() but the question is general]

there was a pg_unac-8.2.tar.gz tarball distributed on the net, which has
the following implementation:

=
Datum unac(PG_FUNCTION_ARGS) {

  text *str = PG_GETARG_TEXT_P(0);
  text *result;
  
  int tlen, nlen, rsize;
  char *tstr, *nstr, *rptr;
  
  tlen = VARSIZE(str) - VARHDRSZ;
  tstr = (char *) palloc(tlen + 1);
  memcpy(tstr, VARDATA(str), tlen);
  tstr[tlen] = '\0';

  nstr = NULL; nlen = 0;
  unac_string(UTF-8, tstr, strlen(tstr), nstr, nlen);

  /* It may happen that unac_string returns NULL, because iconv
   * can't translate the input string. In this case we output
   * the string as it is. */
  if (nstr == NULL) nstr = tstr;
  
  rsize = strlen(nstr) + VARHDRSZ;
  result = (text *) palloc(rsize);
  rptr =  VARDATA(result);
  memcpy(rptr, nstr, rsize - VARHDRSZ);
  SET_VARSIZE(result, rsize);
  
  PG_RETURN_TEXT_P(result);
  
}
=

clearly there's a memory leak here: nstr never gets free()'d:
its value is copied to a palloc()-ated string but nstr itself is
never free()'d.

this call to palloc() appears to be necessary because one cannot just
pass malloc()-created nstr as a result of a function - there will be a
segfault, as far as i remember; therefore one needs to copy nstr to a
palloc()-created string and then to free() nstr (which is missing in the
above code).

i had to modify that function to remove the memory leak and to add some
more functionality i needed.

in a simplified form, it looks like:

=
  /* tstr is a palloc()'ed string */
  size_t nlen = 0;
  char *nstr = NULL;
  unac_string(UTF-8, tstr, strlen(tstr), nstr, nlen);
  if (nstr == NULL) {
nstr = strdup(tstr);
  }
  pfree(tstr);
  result = (text *) palloc(strlen(nstr) + VARHDRSZ);
  memcpy(VARDATA(result), nstr, strlen(nstr));
#if PG_VERSION_NUM = 80300
  SET_VARSIZE(result, strlen(nstr) + VARHDRSZ);
#else
  VARATT_SIZEP(result) = strlen(nstr) + VARHDRSZ;
#endif
  free(nstr);
  PG_RETURN_TEXT_P(result);
=

it worked fine with postgresql 8.2; with 8.3 it started segfaulting, and
it appeared that the reason is because in postgresql 8.3, the free is
a macro defined in
snowball/header.h:
#define free(a) pfree(a)

i solved it by adding these lines:

#ifdef free
#warning undefining free
#undef free
#endif
#include stdlib.h

but i want to ask you, postgresql hackers, on how we are supposed to
work with external library functions which return malloc()'ed strings?

should we create palloc()'ed copy and free() the original string?
then, is it correct to have that #define free(a) pfree(a) in
snowball/header.h?

Best,
v.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] question on how to correctly communicate with external library functions which return malloc()'ed strings

2008-04-11 Thread Martijn van Oosterhout
On Thu, Apr 10, 2008 at 07:28:29PM -0700, Vladimir Volovich wrote:
 it worked fine with postgresql 8.2; with 8.3 it started segfaulting, and
 it appeared that the reason is because in postgresql 8.3, the free is
 a macro defined in
 snowball/header.h:
 #define free(a) pfree(a)

It does seem wrong. Do you include that header file explicitly? Because
it shouldn't be necessary.

 #ifdef free
 #warning undefining free
 #undef free
 #endif
 #include stdlib.h
 
 but i want to ask you, postgresql hackers, on how we are supposed to
 work with external library functions which return malloc()'ed strings?

See if you can avoid including that header altogether...

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Zeugswetter Andreas OSB SD

  ... The really serious problem I've got with it is that
  it'd foreclose the possibility of returning actual index keys from
btree
  indexes, thus basically killing the usefulness of that idea.  I'm
not
  convinced it would offer enough gain to be worth paying that price.
 
  I do not see the serious problem ? The one key that is stored would 
  represent all tuples it points to.
 
 No, the entry represents a range of values for which the one key is
the
 lower bound.  You don't know just what the keys are for the other
 tuples, unless you go to the heap and look.

Thanks for explaining. I think that idea stands in the way of future 
improvements and should be dropped, yes.

 We could restrict GIT to only represent tuples with exactly the same
 key, but that takes away a whole lot of its use-case (especially so
 now that HOT is in there).

Ok, I was forgetting pg's outstanding partial index feature.
In pg you will most likely exclude highly duplicate keys from the index.
Since other dbs don't have this feature, it is common to have highly
duplicate keys (millions) there (unless you use very ugly workarounds).

I agree that the possibility of returning actual index keys and
filtering
rows early has more merrit. It could also be used to skip forward, when
the 
qual misses middle key columns.

I do still see compressing exactly equal keys (or exactly equal parts),
but not restricted to the same heap page, as a useful btree TODO 
(especially for long non unique keys). 
But it may really not be so important in pg.

Andreas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Separate psql commands from arguments

2008-04-11 Thread Gregory Stark

Bernd Helmle [EMAIL PROTECTED] writes:

 What happens to aliases when you change the encoding in the middle of
 running one? Actually, come to think of it, what *does* happen to aliases
 when the encoding changes??

 Hmm? What happens if you are going to change your encoding within multiple
 backslash commands now?

I mean. What happens if I do:

\encoding 'UTF8'
\alias foo select 'あ' from tab;
\encoding euc-jp
\foo

What encoding is the alias itself stored in? Do we have to worry about
converting it when you switch client-encodings?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Separate psql commands from arguments

2008-04-11 Thread Bernd Helmle
--On Donnerstag, April 10, 2008 22:48:24 +0100 Gregory Stark 
[EMAIL PROTECTED] wrote:



Well I feel like storing a query and resending it later is something
predictable which will work reliably. Storing a psql input line and
reinterpreting it later is surely going to cause weird things to happen.


Als already mentioned in the original thread, excluding backslash commands 
from aliasing

isn't a big deal, but this would negate the usefulness of this feature.



Just for some examples off the top of my head, what happens if I define an
alias \foo which consists of \foo and call it? What happens if I have
mutually recursive aliases? What happens if I define \foo to run \
followed by its first argument, and I pass it foo? What happens if I
pass it unalias foo?



aliases are substituted only once within a call. They are acting exactly 
the same way you'll find on your favorite shell. If the current patch 
doesn't, it needs to be adressed, of course. Have you tried your examples 
on the current patch? Retesting shows it works as expected...



What happens if you press C-c during an alias, does it keep running
subsequent commands? What if the editor returns an error after a \e
command? What about if a \i command doesn't find the file?


Since aliases are simple substitutions nothing different. Multiple commands 
are placed directly into the query buffer, so nothing different than the 
current behavior should happen. Again, do you have any concrete examples?




What happens to aliases when you change the encoding in the middle of
running one? Actually, come to think of it, what *does* happen to aliases
when the encoding changes??


Hmm? What happens if you are going to change your encoding within multiple 
backslash commands now?





Basically it sounds like you're treating psql as if it was a well defined
language with well defined syntax and semantics. And I don't think it is.


So i do.


It's just one big if-else-if block with lots of strcmps. There's no
infrastructure to parse or manage a stack of calls to functions.


Again, it's text _substitution_, replacing the very first token with 
another token. Wether this token can be complex or not is another question, 
but of course, i'm not so foolish to describe psql as turing complete 
language.


--
 Thanks

   Bernd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding pipelining support to set returning functions

2008-04-11 Thread Hannu Krosing

On Fri, 2008-04-11 at 10:57 +0200, Hans-Juergen Schoenig wrote:
 Hannu Krosing wrote:
  A question to all pg hackers
 
  Is anybody working on adding pipelining to set returning functions.
 
  How much effort would it take ?
 
  Where should I start digging ?

 
 i asked myself basically the same question some time ago.
 pipelining seems fairly impossible unless we ban joins on those 
 plugins completely.

Not really, just they have to be materialized before joins, or
streaming node has to be at the driving side of the join, so you can
fetch one tuple and then join it to index or hash lookup

 i think this should be fine for your case (no need to join PL/proxy 
 partitions) - what we want here is to re-unify data and sent it through 
 centralized BI.

In PL/Proxy context I was aiming at sorting data at nodes and then being
able to merge several partitions while preserving order, and doing this
without having to store N*partition_size rows in resultset.

 
...

 
 currently things like nodeSeqscan do SeqNext and so on - one records is 
 passed on to the next level.
 why not have a nodePlugin or so doing the same?
 or maybe some additional calling convention for streaming functions...
 
 e.g.:
 CREATE STREAMING FUNCTION xy() RETURNS NEXT RECORD AS $$
 return exactly one record to keep doing
 return NULL to mark end of table
 $$ LANGUAGE 'any';
 
 so - for those function no ...
 WHILE ...
RETURN NEXT
 
 but just one tuple per call ...
 this would pretty much do it for this case.
 i would not even call this a special case - whenever there is a LOT of 
 data, this could make sense.

In python (an also javascript starting at version 1.7) you do it by
returning a generator from a function, which is done by using YIELD
instead of return.

 def numgen(i):
... while 1:
... yield i
... i += 1
 ng = numgen(1)
 ng
generator object at 0xb7ce3bcc
 ng.next()
1
 ng.next()
2

In fact any pl/python function SRF puts its result set to retun buffer
using generator mechanisms, even in case you return the result from
function as a list or an array.

What would be nice, is to wire the python generator directly to
postgreSQLs FuncNext call

At C function level this should probably be a mirror image of AGGREGATE
functions, where you have a init() function that prepares some opaque
data structure and next() for getting records with some special value
for end, and preferrably also some finalize() to clean up in case
postgresql stops before next() indicated EOD.

Maybe some extra info would be nice for optimized, like expected
rowcouunt or that data is returned sorted on some field. This would be
good for current return mechanisms as well.

-
Hannu









-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding pipelining support to set returning functions

2008-04-11 Thread Martijn van Oosterhout
On Fri, Apr 11, 2008 at 01:00:04PM +0300, Hannu Krosing wrote:
  i asked myself basically the same question some time ago.
  pipelining seems fairly impossible unless we ban joins on those 
  plugins completely.
 
 Not really, just they have to be materialized before joins, or
 streaming node has to be at the driving side of the join, so you can
 fetch one tuple and then join it to index or hash lookup

I thought these was code in the planner already that said: if node A
requires seeking of subnode B and B doesn't support that, insert
Materialize node. Naturally there's a cost to that, so it would favour
plans that did not require seeking...

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


[HACKERS] stat() vs cygwin

2008-04-11 Thread Magnus Hagander
It seems my fix for stat() broke cygwin, because it doesn't have
dosmaperr() available. The way I see it there are two ways to fix this:

1) Don't apply the stat fix for cygwin.

2) Make our dosmaperr() function be used on cygwin.


I don't know if the fix is actually needed on cygwin. Can someone with
access to such an environment test it and see?

The easy check, easier than the table, goes something along the line
of:
CREATE TABLE test(t int);
INSERT INTO test(t) SELECT * FROM generate_series(1,10);
SELECT pg_relation_size('t');
SELECT pg_sleep(5);
SELECT pg_relation_size('t');


Without the patch on win32, the first pg_relation_size comes out as 0,
and the second one correct. With the patch, they come out equal. They
should, of course, always come out equal.

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Free Space Map data structure

2008-04-11 Thread Heikki Linnakangas

Hannu Krosing wrote:

BTW, I'm pretty sure I have figured out the method for storing minimal
required binary tree as an array, where adding each page adds exactly
one upper node. The node added is often not the immediate parent, but is
always the one needed for covering all nodes.

I just have to write it up in an understandable way and then you all can
look at it and tell if it is something well-known from Knuth or Date ;)


Find sample code attached:

not optimal at all, meant as proof-of-concept.

just run the file to see how it works, some comments in the beginning.

currently it interleaves leaf and internal nodes, but it may be better
for some uses (like seqscanning leaf nodes when searching for clustered
pos) to separate them, for example having 1st 4k for lef nodes and 2nd
for internal nodes.

also, I think that having a fan-out factor above 2 (making it more like
b-tree instead of binary tree) would be more efficient due to CPU
caches, but it takes some more work to figure it out. 


At least it would be more storage efficient, as you wouldn't need as 
many non-leaf nodes. You would need more comparisons when traversing or 
updating the tree, but as you pointed out that might be very cheap 
because of cache effects.


Within a page, the traditional array representation, where root is at 
position 0, it's children are at 1, 2 and so forth, is actually OK. When 
the depth of the tree changes, you'll need to memcpy data around and 
rebuild parent nodes, but that's acceptable. Or we can fix the depth of 
the tree to the max. that fits on a page, and fill it with zeros when 
it's not full; all but the rightmost pages are always full anyway.


Scaling into multiple pages, we don't want to move nodes across page 
boundaries when extending the relation, so we do need something like 
your scheme for that. Taking the upper nodes into account, one FSM page 
can hold free space information of ~4k heap (or lower level FSM) pages. 
IOW, we have a fan out of 4k across pages. With a fanout like that, we 
only need 3-4 levels to address over 500 TB of data.


I've attached a diagram illustrating what I have in mind. In the 
diagram, each page can hold only 7 nodes, but in reality that would be ~ 
BLCKSZ/2, or the 4k with default block size. The heap consists of 10 
pages, the bottom leaf nodes correspond the heap pages. The leaf nodes 
of the upper FSM page store the max the lower FSM pages; they should 
match the top nodes of the lower FSM pages. The rightmost nodes in the 
tree, colored grey, are unused.


I'm pretty happy with this structure. The concurrency seems reasonably 
good, though will need to figure out how exactly the locking should work.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
inline: fsm-drawing.png
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding pipelining support to set returning functions

2008-04-11 Thread Hans-Juergen Schoenig

Hannu Krosing wrote:

A question to all pg hackers

Is anybody working on adding pipelining to set returning functions.

How much effort would it take ?

Where should I start digging ?
  


i asked myself basically the same question some time ago.
pipelining seems fairly impossible unless we ban joins on those 
plugins completely.
i think this should be fine for your case (no need to join PL/proxy 
partitions) - what we want here is to re-unify data and sent it through 
centralized BI.




BACKGROUND:

AFAICS , currently set returning functions materialise their results
before returning, as seen by this simple test:

hannu=# select * from generate_series(1,10) limit 2;
 generate_series 
-

   1
   2
(2 rows)

Time: 1.183 ms


hannu=# select * from generate_series(1,1000) limit 2;
 generate_series 
-

   1
   2
(2 rows)

Time: 3795.032 ms

being able to pipeline (generate results as needed) would enable several
interesting techniques, especially if combined with pl/proxy or any
other functions which stream external data.

Applications and design patterns like http://telegraph.cs.berkeley.edu/
or http://labs.google.com/papers/mapreduce.html would suddenly become
very easy to implement.

-
Hannu

  


currently things like nodeSeqscan do SeqNext and so on - one records is 
passed on to the next level.

why not have a nodePlugin or so doing the same?
or maybe some additional calling convention for streaming functions...

e.g.:
CREATE STREAMING FUNCTION xy() RETURNS NEXT RECORD AS $$
   return exactly one record to keep doing
   return NULL to mark end of table
$$ LANGUAGE 'any';

so - for those function no ...
   WHILE ...
  RETURN NEXT

but just one tuple per call ...
this would pretty much do it for this case.
i would not even call this a special case - whenever there is a LOT of 
data, this could make sense.


   best regards,

  hans

--
Cybertec Schönig  Schönig GmbH
PostgreSQL Solutions and Support
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Tom Dunstan
On Fri, Apr 11, 2008 at 1:32 PM, Magnus Hagander [EMAIL PROTECTED] wrote:

The apache group seems to say the patches are indeed ignored, rather
then just delayed --- for us, every patch does get a reply, however
delayed.
   
  
   Bruce, I think that this comes back to the perception versus reality
   discussion you and I have had on more than one occasion :). You are
   correct that we always, eventually reply but, until we do (especially
   when it takes a long time) it appears as if people are being ignored.

  I will continue to claim that no, we don't always do that. The vast
  majority of the time we do, but there is no way that we can claim to
  respond to them all. No, I cannot point you to an example where this
  has happened.

Well, I can provide an easy example: my first patch [1]. We hashed out
the design on -hackers as contributors are encouraged to do, and I
submitted my first patch to -patches. It included a bunch of
first-time-contributor questions that I had about the proper pgsql way
to do things. It got zero responses. It was as if I had dropped it
into a black hole. Eventually I re-submitted it after 8.2 was
released, and some time after that I got a your-patch-has-been-saved
email.

I have no idea how often that happens, perhaps I'm an exception, but
it was incredibly discouraging.

However I see this as being a side-issue - the problem is knowing the
current status of patches, not the occasional patch that drops
through. And if I as a submitter can stick a patch up on a wiki or
tracker and then email the list for feedback that's probably good
enough, and we could probably do away with -patches altogether,
dealing with the fragmentation issue. That alone would reassure a
contributor that their patch wouldn't get lost, though it wouldn't
guarantee that anyone would look at it.

The reason a tracker is better imo than a wiki is that a wiki still
needs someone to maintain an index page (or multiple index pages for
different queues), so there's still an opportunity for something to
fall through. Or are we suggesting that a first-time contributor
should be editing a patch queue index page on the wiki? Trackers don't
have these issues though - managing lists like this is what they were
born to do.

Cheers

Tom

[1] http://archives.postgresql.org/pgsql-patches/2006-09/msg0.php

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Teodor Sigaev

Teodor, do you have any thoughts about exactly how you'd fix @@@ ?
I suppose that the recheck-need is not really a property of specific
tuples, but of a particular query, for that case.  Where would you
want to detect that?


tsquery may include restriction by weight of search terms: 'sea  port:A'. GIN 
index doesn't store information about weights, so the only difference between @@ 
and @@@ is that @@@ is marked with RECHECK flag. I think, the better way is set 
flag about required recheck by looking value from index, not for tsquery. It 
gives to us more flexibility.


So, I planned to add pointer to bool to consistent method, so signature will be
bool consistent( bool check[], StrategyNumber n, Datum query, bool *needRecheck)

Returning value of needRecheck should be ignored for operation not marked by 
RECHECK flag in opclass. needRecheck should be initialized to true before call 
of consistent method to keep compatibility with old opclasses.


To define, is recheck needed or not, the better way is to check actually needed 
values. For example, let tsquery is equal to
'foo | bar | qq:A' and tsvetor = 'foo:1,2,3 asdasdasd:4'. Obviously recheck is 
not needed. So patch is close to trivial:


*** tsginidx.c.orig 2008-04-11 17:08:37.0 +0400
--- tsginidx.c  2008-04-11 17:18:45.0 +0400
***
*** 109,114 
--- 109,115 
  {
QueryItem  *frst;
bool   *mapped_check;
+   bool   *needRecheck;
  } GinChkVal;

  static bool
***
*** 116,121 
--- 117,125 
  {
GinChkVal  *gcv = (GinChkVal *) checkval;

+   if ( val-weight )
+   *(gcv-needRecheck) = true;
+
return gcv-mapped_check[((QueryItem *) val) - gcv-frst];
  }

***
*** 144,149 
--- 148,155 

gcv.frst = item = GETQUERY(query);
gcv.mapped_check = (bool *) palloc(sizeof(bool) * query-size);
+   gcv.needRecheck = PG_GETARG_POINTER(3);
+   *(gcv.needRecheck) = false;

for (i = 0; i  query-size; i++)
if (item[i].type == QI_VAL)





--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Gregory Stark
Tom Dunstan [EMAIL PROTECTED] writes:

 The reason a tracker is better imo than a wiki is that a wiki still
 needs someone to maintain an index page (or multiple index pages for
 different queues), so there's still an opportunity for something to
 fall through. 

For the umpteenth time bug trackers *still* require someone to maintain the
list. It's more structured which is great if it matches the structure you
want, but it still requires someone to go open bugs when a bug is reported by
email, close bogus bugs or bugs fixed via collateral damage, update bugs when
discussions happen on the list about them, etc.

Imagining that bug trackers are all automatic and don't require any work or
impose any restrictions is missing the whole point. The whole benefit they
provide is precisely that they make that process systematic. They don't do it
for you.

I've seen no discussion about the structure the various bug trackers use and
which would match better with our processes. *That* would be the only useful
discussion to be having. What attributes do you think patch submissions have,
what is the workflow which sets which attributes at which time, who is
involved in which step of this workflow? Etc.

Proposing specific tools without a consensus on what that process is putting
the cart before the horse. You pick the tools to fit what you want to do. We
haven't decided what we want to do yet.

Personally I don't think we *know* what we want to do and that's why the wiki
is a good interim tool. We'll see what info we need there and who needs to
fill it in and find out what tool will fit our needs.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Joshua D. Drake
On Fri, 11 Apr 2008 15:44:43 +0100
Gregory Stark [EMAIL PROTECTED] wrote:

 Proposing specific tools without a consensus on what that process is
 putting the cart before the horse. You pick the tools to fit what you
 want to do. We haven't decided what we want to do yet.
 
 Personally I don't think we *know* what we want to do and that's why
 the wiki is a good interim tool. We'll see what info we need there
 and who needs to fill it in and find out what tool will fit our needs.
 

What I find interesting about this email is that it is just as useless
as you claim Tom's is.

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Alvaro Herrera
Gregory Stark escribió:

 Personally I don't think we *know* what we want to do and that's why the wiki
 is a good interim tool. We'll see what info we need there and who needs to
 fill it in and find out what tool will fit our needs.

+1.  Let's learn what we need first, and find an appropriate tool to let
us do it more easily later.  We just had our first commitfest, and the
Wiki was already a change.  Let's see what we can learn from it.

For example it was suggested that we need templates to format the patch
entries.  What fields would have been useful?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Personally I don't think we *know* what we want to do and that's why the wiki
 is a good interim tool.

Yup, that is *exactly* the point.  A wiki page is a zero-setup-cost,
flexible way of experimenting with tracking commit-fest issues.
A year from now, we might have enough experience to decide that some
more-rigidly-structured tool will do what we need, but we don't have
it today.  We spent enough time fighting the limitations of Bruce's
mhonarc page that we ought to be wary of adopting some other tool that
wants you to do things its way.

Perhaps an example will help make the point.  Throughout this past fest
I was desperately wishing for a way to group and label related issues
--- we had a pile of items focused around index AM API questions, and
another pile focused around mapping problems (FSM/DSM/Visibility
map/etc), and being able to put those together would have made it a
lot clearer what needed to be looked at together with what else.
On a wiki page it'd have been no trouble at all to create ad-hoc
sub-headings and sort the individual items into whatever grouping and
ordering made sense (in fact, Alvaro did some of that on his own).
It was basically impossible to do any such thing with Bruce's mhonarc
page, though he did kluge up some ways to partially address the issue
by merging threads.  The bug trackers I've dealt with haven't got much
flexibility in this respect either --- the sorts of global views you can
get are entirely determined by the tool.  I'm fairly certain that a
tracker designed around the assumption that different entries are
essentially independent isn't going to be very helpful.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stat() vs cygwin

2008-04-11 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 It seems my fix for stat() broke cygwin, because it doesn't have
 dosmaperr() available.

Are you sure you aren't just missing an #include?  The other places
where we call _dosmaperr don't seem to be protected by anything more
than #ifdef WIN32.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 So, I planned to add pointer to bool to consistent method, so signature will 
 be
 bool consistent( bool check[], StrategyNumber n, Datum query, bool 
 *needRecheck)

 Returning value of needRecheck should be ignored for operation not marked by 
 RECHECK flag in opclass. needRecheck should be initialized to true before 
 call 
 of consistent method to keep compatibility with old opclasses.

Perhaps it would be better to initialize needRecheck to the opclass
RECHECK flag value?  If the consistent function does nothing, the
behavior is the same as before, but it can flip the flag in either
direction if it wants.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Bruce Momjian
Tom Lane wrote:
 Gregory Stark [EMAIL PROTECTED] writes:
  Personally I don't think we *know* what we want to do and that's why the 
  wiki
  is a good interim tool.
 
 Yup, that is *exactly* the point.  A wiki page is a zero-setup-cost,
 flexible way of experimenting with tracking commit-fest issues.
 A year from now, we might have enough experience to decide that some
 more-rigidly-structured tool will do what we need, but we don't have
 it today.  We spent enough time fighting the limitations of Bruce's
 mhonarc page that we ought to be wary of adopting some other tool that
 wants you to do things its way.
 
 Perhaps an example will help make the point.  Throughout this past fest
 I was desperately wishing for a way to group and label related issues
 --- we had a pile of items focused around index AM API questions, and
 another pile focused around mapping problems (FSM/DSM/Visibility
 map/etc), and being able to put those together would have made it a
 lot clearer what needed to be looked at together with what else.
 On a wiki page it'd have been no trouble at all to create ad-hoc
 sub-headings and sort the individual items into whatever grouping and

I feel subgroups is something we are going to need from a bug or patch
tracker.  The TODO list uses subgroups.  I think a flat bug/patch list
is harder to understand.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Bruce Momjian
Marc G. Fournier wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 
 In the projects I'm involved in, tends to be for used for both purposes ... 
 one 
 central location for everything ...

Yea, good point.  I think our big question is what justification do we
have for doing things different from everyone else?  I think it is fine
for us to be different, but we should know the reason why.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Heikki Linnakangas

Tom Lane wrote:

Perhaps it would be better to initialize needRecheck to the opclass
RECHECK flag value?  If the consistent function does nothing, the
behavior is the same as before, but it can flip the flag in either
direction if it wants.


I remember that last spring, in the context of GIT, you were worried 
about the performance implication of preparing to recheck rows when no 
rechecks are needed. I didn't quite buy that back then, but this would 
have the same issue.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Perhaps it would be better to initialize needRecheck to the opclass
 RECHECK flag value?  If the consistent function does nothing, the
 behavior is the same as before, but it can flip the flag in either
 direction if it wants.

 I remember that last spring, in the context of GIT, you were worried 
 about the performance implication of preparing to recheck rows when no 
 rechecks are needed. I didn't quite buy that back then, but this would 
 have the same issue.

As I mentioned upthread, it appears that we're paying that overhead
anyway --- at least nodeIndexscan.c thinks we are.  I need to dig into
the planner a bit today and see whether it's taking any shortcuts for
non-RECHECK operators.

If it really is saving anything, then I'd agree that only RECHECK-marked
operators should be allowed to adjust the flag.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-11 Thread PFC


	Well, I realized the idea of global prepared statements actually sucked,  
so I set on another approach thanks to ideas from this list, this is  
caching query plans.


First, let's see if there is low hanging fruit with the typical small,  
often-executed queries that are so frequent on websites.
Tables test, test2 and test3 contain id (integer primary key) and another  
integer field. There are 100K rows in each.


First, the simplest query :
SELECT * FROM test WHERE id = $1

110 us : Send query as text (PHP:pg_query - PQexec)
125 us : Parse+Bind (PHP:pg_query_params - PQexecParams)
 67 us : Execute a previously prepared statement (PHP:pg_execute -  
PQexecPrepared)


A slightly more complex one but still pretty classic :
SELECT * FROM (SELECT * FROM test WHERE id$1 ORDER BY id LIMIT 5) AS a  
NATURAL LEFT JOIN test2 NATURAL LEFT JOIN test3 ORDER BY id


523 us : Send query as text (PHP:pg_query - PQexec)
580 us : Parse+Bind (PHP:pg_query_params - PQexecParams)
148 us : Execute a previously prepared statement (PHP:pg_execute -  
PQexecPrepared)


OK, so there is low hanging fruit since the parsing+planning time of those  
is longer than doing the query itself.


Since the Parse message includes a $-parameterized query that is to be  
prepared, it seems logical to put the caching logic there : the query  
string (without parameters) makes a nice cache key.


So I made a really quick and really dirty experimentation without changing  
the wire protocol between client and server. This is only proof of  
concept.


Try #1 : in exec_parse_message(), if the statement is named, look it up in  
the prepared statements cache, if it is found, return at once and do  
nothing else.
To exploit this, I issue a pg_prepare() followed by pg_execute() at every  
query, wether or not the statement exists. If it already exists,  
pg_prepare() now does nothing (except losing a little time).


Results :
 88 us : simple query
173 us : complex query

So, the timings are between a simple execute and a plan+execute. It  
provides a nice performance gain versus replanning every time, but not  
perfect.


Try #2 : again, in exec_parse_message(), if the statement is unnamed, I  
use the query string as the statement name, search the plan in the  
prepared statements hash table. If it is not found, then it is prepared.  
Then I make the unnamed statement plan point to this. Of course, this is  
dangerous since it probably introduces a dozen crash bugs, but for this  
proof of concept, it's OK.
Client code is unchanged, PQexecParams will benefit from the plan caching,  
since it always sends a Parse+Bind message using the unnamed statement.


Results are identical to executing an execute on a prepared statement,  
modulo a few microseconds.
This means the overhead of sending the Parse message, and of the server  
ignoring it when the statement is cached, is negligible.


	So, where to go from that ? I don't see a way to implement this without a  
(backwards-compatible) change to the wire protocol, because the clients  
will want to specify when a plan should be cached or not. Since the user  
should not have to name each and every one of the statements they want to  
use plan caching, I see the following choices :


	- Add a new Parse+Bind command, which gets the $-parameterized SQL and  
the parameters. If the plan is cached, grab it and execute, else prepare  
and execute. Add a flag to allow the client to specify if he wants caching  
or not.

Pros : Only one message, faster
	Cons : SQL is transmitted in full, useless most of the time, but this  
overhead is rather small.


	- Send the SQL with Bind as statement name, add a flag to Bind telling it  
to report a cache miss instead of raising an error, then have the client  
send a Parse and Bind again.


	- Should there be one common hashtable for named prepared statements and  
cached plans, or two hashtables ? Using the SQL string as the statement  
name is not clean.



























--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Tom Dunstan
(I've already said that I think the wiki is a great step forward,
FWIW, and I'm not in any way suggesting that we should just drop it
and pick my favorite issue tracker or anything. However, for those
interested in discussion about theoretical benefits and cons of the
different systems...)

On Fri, Apr 11, 2008 at 8:14 PM, Gregory Stark [EMAIL PROTECTED] wrote:
  For the umpteenth time bug trackers *still* require someone to maintain the
  list. It's more structured which is great if it matches the structure you
  want, but it still requires someone to go open bugs when a bug is reported by
  email, close bogus bugs or bugs fixed via collateral damage, update bugs when
  discussions happen on the list about them, etc.

Perhaps I wasn't clear. I was describing the specific case where a
patch submitter would be required by project policy to submit a patch
to a tracker of some kind before discussing it on the list. So there
wouldn't be much of an opportunity for those to fall through. And
owners of a particular patch would be expected to keep them up to date
re discussions. I wasn't discussing emailed bug reports.

The problem with a tracker is that it will give you a list of every
damn thing that people have put in there, and the data in there can
stagnate. The problem with manually maintained lists is that stuff
might not get on there at all. What I and at least one other person
have tried to say is that the problem of dead issues needing to be
closed is a much easier problem to deal with than the problem of an
issue not being there at all. Heck, *I* could trawl a tracker and
email authors of seemingly dead patches. But there's no way I could
maintain a patch list manually without following each and every
discussion.

  I've seen no discussion about the structure the various bug trackers use and
  which would match better with our processes. *That* would be the only useful
  discussion to be having. What attributes do you think patch submissions have,
  what is the workflow which sets which attributes at which time, who is
  involved in which step of this workflow? Etc.

Well, I do recall reading at least one thread (not terribly recently)
discussing people's favourite trackers, but IIRC it turned into
something similar to what happens when we discuss CVS replacements :)

  Proposing specific tools without a consensus on what that process is putting
  the cart before the horse. You pick the tools to fit what you want to do. We
  haven't decided what we want to do yet.

This is true. But your processes get shaped by your tools, too. Our
processes might be shaped by what we've got, and so continue forever.
There should be some awareness of what else is out there.

An example of this is the way code flows around the Linux kernel. I'm
not for a minute advocating their general structure, but git seems a
far better tool than the combination of CVS and emailing patches.
Patch announcements aren't always here's the patch as much as
please pull from over here. Their tool support seems rather better
than ours. And it's changed the way they work.

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Teodor Sigaev



Perhaps it would be better to initialize needRecheck to the opclass
RECHECK flag value?  If the consistent function does nothing, the
behavior is the same as before, but it can flip the flag in either
direction if it wants.


I have not any objections. In this case (and preparing of rechecking is cheap) 
RECHECK flag could be removed.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-11 Thread Alvaro Herrera
PFC wrote:

   So, where to go from that ? I don't see a way to implement this without 
 a (backwards-compatible) change to the wire protocol, because the clients 
 will want to specify when a plan should be cached or not. Since the user  
 should not have to name each and every one of the statements they want to 
 use plan caching, I see the following choices :

I don't understand the point here.  We already have cached plans: you
send a Parse.  You can then Bind/Execute many times.

Maybe what we need is support for this in libpq, so that PHP can use it?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 The bug trackers I've dealt with haven't got much flexibility in this
 respect either --- the sorts of global views you can get are entirely
 determined by the tool. I'm fairly certain that a tracker designed around
 the assumption that different entries are essentially independent isn't
 going to be very helpful.

The bug trackers I've dealt with did have some way to merge bugs, though
obviously nothing as flexible as a wiki page.

Debbugs allows you to merge two bugs in which case the two bug#s are synonyms
for each other. All messages related to either bug appear in one list and
there's only one set of status bits for both bugs.

Bugzilla allows you to mark bugs as duplicates but basically this amounts to
marking one bug as a duplicate and closing it. Both bugs get notes indicating
the other bug# but you have to click on a link to see the info attached to the
closed duplicates.

I've noticed Mozilla creates a lot of tracking bugs for classes of problems.
I think this is related to your observation. The tracking bug just lists all
the other related bugs which fall under that topic.

I'm sure trac has a solution for this, I'm curious to hear how it works.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Rick Gigger

Yup, that is *exactly* the point.  A wiki page is a zero-setup-cost,
flexible way of experimenting with tracking commit-fest issues.
A year from now, we might have enough experience to decide that some
more-rigidly-structured tool will do what we need, but we don't have
it today.  We spent enough time fighting the limitations of Bruce's
mhonarc page that we ought to be wary of adopting some other tool that
wants you to do things its way.


In case you don't recognize my name/email address I am just someone  
who has been lurking on hackers for several years now.  I have been  
following this thread closely and I thought it might be useful for  
someone to try to summarize what it seems like people need so everyone  
can see if they are on the same page. After stating each problem I  
will also summarize proposed solutions and introduce a few of my own,  
just to see what people think.  Also I have been a web developer for  
the 7 years so I may be able to help out with this, as long as the  
time span is sufficiently long.  Please feel free to correct anything  
below (as if I have to say that).  Remember I am not trying to push  
any idea here, I am just trying to summarize everyone else's ideas and  
humbly propose a few ideas that might help.


It's clear that you want to keep the email list as the primary means  
of communication.  So that obviously has to stay.  The web archives  
should stay the primary means of referencing past discussion.


Problem #1: The current archive system breaks threads across monthly  
boundaries.

Solution 1A: Find other off the shelf software that does this better.
Solution 1B: Write some custom software to do this.  Can you set up  
hooks into the mail server so that a script could be run each time a  
new email is accepted?  Given the full message headers and body, what  
is the algorithm for linking methods into threads?  Given the right  
answers to those two questions and this might be a fairly simple task.


Problem #2: When people are looking for something to do we need a list  
of all pending issues that can be easily searched.  Ideally the  
maintenance of this list will be as low as possible.
Solution 2: Create a custom tracker.  The above email and others seem  
to indicate nothing off the shelf will do.  Can a hook be set up into  
the mail server so that incoming emails can not only be read and  
indexed but also altered with a script?  Each new thread from patches  
could automatically create a tracker item.  At the bottom of each  
message a link could be added to the tracker item for that thread.   
Then going from email message (in your MUA or the web archives) to the  
tracker would be quick and easy.  Each email from hackers could have a  
link at the bottom to create a tracker item for it.  So patches  
threads get automatic tracker items.  Hackers threads can be added  
manually.


The tracker page for each message would include whatever metadata was  
needed.  For instance: has this tracker item been processed yet?  Is  
it a bug or a feature request or just a request for information or a  
fix to infrastructure?  Is there a reviewer for the patch?  Which fest  
does it belong to?  Or any other metadata you might want to add to the  
item.  Also on the page would be the thread that started the item.   
Initially it would show only subjects.  Clicking on a subject will  
show the body of the message inline with the thread. Clicking it again  
will collapse it again.  There will be a reply link for each message.   
If you reply via the web site it will simply send a message to the  
mail server just as it would if you had replies within your MUA.  That  
way there is no difference between replying from within the tracker  
and replying from within your normal mail client.  But you can still  
use either and the communication doesn't get scattered all over the  
place.  There would also be an option to let you choose another  
tracker item to merge with the current one so that relevant threads  
can be aggregated into the same tracker item.


At this point you could have a page that lists all unassigned tracker  
items so that someone looking for some work to do could quickly scan a  
short easy to read list and pick something.


Problem #3: When a new patch creator posts a new patch they need  
feedback quickly so they know at least that they did the right thing  
and someone is aware of the patch.
Solution 3: The tracker has  a list of all new threads that haven't  
been looked at.  Someone can then go through the list of unprocessed  
items and see if it has a patch. If it does he can flag that item as a  
patch submission and it will immediately show up on the list for patch  
reviewers to look through.  It can also be assigned right there to a  
specific fest but will default to the soonest one.  Once it is flagged  
an email will automatically go out telling them their patch is pending  
review.


Problem #4: Patches may or may not, on rare 

Re: [HACKERS] Commit fest queue

2008-04-11 Thread Brendan Jurd
On Sat, Apr 12, 2008 at 3:13 AM, Gregory Stark [EMAIL PROTECTED] wrote:
 Tom Lane [EMAIL PROTECTED] writes:

   The bug trackers I've dealt with haven't got much flexibility in this
   respect either --- the sorts of global views you can get are entirely
   determined by the tool. I'm fairly certain that a tracker designed around
   the assumption that different entries are essentially independent isn't
   going to be very helpful.

  I'm sure trac has a solution for this, I'm curious to hear how it works.


In Trac, if I just want to loosely associate several tickets together
I'd use *keywords*, e.g., put index am in the keywords list for
several tickets, and then they'll show up prominently when I search
for those terms.

If I want something more structured I'd use a *milestone*.  I'd create
an Index AM milestone and attach all the relevant tickets to it.
Then I can easily pull up a report of all open tickets on the Index AM
milestone (or all closed tickets, or all tickets regardless of status,
or all tickets assigned to me, or all tickets not assigned to anyone
yet, or ...)

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Rick Gigger

Yup, that is *exactly* the point.  A wiki page is a zero-setup-cost,
flexible way of experimenting with tracking commit-fest issues.
A year from now, we might have enough experience to decide that some
more-rigidly-structured tool will do what we need, but we don't have
it today.  We spent enough time fighting the limitations of Bruce's
mhonarc page that we ought to be wary of adopting some other tool that
wants you to do things its way.



In case you don't recognize my name/email address I am just someone  
who has been lurking on hackers for several years now.  I have been  
following this thread closely and I thought it might be useful for  
someone to try to summarize what it seems like people need so everyone  
can see if they are on the same page. After stating each problem I  
will also summarize proposed solutions and introduce a few of my own,  
just to see what people think.  Also I have been a web developer for  
the 7 years so I may be able to help out with this, as long as the  
time span is sufficiently long.  Please feel free to correct anything  
below (as if I have to say that).  Remember I am not trying to push  
any idea here, I am just trying to summarize everyone else's ideas and  
humbly propose a few ideas that might help.


It's clear that you want to keep the email list as the primary means  
of communication.  So that obviously has to stay.  The web archives  
should stay the primary means of referencing past discussion.


Problem #1: The current archive system breaks threads across monthly  
boundaries.

Solution 1A: Find other off the shelf software that does this better.
Solution 1B: Write some custom software to do this.  Can you set up  
hooks into the mail server so that a script could be run each time a  
new email is accepted?  Given the full message headers and body, what  
is the algorithm for linking methods into threads?  Given the right  
answers to those two questions and this might be a fairly simple task.


Problem #2: When people are looking for something to do we need a list  
of all pending issues that can be easily searched.  Ideally the  
maintenance of this list will be as low as possible.
Solution 2: Create a custom tracker.  The above email and others seem  
to indicate nothing off the shelf will do.  Can a hook be set up into  
the mail server so that incoming emails can not only be read and  
indexed but also altered with a script?  Each new thread from patches  
could automatically create a tracker item.  At the bottom of each  
message a link could be added to the tracker item for that thread.   
Then going from email message (in your MUA or the web archives) to the  
tracker would be quick and easy.  Each email from hackers could have a  
link at the bottom to create a tracker item for it.  So patches  
threads get automatic tracker items.  Hackers threads can be added  
manually.


The tracker page for each message would include whatever metadata was  
needed.  For instance: has this tracker item been processed yet?  Is  
it a bug or a feature request or just a request for information or a  
fix to infrastructure?  Is there a reviewer for the patch?  Which fest  
does it belong to?  Or any other metadata you might want to add to the  
item.  Also on the page would be the thread that started the item.   
Initially it would show only subjects.  Clicking on a subject will  
show the body of the message inline with the thread. Clicking it again  
will collapse it again.  There will be a reply link for each message.   
If you reply via the web site it will simply send a message to the  
mail server just as it would if you had replies within your MUA.  That  
way there is no difference between replying from within the tracker  
and replying from within your normal mail client.  But you can still  
use either and the communication doesn't get scattered all over the  
place.  There would also be an option to let you choose another  
tracker item to merge with the current one so that relevant threads  
can be aggregated into the same tracker item.


At this point you could have a page that lists all unassigned tracker  
items so that someone looking for some work to do could quickly scan a  
short easy to read list and pick something.


Problem #3: When a new patch creator posts a new patch they need  
feedback quickly so they know at least that they did the right thing  
and someone is aware of the patch.
Solution 3: The tracker has  a list of all new threads that haven't  
been looked at.  Someone can then go through the list of unprocessed  
items and see if it has a patch. If it does he can flag that item as a  
patch submission and it will immediately show up on the list for patch  
reviewers to look through.  It can also be assigned right there to a  
specific fest but will default to the soonest one.  Once it is flagged  
an email will automatically go out telling them their patch is pending  
review.


Problem #4: Patches may or may not, on rare 

Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Tom Lane
I wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 I remember that last spring, in the context of GIT, you were worried 
 about the performance implication of preparing to recheck rows when no 
 rechecks are needed. I didn't quite buy that back then, but this would 
 have the same issue.

 As I mentioned upthread, it appears that we're paying that overhead
 anyway --- at least nodeIndexscan.c thinks we are.  I need to dig into
 the planner a bit today and see whether it's taking any shortcuts for
 non-RECHECK operators.

Yeah, we are paying that overhead.  The reason is that the planner
always constructs an indexqualorig expression and the executor
always initializes it.  The only place where it's used currently in
a plain indexscan is for EvalPlanQual rechecking, but we could certainly
use it for lossy-operator rechecking.  (The implication of this is that
if any of the operators in a multi-index-qual indexscan are lossy, we'd
recheck all of them upon fetching the heap tuple.  Does anyone feel
that's not good enough?  Most of the practical cases I can think of
for multi-operator scans are for btree anyway, so it's not clear that
there's much value in complicating matters to evaluate just some of the
indexqualorig clauses.)

This would effectively move *all* management of lossy operators to
runtime; the planner wouldn't really think about it at all.  We could
simplify createplan.c a bit.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Rick Gigger


On Apr 11, 2008, at 9:30 AM, Tom Lane wrote:

Gregory Stark [EMAIL PROTECTED] writes:
Personally I don't think we *know* what we want to do and that's  
why the wiki

is a good interim tool.


Yup, that is *exactly* the point.  A wiki page is a zero-setup-cost,
flexible way of experimenting with tracking commit-fest issues.
A year from now, we might have enough experience to decide that some
more-rigidly-structured tool will do what we need, but we don't have
it today.  We spent enough time fighting the limitations of Bruce's
mhonarc page that we ought to be wary of adopting some other tool that
wants you to do things its way.

Perhaps an example will help make the point.  Throughout this past  
fest

I was desperately wishing for a way to group and label related issues
--- we had a pile of items focused around index AM API questions, and
another pile focused around mapping problems (FSM/DSM/Visibility
map/etc), and being able to put those together would have made it a
lot clearer what needed to be looked at together with what else.
On a wiki page it'd have been no trouble at all to create ad-hoc
sub-headings and sort the individual items into whatever grouping and
ordering made sense (in fact, Alvaro did some of that on his own).
It was basically impossible to do any such thing with Bruce's mhonarc
page, though he did kluge up some ways to partially address the issue
by merging threads.  The bug trackers I've dealt with haven't got much
flexibility in this respect either --- the sorts of global views you  
can

get are entirely determined by the tool.  I'm fairly certain that a
tracker designed around the assumption that different entries are
essentially independent isn't going to be very helpful.


In case you don't recognize my name/email address I am just someone  
who has been lurking on hackers for several years now.  I have been  
following this thread closely and I thought it might be useful for  
someone to try to summarize what it seems like people need so everyone  
can see if they are on the same page. After stating each problem I  
will also summarize proposed solutions and introduce a few of my own,  
just to see what people think.  Also I have been a web developer for  
the 7 years so I may be able to help out with this, as long as the  
time span is sufficiently long.  Please feel free to correct anything  
below (as if I have to say that).  Remember I am not trying to push  
any idea here, I am just trying to summarize everyone else's ideas and  
humbly propose a few ideas that might help.


It's clear that you want to keep the email list as the primary means  
of communication.  So that obviously has to stay.  The web archives  
should stay the primary means of referencing past discussion.


Problem #1: The current archive system breaks threads across monthly  
boundaries.

Solution 1A: Find other off the shelf software that does this better.
Solution 1B: Write some custom software to do this.  Can you set up  
hooks into the mail server so that a script could be run each time a  
new email is accepted?  Given the full message headers and body, what  
is the algorithm for linking methods into threads?  Given the right  
answers to those two questions and this might be a fairly simple task.


Problem #2: When people are looking for something to do we need a list  
of all pending issues that can be easily searched.  Ideally the  
maintenance of this list will be as low as possible.
Solution 2: Create a custom tracker.  The above email and others seem  
to indicate nothing off the shelf will do.  Can a hook be set up into  
the mail server so that incoming emails can not only be read and  
indexed but also altered with a script?  Each new thread from patches  
could automatically create a tracker item.  At the bottom of each  
message a link could be added to the tracker item for that thread.   
Then going from email message (in your MUA or the web archives) to the  
tracker would be quick and easy.  Each email from hackers could have a  
link at the bottom to create a tracker item for it.  So patches  
threads get automatic tracker items.  Hackers threads can be added  
manually.


The tracker page for each message would include whatever metadata was  
needed.  For instance: has this tracker item been processed yet?  Is  
it a bug or a feature request or just a request for information or a  
fix to infrastructure?  Is there a reviewer for the patch?  Which fest  
does it belong to?  Or any other metadata you might want to add to the  
item.  Also on the page would be the thread that started the item.   
Initially it would show only subjects.  Clicking on a subject will  
show the body of the message inline with the thread. Clicking it again  
will collapse it again.  There will be a reply link for each message.   
If you reply via the web site it will simply send a message to the  
mail server just as it would if you had replies within your MUA.  That  
way there is no difference between 

[HACKERS] Remove lossy-operator RECHECK flag?

2008-04-11 Thread Tom Lane
[ Changing subject to draw people's attention ... ]

Teodor Sigaev [EMAIL PROTECTED] writes:
 RECHECK flag could be removed.

Hmm, that's slightly more radical than I was considering, but it would
simplify matters wouldn't it?  The only strong argument against it that
I can think of is that it'd break user-defined opclasses ... but it's
not like we don't change the API for GIST/GIN support functions from
time to time anyway.  Does anyone have any idea how many people are
building custom opclasses containing lossy operators?  Offhand I suspect
only the PostGIS project would be affected.

If we do this, should we remove RECHECK from the CREATE OPERATOR CLASS
syntax altogether, or leave it in but treat it as a no-op (probably
with a warning)?  The latter would make it a shade easier to load
existing dumps, but I'm inclined to think if we're going to break
something it'd be better to break it obviously than subtly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-11 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 PFC wrote:
 So, where to go from that ? I don't see a way to implement this without 
 a (backwards-compatible) change to the wire protocol, because the clients 
 will want to specify when a plan should be cached or not. Since the user  
 should not have to name each and every one of the statements they want to 
 use plan caching, I see the following choices :

 I don't understand the point here.  We already have cached plans: you
 send a Parse.  You can then Bind/Execute many times.
 Maybe what we need is support for this in libpq, so that PHP can use it?

We already have that, too ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Gregory Stark
Brendan Jurd [EMAIL PROTECTED] writes:

 In Trac, if I just want to loosely associate several tickets together
 I'd use *keywords*, e.g., put index am in the keywords list for
 several tickets, and then they'll show up prominently when I search
 for those terms.

As an aside, you've reminded me about another thing that bothers me about
Bugzilla and RT. In both cases they seem to put a lot of focus around the idea
of searching bugs. I don't really get why.

Maybe it makes sense if you plan to be like Mozilla and have 8-year-old bugs
that nobody ever sees let alone updates, but surely that isn't the goal.

In fact it seems like having the UI centred around searching pretty much
dooms you to that fate. Of course things will fall through the cracks if your
main UI only presents the things you decide to go look for.

I would think an interface which presents you with *all* unclosed bugs by
default, perhaps organized in some way (keywords, milestones, etc) would be
more conducive to getting attention to everything.

I'm sure you can do something like that in Bugzilla and RT but it sure doesn't
seem to be the way it's used in practice.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Joshua D. Drake
On Fri, 11 Apr 2008 18:46:18 +0100
Gregory Stark [EMAIL PROTECTED] wrote:

 I would think an interface which presents you with *all* unclosed
 bugs by default, perhaps organized in some way (keywords, milestones,
 etc) would be more conducive to getting attention to everything.
 
 I'm sure you can do something like that in Bugzilla and RT but it
 sure doesn't seem to be the way it's used in practice.

You can in RT (although I am not suggesting we use RT). You can also
do it in trac.

Joshua D. Drake
 


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 In Trac, if I just want to loosely associate several tickets together
 I'd use *keywords*, e.g., put index am in the keywords list for
 several tickets, and then they'll show up prominently when I search
 for those terms.

Assuming you know what to search for, of course ...

 If I want something more structured I'd use a *milestone*.  I'd create
 an Index AM milestone and attach all the relevant tickets to it.
 Then I can easily pull up a report of all open tickets on the Index AM
 milestone (or all closed tickets, or all tickets regardless of status,
 or all tickets assigned to me, or all tickets not assigned to anyone
 yet, or ...)

Yeah, you can do all that in bugzilla too (Red Hat uses tracking bugs
to such an extent that I think they outnumber the plain bugs :-().
It still pretty much sucks for what I want, which is to easily see an
overview of what's in the commit-fest queue organized in some helpful
fashion.

In any case, this still sounds like forcing our problem to fit the tool.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove lossy-operator RECHECK flag?

2008-04-11 Thread Heikki Linnakangas

Tom Lane wrote:

Teodor Sigaev [EMAIL PROTECTED] writes:

RECHECK flag could be removed.


Hmm, that's slightly more radical than I was considering, but it would
simplify matters wouldn't it?  The only strong argument against it that
I can think of is that it'd break user-defined opclasses ... but it's
not like we don't change the API for GIST/GIN support functions from
time to time anyway.


Don't we need to change the GiST/GIN support function interface anyway, 
to be able to return the recheck flag?



If we do this, should we remove RECHECK from the CREATE OPERATOR CLASS
syntax altogether, or leave it in but treat it as a no-op (probably
with a warning)?  The latter would make it a shade easier to load
existing dumps, but I'm inclined to think if we're going to break
something it'd be better to break it obviously than subtly.


I agree with rather breaking it obviously than subtly.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Andrew Dunstan



Gregory Stark wrote:

As an aside, you've reminded me about another thing that bothers me about
Bugzilla and RT. In both cases they seem to put a lot of focus around the idea
of searching bugs. I don't really get why.

Maybe it makes sense if you plan to be like Mozilla and have 8-year-old bugs
that nobody ever sees let alone updates, but surely that isn't the goal.


  


No, there are several perfectly good reasons. It seems unlikely that you 
have never actually used bugzilla in earnest or you would not have made 
this comment.


First, there are reports that get marked not a bug. If somebody has 
found some behaviour that might be a bug, then being able to search for 
similar reports in the past and see the response is very valuable (and 
saves developers from having to give the same answer over and over)


Second, the system is used to track features as well as things that are 
strictly bugs. So, for example, you can find the response to a previous 
feature request.


A list of open feature requests in effect gives you a TODO list for nothing.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached Query Plans

2008-04-11 Thread Gregory Stark
Alvaro Herrera [EMAIL PROTECTED] writes:

 PFC wrote:

  So, where to go from that ? I don't see a way to implement this without 
 a (backwards-compatible) change to the wire protocol, because the clients 
 will want to specify when a plan should be cached or not. Since the user  
 should not have to name each and every one of the statements they want to 
 use plan caching, I see the following choices :

 I don't understand the point here.  We already have cached plans: you
 send a Parse.  You can then Bind/Execute many times.

I think what he's referring to is persistently caching plans so that new
connections can use them. That makes a lot more sense if you have lots of
short-lived connections like a stock php server without persistent connections
turned on or a connection pooler. You can prepare queries but they only live
for a single web page so you don't get any benefit.

Personally I would like to see this, not primarily for the performance gains,
but for the possibility of managing when plans change -- ie, plan stability.

But there is resistance from other quarters about the reliability hit of
having the plan data structures in shared memory. A bug in one backend could
cause other backends to crash or corrupt their memory. The possibility exists
with other shared data structures but, arguably, plans are much more complex
data structures than PGPROC entries and buffers.

I still don't see why you would need a wire protocol change. You would just
have clients prepare plans normally and stash them in shared memory for other
backends in a hash table keyed by, well, something, perhaps the original query
text.

Then whenever you're asked to prepare a query you go check if someone else has
already done it for you and find an already generated plan in the shared
memory hash table.

The contention on the shared cache is likely to negate much of the planning
savings but I think it would still be a win. But what's really interesting to
me is then providing an interface to see and manipulate that cache. Then you
could see what plans other backends are using for queries, mark plans as being
acceptable or not, and even revoke users' permissions to execute queries which
aren't already present and marked as being acceptable.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove lossy-operator RECHECK flag?

2008-04-11 Thread Teodor Sigaev

I can think of is that it'd break user-defined opclasses ... but it's
not like we don't change the API for GIST/GIN support functions from
time to time anyway.  Does anyone have any idea how many people are
Hmm. The biggest breakage of interface to indexes was a removing 
pg_am.amconcurrent flag - there is one or two implementation of indexes which 
was depending of this flag. All our modules related to GIN or GiST  are in 
contrib already, new wildspeed will not work with =8.3 version anyway.



building custom opclasses containing lossy operators?  Offhand I suspect
only the PostGIS project would be affected.

Yes, I think so.


If we do this, should we remove RECHECK from the CREATE OPERATOR CLASS
syntax altogether, or leave it in but treat it as a no-op (probably
with a warning)?  
I think, it should be a error, but not a syntax error - hint should point to use 
new version of module. Loading dump from previous versions with opclass 
definitions is not good action anyway.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch to add objetct size on \d+ verbose output

2008-04-11 Thread Dickson dos Santos Guedes
Hi all,

I don't know if I can post a patch here (I did subscribe to
psql-patches but without success).

Well, working in the latest revision from CVS I added a feature for
psql to the command \d+, now it shows the object size as like as
\l+ show the database size.

I will be pretty glad with yours sugestions and opnion.

Thanks.
-- 
[]s
Dickson S. Guedes
-
Projeto Colmeia - Curitiba - PR
(41) 3254-7130 ramal: 27
http://makeall.wordpress.com/
http://planeta.postgresql.org.br/
Index: describe.c
===
RCS file: /projects/cvsroot/pgsql/src/bin/psql/describe.c,v
retrieving revision 1.166
diff -c -r1.166 describe.c
*** describe.c	30 Mar 2008 18:10:20 -	1.166
--- describe.c	11 Apr 2008 04:59:56 -
***
*** 1766,1775 
--- 1766,1781 
  		  gettext_noop(Table));
  
  	if (verbose)
+ 	{
  		appendPQExpBuffer(amp;buf,
  			  ,\n  pg_catalog.obj_description(c.oid, 'pg_class') as \%s\,
  		  gettext_noop(Description));
  
+ 		appendPQExpBuffer(amp;buf,
+ 			  ,\n  pg_catalog.pg_size_pretty(pg_catalog.pg_relation_size(c.oid)) as \%s\,
+ 	  	  gettext_noop(Size));
+ 	}
+ 	
  	appendPQExpBuffer(amp;buf,
  	  \nFROM pg_catalog.pg_class c
  	\n JOIN pg_catalog.pg_roles r ON r.oid = c.relowner
***
*** 1809,1816 
  	processSQLNamePattern(pset.db, amp;buf, pattern, true, false,
  		  n.nspname, c.relname, NULL,
  		  pg_catalog.pg_table_is_visible(c.oid));
! 
! 	appendPQExpBuffer(amp;buf, ORDER BY 1,2;);
  
  	res = PSQLexec(buf.data, false);
  	termPQExpBuffer(amp;buf);
--- 1815,1825 
  	processSQLNamePattern(pset.db, amp;buf, pattern, true, false,
  		  n.nspname, c.relname, NULL,
  		  pg_catalog.pg_table_is_visible(c.oid));
! 	
! 	if (verbose)
! 		appendPQExpBuffer(amp;buf, ORDER BY 1, pg_catalog.pg_relation_size(c.oid) DESC, 2;);
! 	else
! 		appendPQExpBuffer(amp;buf, ORDER BY 1,2;);
  
  	res = PSQLexec(buf.data, false);
  	termPQExpBuffer(amp;buf);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove lossy-operator RECHECK flag?

2008-04-11 Thread Teodor Sigaev
Don't we need to change the GiST/GIN support function interface anyway, 
to be able to return the recheck flag?

Now we choose - save compatibility or not.

We can save flag RECHECK and introduce optional needRecheck argument for 
consistent function and new opclass can use new interface, old ones will work 
with RECHECK. Or we remove RECHECK and force opclasses to use new interface.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to add objetct size on \d+ verbose output

2008-04-11 Thread Alvaro Herrera
Dickson dos Santos Guedes escribió:

 Well, working in the latest revision from CVS I added a feature for
 psql to the command \d+, now it shows the object size as like as
 \l+ show the database size.
 

Added to May commitfest page, thanks.



-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Oleg Bartunov

Slightly offtopic. How to get benefit on tuple level ? For example,
we mark GiST tsearch index as lossy, while for not very big documents it's
actually exact and we could save a lot not rechecking them.

Oleg

On Fri, 11 Apr 2008, Teodor Sigaev wrote:


Teodor, do you have any thoughts about exactly how you'd fix @@@ ?
I suppose that the recheck-need is not really a property of specific
tuples, but of a particular query, for that case.  Where would you
want to detect that?


tsquery may include restriction by weight of search terms: 'sea  port:A'. 
GIN index doesn't store information about weights, so the only difference 
between @@ and @@@ is that @@@ is marked with RECHECK flag. I think, the 
better way is set flag about required recheck by looking value from index, 
not for tsquery. It gives to us more flexibility.


So, I planned to add pointer to bool to consistent method, so signature will 
be
bool consistent( bool check[], StrategyNumber n, Datum query, bool 
*needRecheck)


Returning value of needRecheck should be ignored for operation not marked by 
RECHECK flag in opclass. needRecheck should be initialized to true before 
call of consistent method to keep compatibility with old opclasses.


To define, is recheck needed or not, the better way is to check actually 
needed values. For example, let tsquery is equal to
'foo | bar | qq:A' and tsvetor = 'foo:1,2,3 asdasdasd:4'. Obviously recheck 
is not needed. So patch is close to trivial:


*** tsginidx.c.orig 2008-04-11 17:08:37.0 +0400
--- tsginidx.c  2008-04-11 17:18:45.0 +0400
***
*** 109,114 
--- 109,115 
 {
   QueryItem  *frst;
   bool   *mapped_check;
+   bool   *needRecheck;
 } GinChkVal;

 static bool
***
*** 116,121 
--- 117,125 
 {
   GinChkVal  *gcv = (GinChkVal *) checkval;

+   if ( val-weight )
+   *(gcv-needRecheck) = true;
+
   return gcv-mapped_check[((QueryItem *) val) - gcv-frst];
 }

***
*** 144,149 
--- 148,155 

   gcv.frst = item = GETQUERY(query);
   gcv.mapped_check = (bool *) palloc(sizeof(bool) * 
query-size);

+   gcv.needRecheck = PG_GETARG_POINTER(3);
+   *(gcv.needRecheck) = false;

   for (i = 0; i  query-size; i++)
   if (item[i].type == QI_VAL)








Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Commit fest status

2008-04-11 Thread Tom Lane
What's left on Bruce's patch queue page is:

* Finishing out Heikki's patch to allow runtime determination of the
need to recheck an index condition.  What's committed so far doesn't
yet have any actual use :-(.  Although I intend to keep working on
that, it's clearly new development and hence not commit-fest material.

* Design discussions about dead space map, free space map, etc.
I think that we have pretty much converged on a consensus that the
way to store these maps is to add separate subsidiary file(s) for
each relation (forks, for lack of a better name).  And that really
seems to be the only thing we need to decide now --- there's not much
else to talk about until we have some prototype code to experiment
with.

* That thread about real procedures.  I'm not seeing that we need
any further discussion now about that, either.  The consensus in the
thread seemed to be that having a PL that could execute outside
transactions would be good, but nobody was excited about much else
that was suggested.

In short, I think it's time to declare our first commit fest done.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove lossy-operator RECHECK flag?

2008-04-11 Thread Guillaume Smet
On Fri, Apr 11, 2008 at 7:41 PM, Tom Lane [EMAIL PROTECTED] wrote:
  Offhand I suspect
  only the PostGIS project would be affected.

Just wanted to point out that I personnally use the capability to
remove the RECHECK of PostGIS opclass (I define a similar opclass
without the recheck) when I enforce the SRID in the application: the
overhead of rechecking the rows returned by the index scan is really
noticeable (for nothing if your SRID is the same in all your
application and data - that's what I read some time ago, hope it's
still true).

I don't know how you think PostGIS should fix their opclass after
RECHECK removal. But I'd like to have a way to keep the ability to
remove the RECHECK on my own if it's possible so I hope it won't be
hardcoded in C code somewhere.

-- 
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove lossy-operator RECHECK flag?

2008-04-11 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 Now we choose - save compatibility or not.

 We can save flag RECHECK and introduce optional needRecheck argument for 
 consistent function and new opclass can use new interface, old ones will work
 with RECHECK. Or we remove RECHECK and force opclasses to use new interface.

Yeah, that's what it boils down to.

I'm leaning towards removing RECHECK because it'll allow simplification
of the core code, and I doubt there are enough outside opclasses that're
using lossy operators for the compatibility loss to be a big deal.
We've certainly forced bigger changes than that in the past.

I seem to recall that you had some plans for other incompatible changes
in the call conventions for GIST/GIN support functions, too.  If
anything like that is going to happen for 8.4, then outside opclasses
are going to need updates anyway, and forcing this one on them too would
hardly be much of a burden.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Tom Lane
Oleg Bartunov [EMAIL PROTECTED] writes:
 Slightly offtopic. How to get benefit on tuple level ? For example,
 we mark GiST tsearch index as lossy, while for not very big documents it's
 actually exact and we could save a lot not rechecking them.

Won't that just fall out of this?  Assuming the consistent() function
knows when the match is exact, it can set the flag properly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Separate psql commands from arguments

2008-04-11 Thread Decibel!

On Apr 10, 2008, at 4:48 PM, Gregory Stark wrote:

Well I feel like storing a query and resending it later is something
predictable which will work reliably. Storing a psql input line and
reinterpreting it later is surely going to cause weird things to  
happen.


Just for some examples off the top of my head, what happens if I  
define an
alias \foo which consists of \foo and call it? What happens if  
I have
mutually recursive aliases? What happens if I define \foo to run  
\ followed
by its first argument, and I pass it foo? What happens if I pass  
it unalias

foo?


As Bernd said, I see this as simple search and replace, and then  
stick it in the command buffer. If you define an alias that calls  
itself, you could end up with a stack overflow, same as with server  
functions.


What happens if you press C-c during an alias, does it keep running  
subsequent
commands? What if the editor returns an error after a \e command?  
What about

if a \i command doesn't find the file?


You throw an error. In shell, you can use  to control if you keep  
going or not after that; perhaps we should have that.


Basically it sounds like you're treating psql as if it was a well  
defined
language with well defined syntax and semantics. And I don't think  
it is. It's
just one big if-else-if block with lots of strcmps. There's no  
infrastructure

to parse or manage a stack of calls to functions.


So perhaps we should change that. Don't get me wrong, psql is the  
most powerful command-line database I've ever seen, but it still  
irritates me that it's not more shell-like in nature. In particular,  
the inability to do things like condition processing, or save the  
results of an SQL query into a psql variable are very annoying. If  
that (or aliases) means psql needs to be more than a big IF-THEN-ELSE  
then I think that's part of what we should do.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] Remove lossy-operator RECHECK flag?

2008-04-11 Thread Tom Lane
Guillaume Smet [EMAIL PROTECTED] writes:
 Just wanted to point out that I personnally use the capability to
 remove the RECHECK of PostGIS opclass (I define a similar opclass
 without the recheck) when I enforce the SRID in the application:

To be blunt, that seems like a really bad idea, and I have not the
slightest hesitation about breaking your ability to do it.  How
do you know that the recheck-need corresponds to what you are testing
on the application side?

If there's actually some safe, consistent use for such behavior
then I think you need to lobby the PostGIS project to provide
access to it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Andrew Sullivan
On Fri, Apr 11, 2008 at 06:46:18PM +0100, Gregory Stark wrote:

 As an aside, you've reminded me about another thing that bothers me about
 Bugzilla and RT. In both cases they seem to put a lot of focus around the
 idea of searching bugs. I don't really get why.

To be fair to RT, it's really designed as a general-purpose trouble-ticket
system.  If you've ever worked a help desk, you'll have no trouble knowing
why searching is a valuable function.

But yes, you can (with some pain, like everything else in RT) completely
rejigger the access screens to eliminate this.

A


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Decibel!

On Apr 11, 2008, at 12:54 PM, Tom Lane wrote:

Brendan Jurd [EMAIL PROTECTED] writes:

In Trac, if I just want to loosely associate several tickets together
I'd use *keywords*, e.g., put index am in the keywords list for
several tickets, and then they'll show up prominently when I search
for those terms.


Assuming you know what to search for, of course ...

If I want something more structured I'd use a *milestone*.  I'd  
create

an Index AM milestone and attach all the relevant tickets to it.
Then I can easily pull up a report of all open tickets on the  
Index AM
milestone (or all closed tickets, or all tickets regardless of  
status,

or all tickets assigned to me, or all tickets not assigned to anyone
yet, or ...)


Yeah, you can do all that in bugzilla too (Red Hat uses tracking bugs
to such an extent that I think they outnumber the plain bugs :-().
It still pretty much sucks for what I want, which is to easily see an
overview of what's in the commit-fest queue organized in some helpful
fashion.


Mozilla's bugzilla uses milestones to track what release something is  
scheduled for... I'm thinking the same mechanism could be used for  
commitfests (and releases).

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] question on how to correctly communicate with external library functions which return malloc()'ed strings

2008-04-11 Thread Vladimir Volovich
MvO == Martijn van Oosterhout writes:

  it worked fine with postgresql 8.2; with 8.3 it started segfaulting,
  and it appeared that the reason is because in postgresql 8.3, the
  free is a macro defined in snowball/header.h: #define free(a)
  pfree(a)

 MvO It does seem wrong. Do you include that header file explicitly?
 MvO Because it shouldn't be necessary.

i needed some prototypes from snowball in the same source file, so i
included snowball/header.h; it appears that it is sufficient to use
  #include snowball/libstemmer/api.h
instead of
  #include snowball/header.h

but please remove the #define free from snowball/header.h because, as
you said, it's wrong.

Best,
v.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Oleg Bartunov

On Fri, 11 Apr 2008, Tom Lane wrote:


Oleg Bartunov [EMAIL PROTECTED] writes:

Slightly offtopic. How to get benefit on tuple level ? For example,
we mark GiST tsearch index as lossy, while for not very big documents it's
actually exact and we could save a lot not rechecking them.


Won't that just fall out of this?  Assuming the consistent() function
knows when the match is exact, it can set the flag properly.


Ah, yes. Looks like a new life for GiST tsearch index.



regards, tom lane



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest status

2008-04-11 Thread Bruce Momjian
Tom Lane wrote:
 What's left on Bruce's patch queue page is:
 
 * Finishing out Heikki's patch to allow runtime determination of the
 need to recheck an index condition.  What's committed so far doesn't
 yet have any actual use :-(.  Although I intend to keep working on
 that, it's clearly new development and hence not commit-fest material.
 
 * Design discussions about dead space map, free space map, etc.
 I think that we have pretty much converged on a consensus that the
 way to store these maps is to add separate subsidiary file(s) for
 each relation (forks, for lack of a better name).  And that really
 seems to be the only thing we need to decide now --- there's not much
 else to talk about until we have some prototype code to experiment
 with.
 
 * That thread about real procedures.  I'm not seeing that we need
 any further discussion now about that, either.  The consensus in the
 thread seemed to be that having a PL that could execute outside
 transactions would be good, but nobody was excited about much else
 that was suggested.
 
 In short, I think it's time to declare our first commit fest done.

OK, todo updated, but what about the Maintaining cluster order on
insert idea?

http://momjian.us/cgi-bin/pgpatches

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove lossy-operator RECHECK flag?

2008-04-11 Thread Guillaume Smet
On Fri, Apr 11, 2008 at 9:02 PM, Tom Lane [EMAIL PROTECTED] wrote:
  To be blunt, that seems like a really bad idea, and I have not the
  slightest hesitation about breaking your ability to do it.  How
  do you know that the recheck-need corresponds to what you are testing
  on the application side?

From what I read when I did that a few months ago, the recheck was
added to provide SRID checking.
If you don't have it, you don't have the SRID mismatch error when
SRIDs don't match.

  If there's actually some safe, consistent use for such behavior
  then I think you need to lobby the PostGIS project to provide
  access to it.

In the general case, there isn't. If you enforce the SRID (using a
wrapper/constraints/whatever), there is.

After some googling, I finally found the post of Mark Cave-Ayland on
postgis-users which made me take this decision:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg01206.html

Don't know if there is a better way to do it in PostGIS itself but the
ability to take this decision for a specific database (or even column)
is really convenient.

-- 
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest status

2008-04-11 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 In short, I think it's time to declare our first commit fest done.

 OK, todo updated, but what about the Maintaining cluster order on
 insert idea?
   http://momjian.us/cgi-bin/pgpatches

The last item I see in the thread is some performance tests that
make it look not worthwhile.  There's no discussion needed, unless
someone refutes that test or improves the code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest status

2008-04-11 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  In short, I think it's time to declare our first commit fest done.
 
  OK, todo updated, but what about the Maintaining cluster order on
  insert idea?
  http://momjian.us/cgi-bin/pgpatches
 
 The last item I see in the thread is some performance tests that
 make it look not worthwhile.  There's no discussion needed, unless
 someone refutes that test or improves the code.

OK, so we delete it --- fine.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove lossy-operator RECHECK flag?

2008-04-11 Thread Teodor Sigaev
 I'm leaning towards removing RECHECK because it'll allow simplification
I vote to do it.

 I seem to recall that you had some plans for other incompatible changes
 in the call conventions for GIST/GIN support functions, too.  If
Right now we suggest only new feature which just extent interface. Gregory
suggested to split compare method to two methods and I'm intending to do
it.

-- 
Teodor SigaevEmail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: real procedures again (8.4)

2008-04-11 Thread Bruce Momjian

Added to TODO:

* Allow functions to control the transaction state

  http://archives.postgresql.org/pgsql-hackers/2007-10/msg01375.php


---

Pavel Stehule wrote:
 Hello,
 
 I found lot of discus about this topic.
 
 http://www.postgresql.org/docs/faqs.TODO.html
 http://archives.postgresql.org/pgsql-hackers/2003-08/msg00501.php
 http://archives.postgresql.org/pgsql-hackers/2004-09/msg00734.php
 http://archives.postgresql.org/pgsql-hackers/2004-08/msg00872.php
 http://archives.postgresql.org/pgsql-hackers/2004-09/msg00702.php
 
 There is one result - OUT params for functions. I propose start with
 simple goals that we can enhance in future.
 
 First goal: Procedures support for better conformance with ANSI SQL:
 
 * procedure returns any only through OUT, INOUT params,
 * procedure has own executor, that allows byref params (and own
 transaction management in future),
 * procedure can be overloaded,
 * procedure can not returns recordset or multi recordset,
 * procedure doesn't support default parameters,
 * SQL statement CALL allows only expression (this proposal doesn't
 need session variables) for older environments
 * new SPI_exec_procedures API (allows binding to host variables) and
 some similar in libpq, that allow CALL implementation in pgsql and
 others.
 * new internal exec_exec_proc (allow ref on datum variable) used in
 plpgsql statement CALL.
 * new V2 calling convention (maybe based on SQL/CLI)
 * no changes in current functions support
 
 Later:
 * procedure can manages transactions,
 * procedure can returns recordset or multi recordset,
 * procedure allows default parameters,
 * CALL statement allows session variables
 * no changes in current functions support
 
 Why new calling convention? I would to support byref variables and
 then I have to carry memory context info ... and maybe some others
 
 Nice a weekend
 
 Pavel Stehule
 
 p.s.
 
 Why procedures? New parts of ANSI SQL use it, and what is worse, they
 use methods:
 http://www.wiscorp.com/H2-2005-350-WG4-wlg005-Part-7-History-2nd-Edition-Text-for-Working-Draft.pdf
 
 ---(end of broadcast)---
 TIP 7: You can help support the PostgreSQL project by donating at
 
 http://www.postgresql.org/about/donate

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] question on how to correctly communicate with external library functions which return malloc()'ed strings

2008-04-11 Thread Tom Lane
Vladimir Volovich [EMAIL PROTECTED] writes:
 but please remove the #define free from snowball/header.h because, as
 you said, it's wrong.

It's not wrong and it won't be removed.  Please note the header comment
in that file:

 * NOTE: this file should not be included into any non-snowball sources!

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] question on how to correctly communicate with external library functions which return malloc()'ed strings

2008-04-11 Thread Vladimir Volovich
TL == Tom Lane writes:

 TL It's not wrong and it won't be removed.  Please note the header
 TL comment in that file:

 TL  * NOTE: this file should not be included into any non-snowball
 TL sources!

ok, i'll just include snowball/libstemmer/api.h or snowball/libstemmer/header.h 
instead.

in postgresql 8.2 there was no such #define free in snowball/header.h,
and a change in 8.3 introduced breakage; i should have looked in the new
snowball/header.h comment prior to complaining.

thanks.

Best,
v.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot memory management

2008-04-11 Thread Alvaro Herrera
Alvaro Herrera wrote:

 To implement this I propose keeping track of four additional pointers in
 SnapshotData: next and prev in the registered list, and next and
 prev in the active stack.

FWIW this does not work as proposed because one snapshot can be stored
as active more than once simultaneously.

Back to the drawing board on memory management -- so far I have merged
the two snapshot patches I had (RegisterSnapshot and ActiveSnapshot),
and the result looks reasonably good.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest status

2008-04-11 Thread Chris Browne
[EMAIL PROTECTED] (Tom Lane) writes:
 In short, I think it's time to declare our first commit fest done.

Congratulations!  

As a pure observer in the matter, it has clearly been a somewhat
painful process, which must be tempered by the consideration that what
was being reviewed was pretty much a year's worth of work.  I think
there's reason to hope that later iterations should be a bit easier
from that perspective alone.  And hopefully the learning curve means
that things have been learned to ease future pain :-).

Thanks all that have been working on it!
-- 
let name=cbbrowne and tld=linuxdatabases.info in String.concat @ 
[name;tld];;
http://linuxfinances.info/info/spiritual.html
Rules of the Evil Overlord #130.  All members of my Legions of Terror
will  have professionally  tailored  uniforms. If  the  hero knocks  a
soldier unconscious and steals the uniform, the poor fit will give him
away. http://www.eviloverlord.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Kenneth Marshall
We use RT here for our trouble ticket system and the dashboard
can easily be configured to display tickets based on any search
criteria and you can have multiple views on the same screen.
The search functionality can be viewed as the tool for configuring
your views into the system, for whatever its purpose may be. It
is easy to organize the views based on keywords, milestones, or
anything else. It really is very flexible and its E-mail interface
is very nice as well.

Regards,
Ken Marshall

On Fri, Apr 11, 2008 at 06:46:18PM +0100, Gregory Stark wrote:
 Brendan Jurd [EMAIL PROTECTED] writes:
 
  In Trac, if I just want to loosely associate several tickets together
  I'd use *keywords*, e.g., put index am in the keywords list for
  several tickets, and then they'll show up prominently when I search
  for those terms.
 
 As an aside, you've reminded me about another thing that bothers me about
 Bugzilla and RT. In both cases they seem to put a lot of focus around the idea
 of searching bugs. I don't really get why.
 
 Maybe it makes sense if you plan to be like Mozilla and have 8-year-old bugs
 that nobody ever sees let alone updates, but surely that isn't the goal.
 
 In fact it seems like having the UI centred around searching pretty much
 dooms you to that fate. Of course things will fall through the cracks if your
 main UI only presents the things you decide to go look for.
 
 I would think an interface which presents you with *all* unclosed bugs by
 default, perhaps organized in some way (keywords, milestones, etc) would be
 more conducive to getting attention to everything.
 
 I'm sure you can do something like that in Bugzilla and RT but it sure doesn't
 seem to be the way it's used in practice.
 
 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
   Ask me about EnterpriseDB's RemoteDBA services!
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached Query Plans

2008-04-11 Thread PFC



I think what he's referring to is persistently caching plans so that new
connections can use them. That makes a lot more sense if you have lots of
short-lived connections like a stock php server without persistent  
connections
turned on or a connection pooler. You can prepare queries but they only  
live

for a single web page so you don't get any benefit.


Let me explain a little further.

	Persistent database connections are the way to go for web applications,  
because the connection is only going to get used for a few queries, and  
the time needed to start the postgres process and establish the connection  
is often significant compared to the time used for the actual queries.  
Connection pooling can also be used, you get the idea.


	So, using persistent database connections, it makes sense to use prepared  
statements to speed up execution of simple queries, like those returning a  
few rows with a few joins and no complicated WHERE clauses, which is  
actually most of the queries on your average website. As shown in my  
previous message, the CPU time spent planning the query can be as much or  
even a lot more than CPU time spent actually executing the query.


	But, using prepared statements with persistent connections is messy,  
because you never know if the connection is new or not, if it contains  
already prepared statements or not, you'd have to maintain a list of those  
statements (named) for every query in your application, and when someone  
changes a query, it's a mess, not to mention queries generated by the ORM  
like Rails etc.


The idea in this proof of concept was :

	Wouldn't it be nice if Postgres could just say Hey, I already planned  
that query, I'll reuse that plan.
	And it is very easy to recognize a query we've seen before, since  
$-params takes the parameters out of the equation, and eliminates parsing  
time and string quoting hell.


	Storing the cached plans as prepared statements in the connection-local  
hashtable makes sense : it doesn't use that much memory anyway, and there  
are no locking and contention problems. Just like PREPARE and EXECUTE.


Personally I would like to see this, not primarily for the performance  
gains,
but for the possibility of managing when plans change -- ie, plan  
stability.


Unfortunately, this isn't compatible with a non-shared memory 
approach...


But there is resistance from other quarters about the reliability hit of
having the plan data structures in shared memory.


I agree.
Hence the idea to put them in non-shared memory, local to a process.
	Perfectly useless when using non-persistent connections, but very  
powerful when using persistent connections.



I still don't see why you would need a wire protocol change.


	Because I'd think that sometimes the client will not want to use a cached  
plan, when the query is rarely used (no need to waste memory to cache the  
plan), or it is complex and needs to be replanned according to parameter  
values every time.
	Sure, the client could use the oldskool send query as text with  
parameters inside but that's back to string escaping hell, and it's ugly.

It would be nicer to have a bool cache_plan.


You would just
have clients prepare plans normally and stash them in shared memory for  
other
backends in a hash table keyed by, well, something, perhaps the original  
query

text.


	Query text seems to be the simplest, better not ask the user to come up  
with distinct names when the query text will be a perfect key. Besides,  
hand-generated names might turn out not to be so distinct after all...


Then whenever you're asked to prepare a query you go check if someone  
else has

already done it for you and find an already generated plan in the shared
memory hash table.
The contention on the shared cache is likely to negate much of the  
planning
savings but I think it would still be a win. But what's really  
interesting to
me is then providing an interface to see and manipulate that cache. Then  
you
could see what plans other backends are using for queries, mark plans as  
being
acceptable or not, and even revoke users' permissions to execute queries  
which

aren't already present and marked as being acceptable.


	If it can be made to work with a shared cache, why not, but that would be  
more complex. You'd also have to deal with permissions, different users  
with different privileges, etc. But local would probably be simplest (and  
faster).


	Also, there will be problems with the schema search path. Perhaps a query  
should be required to specify the fully qualified table names  
(schema.table) for all tables in order to be cacheable.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached Query Plans

2008-04-11 Thread Tom Lane
PFC [EMAIL PROTECTED] writes:
   And it is very easy to recognize a query we've seen before,

It's not so easy as all that.  Consider search_path.  Consider temp
tables.

The real problem here is entirely on the client side:

   But, using prepared statements with persistent connections is messy,  
 because you never know if the connection is new or not,

If you were to fix *that* then both this problem and others (such as
setting up desired SET-parameter values) would go away.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached Query Plans

2008-04-11 Thread Gregory Stark
PFC [EMAIL PROTECTED] writes:

   But, using prepared statements with persistent connections is messy,
 because you never know if the connection is new or not, if it contains  
 already
 prepared statements or not, you'd have to maintain a list of those  statements
 (named) for every query in your application, and when someone  changes a 
 query,
 it's a mess, not to mention queries generated by the ORM  like Rails etc.

Well if you're caching per-connection then it doesn't really matter whether
you do it on the client side or the server side, it's pretty much exactly the
same problem.

Unsurprisingly most drivers do precisely what you're describing. In Perl DBI
for example you just change $dbh-prepare() into $dbh-prepare_cached()
and it does exactly what you want. I would expect the PHP drivers to have
something equivalent.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index AM change proposals, redux

2008-04-11 Thread Ron Mayer

Heikki Linnakangas wrote:

* GIT (Grouped Index Tuple) indexes, which achieve index space savings
in btrees by having a single index tuple represent multiple heap tuples
[...]
Another issue is that we'd need to check how much of the use-case for
GIT has been taken over by HOT.


There is, however, a ton of overlap with index-only scans, and the 
possibility to return keys from indexes, as you pointed out.


One use case that I think GIT would help a lot with are my
large address tables that are clustered by zip-code but
often queried by State, City, County, School District,
Police Beat, etc.

I imagine a GIT index on state would just occupy
a couple pages at most regardless of how large the
table gets.   And likewise, even an index on City
would be orders of magnitude smaller than the existing
ones; since all records for any given city are all
on the same few disk pages.

Or am I misunderstanding how GIT works.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commit fest queue

2008-04-11 Thread Brendan Jurd
On Sat, Apr 12, 2008 at 3:46 AM, Gregory Stark [EMAIL PROTECTED] wrote:
  As an aside, you've reminded me about another thing that bothers me about
  Bugzilla and RT. In both cases they seem to put a lot of focus around the 
 idea
  of searching bugs. I don't really get why.


Er, because pretty much everybody wants the ability to easily consult
the project's development history?

In a typical bugzilla scenario, the majority of users are going to be
accessing the tracker either to file a bug or request a feature.
Search must be front and centre for this to work effectively, because
you want those users to search for similar bugs before creating a new
one.

Trac's UI is less focussed on search.  The search box just sits up
there in the upper right corner in case you want to use it.

  I would think an interface which presents you with *all* unclosed bugs by
  default, perhaps organized in some way (keywords, milestones, etc) would be
  more conducive to getting attention to everything.

  I'm sure you can do something like that in Bugzilla and RT but it sure 
 doesn't
  seem to be the way it's used in practice.

Yes, of course all reasonable trackers also have a way to pull up
complete listings of open items.

I think you've been thrown off the scent because bugzilla's primary UI
is geared towards the submitter's usage pattern, not the reviewer's.
It doesn't mean that the reviewer is left out in the cold.  It does
mean that, as a reviewer, you have to either place an extra click or
two to bring up your favourite listing, or (!) make a bookmark.

For example, in Trac you click on View Tickets and then Active
Tickets.  It's a two click operation.  It's not like it's obfuscated.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers