Re: [HACKERS] question about meaning of character varying without length

2009-06-16 Thread Stefan Kaltenbrunner

Konstantin Izmailov wrote:

Here you go:
from	Mail Delivery Subsystem >

to  pgf...@gmail.com 
date Mon, Jun 15, 2009 at 9:16 PM
subject Delivery Status Notification (Failure)



This is an automatically generated Delivery Status Notification

Delivery to the following recipient failed permanently:

pgsql-gene...@postgresql.com 


postgresql.com != postgresql.org...


Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partial vacuum versus pg_class.reltuples

2009-06-16 Thread Heikki Linnakangas

(back from vacation)

Tom Lane wrote:

I wrote:

Another interesting question is why successive vacuums aren't causing
the index reltuples counts to go to zero.  Shouldn't a partial vacuum
result in *all* pages of the relation being marked as not needing to
be examined by the next vacuum?


I figured out the reason for that: the first 32 pages of the table are
always scanned, even if the whole thing is frozen, because of the
SKIP_PAGES_THRESHOLD logic.  We could change that behavior by
initializing all_visible_streak to SKIP_PAGES_THRESHOLD instead of zero.
But if we did so then having even just page zero be skippable would mean
that we clear scanned_all and thus fail to update reltuples, which is
probably not a good thing.


Right, that's exactly why I wrote it like that. I also thought about 
scanning the (beginning of the) visibility map first to see if there are
big enough gaps in there to warrant skipping pages, but went with the 
current approach because it's so much simpler.
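To make the skipping rule concrete, here is a simplified model in C. It is not the actual vacuumlazy.c code; the function name and signature are hypothetical. It assumes a page is skipped only once the run of consecutive all-visible pages seen so far has reached SKIP_PAGES_THRESHOLD, which reproduces the behavior described above: with the streak starting at zero the first 32 pages are always scanned, while starting it at the threshold makes page zero skippable and therefore clears scanned_all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Threshold value as given in the mail (32 pages). */
#define SKIP_PAGES_THRESHOLD 32

/*
 * Hypothetical model of the skip rule, not PostgreSQL's code.
 * all_visible[blk] says whether the visibility map marks page blk
 * all-visible.  A page is skipped only when the current run of
 * all-visible pages has already reached the threshold; any skip means
 * the tuple count can no longer be trusted for reltuples.
 * Returns the number of pages that would be scanned.
 */
size_t
model_vacuum_scan(const bool *all_visible, size_t nblocks,
                  size_t initial_streak, bool *scanned_all)
{
    size_t streak = initial_streak;
    size_t scanned = 0;

    *scanned_all = true;
    for (size_t blk = 0; blk < nblocks; blk++)
    {
        if (all_visible[blk])
        {
            if (streak >= SKIP_PAGES_THRESHOLD)
            {
                *scanned_all = false;   /* skipped: reltuples not updated */
                continue;
            }
            streak++;
        }
        else
            streak = 0;                 /* a page needing work resets the run */
        scanned++;
    }
    return scanned;
}
```

With a fully frozen 100-page table this model scans 32 pages when the streak starts at 0, and none (but with scanned_all cleared) when it starts at the threshold, which is the trade-off being discussed.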


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Andres Freund

Hi,

On 06/12/2009 07:15 AM, Robert Haas wrote:

If you don't like the syntax, please argue about that on the "generic
explain options v2" thread.  Let's try to use this thread to discuss
the output format, about which I spent a good deal of time agonizing.
I spent some time playing around with the explain output with various 
queries. Besides the mild dislike already raised (by Peter Eisentraut, I 
think) of upper-case, "-"-separated tag names, I found mainly one gripe:


1710.98
1710.98
72398
4
136.595
136.595
72398
1

This is a bit inconsistent: for the row estimate you use 
 and for  you don't use the "Plan-" prefix,
while for the ANALYZE-generated values you use the "Actual-" prefix 
consistently.


One approach would be to have two nodes like:

...
...


...
...


This would probably make it easier to write a future proof parser and it 
also seems semantically sensible.



As an aside, it would perhaps be nice (thinking of an 
index-suggestion tool) to allow separate estimates 
on  an ; so as not to have to change the format later, 
that should perhaps be considered now.
Perhaps the current structure plus some additional tags is also the best 
choice here; I just noticed it as a potential issue.


Andres



Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> As I look at this, another problem is that it seems to me that you're
> assuming that VARDATA_ANY() will return an aligned pointer, which
> isn't necessarily the case (see src/include/postgres.h).

I believe you need to look at it more carefully.  I don't think it's
making any such assumption.  Specifically, it has three loops; an "until
we're aligned" loop, then a "while we're aligned", and a "when we've
done all the aligned we could do".  

On the flip side, I am curious whether the arguments to a stored
procedure are always aligned or not.  Never had a case to care before,
but if palloc() is always going to return an aligned chunk of memory
(per MemSetAligned in c.h) it makes me wonder.

Thanks,

Stephen




[HACKERS] Synch Rep: communication between backends and walsender

2009-06-16 Thread Fujii Masao
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00448.php

One of the major complaints about the current synch rep patch is that
signals are used for communication between backends and walsender.
On some platforms, a signal doesn't interrupt sleep (e.g. a poll or select
system call), which would increase the performance overhead of
replication.

So I'd like to propose using the UDP socket and the semaphores
instead of signals for communication from backends to walsender
and vice versa, respectively.

The UDP socket is used for backends to request walsender to send
WAL records. Semaphores cannot be used for this purpose because
walsender must wait for the request from backends and the reply from
the standby server concurrently. Some UDP packets might get lost,
but that doesn't matter because the important data is communicated
via shared memory, and walsender also wakes up periodically even
without receiving a request. This UDP socket can be created the same
way as the one for the statistics collector.
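A minimal sketch of the backend-to-walsender nudge described above, assuming a non-blocking loopback UDP socket in the spirit of the statistics collector's. All function names are hypothetical, not part of the patch. The sender ignores errors because, per the proposal, the real request lives in shared memory; the waiter uses a poll() timeout, so a lost datagram only delays it until the next periodic wakeup.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a non-blocking UDP socket on an ephemeral loopback port; *addr
 * receives the address backends should send to.  Returns fd or -1. */
int
wakeup_socket_create(struct sockaddr_in *addr)
{
    socklen_t   len = sizeof(*addr);
    int         fd = socket(AF_INET, SOCK_DGRAM, 0);
    int         flags;

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(0);                  /* kernel picks a port */
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(fd, (struct sockaddr *) addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *) addr, &len) < 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/* Backend side: fire-and-forget nudge; a lost datagram is tolerated. */
void
wakeup_send(int fd, const struct sockaddr_in *addr)
{
    char byte = 'W';

    (void) sendto(fd, &byte, 1, 0,
                  (const struct sockaddr *) addr, sizeof(*addr));
}

/* Walsender side: sleep up to timeout_ms.  Returns 1 if nudged, 0 on
 * timeout.  Drains all queued datagrams so many nudges make one wakeup. */
int
wakeup_wait(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[16];

    if (poll(&pfd, 1, timeout_ms) <= 0)
        return 0;
    while (recv(fd, buf, sizeof(buf), 0) > 0)
        ;                                       /* stops at EAGAIN */
    return 1;
}
```

Collapsing several queued nudges into a single wakeup is a design choice of this sketch: walsender only needs to know that there is something to look at in shared memory, not how many backends asked.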

On the other hand, the semaphores are used for backends to wait
for the reply from walsender. The backend registers its semaphore
in shared memory before sleeping, and walsender then wakes it up
by using that semaphore.
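The semaphore half of the proposal could look roughly like this, assuming POSIX unnamed semaphores placed in shared memory. The WaitSlot struct, its fields and the function names are hypothetical. POSIX lists sem_post() and sem_wait() among the memory-synchronizing functions, so the acknowledged position written before the post should be visible to the backend once its wait returns.

```c
#include <errno.h>
#include <semaphore.h>
#include <stdint.h>

/* Hypothetical per-backend wait slot, intended to live in shared memory. */
typedef struct WaitSlot
{
    sem_t       sem;        /* the backend sleeps on this */
    uint64_t    wait_for;   /* WAL position the backend needs confirmed */
    uint64_t    acked;      /* position walsender reports as confirmed */
} WaitSlot;

/* Backend: register the slot before sleeping (one-shot use here).
 * pshared = 1 lets the semaphore work across processes when the slot
 * is placed in a shared mapping. */
int
slot_register(WaitSlot *slot, uint64_t wait_for)
{
    slot->wait_for = wait_for;
    slot->acked = 0;
    return sem_init(&slot->sem, 1, 0);
}

/* Walsender: record the reply, then wake the registered backend. */
int
slot_wake(WaitSlot *slot, uint64_t acked)
{
    slot->acked = acked;
    return sem_post(&slot->sem);
}

/* Backend: block until walsender posts, retrying if a signal interrupts. */
int
slot_wait(WaitSlot *slot)
{
    int rc;

    do
        rc = sem_wait(&slot->sem);
    while (rc != 0 && errno == EINTR);
    return rc;
}
```

Because a semaphore counts posts, a reply that arrives before the backend actually starts waiting is not lost; the later sem_wait() simply returns at once.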

Comments? Does anyone have a better approach?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 6:30 AM, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> As I look at this, another problem is that it seems to me that you're
>> assuming that VARDATA_ANY() will return an aligned pointer, which
>> isn't necessarily the case (see src/include/postgres.h).
>
> I believe you need to look at it more carefully.  I don't think it's
> making any such assumption.  Specifically, it has three loops; an "until
> we're aligned" loop, then a "while we're aligned", and a "when we've
> done all the aligned we could do".

I see that... but I don't think the test in the first loop is correct.
 It's based on the value of i % 4, but I'm not convinced that you know
anything about the alignment at the point where i == 0.

I might be all wet here, I haven't looked at this area of the code in detail.

> On the flip side, I am curious as to if the arguments to a stored
> procedure are always aligned or not.  Never had a case to care before,
> but if palloc() is always going to return an aligned chunk of memory
> (per MemSetAligned in c.h) it makes me wonder.

Well, if it's char(n) for n <~ 126, it's going to have a 1-byte
varlena header...

...Robert



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Greg Stark
On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund wrote:
> 1710.98
> 1710.98
> 72398
> 4
> 136.595
> 136.595
> 72398
> 1

XML's not really my thing currently but it sure seems strange to me to
have *everything* be a separate tag like this. Doesn't XML do
attributes too? I would have thought to use child tags like this only
for things that have some further structure.

I would have expected something like:







 
 
 



This would allow something like a graphical explain plan to still make
sense of a plan even if it finds a node it doesn't recognize. It would
still know generally what to do with a "scan" node or a "join" node
even if it is a new type of scan or join.

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Greg Stark
On Tue, Jun 16, 2009 at 1:03 PM, Robert Haas wrote:
> I see that... but I don't think the test in the first loop is correct.
>  It's based on the value of i % 4, but I'm not convinced that you know
> anything about the alignment at the point where i == 0.

That's correct. To check the alignment you would have to look at the
actual pointer. I would suggest using the existing macros to handle
alignment. Hm, though the only one I see offhand which is relevant is
the moderately silly PointerIsAligned(). Still it would make the code
clearer even if it's pretty simple.

Incidentally, the char foo[4] = {' ',' ',' ',' '} suggestion is, I
think, bogus. There would be no alignment guarantee on that array.
Personally I'm fine with 0x20202020 with a comment explaining what it
is.







-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> I see that... but I don't think the test in the first loop is correct.
>  It's based on the value of i % 4, but I'm not convinced that you know
> anything about the alignment at the point where i == 0.

Ah, you may be half right there (see below).  It does appear to be
assuming that char *s (or s[i == 0]) is aligned, which isn't a
guarantee (in fact, it might never be true).  If having it actually
aligned is an important bit (as opposed to just doing the comparisons in
larger but possibly unaligned blocks) then that'd make a difference.

If the code as-is showed a performance improvement even when working
on unaligned blocks, I'd be curious what would happen if it were
actually aligned.  Unfortunately, the results of such a test would
probably be heavily architecture-dependent.

> > On the flip side, I am curious as to if the arguments to a stored
> > procedure are always aligned or not.  Never had a case to care before,
> > but if palloc() is always going to return an aligned chunk of memory
> > (per MemSetAligned in c.h) it makes me wonder.
> 
> Well, if it's char(n) for n <~ 126, it's going to have a 1-byte
> varlena header...

Right, but I'm talking about the base of the argument itself, not
the start of the data.  If every variable-length argument to a stored
procedure is palloc()'d independently, and palloc() always returns
aligned memory, we'd at least know that the base of the argument is
aligned and could figure out the header size and then do the
comparisons accordingly.  This would also mean, of course, that we'd
almost(?) never have s[i == 0] on an aligned boundary due to the
header.

Thanks,

Stephen




Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Greg Stark
On Tue, Jun 16, 2009 at 1:03 PM, Robert Haas wrote:
>
>> On the flip side, I am curious as to if the arguments to a stored
>> procedure are always aligned or not.  Never had a case to care before,
>> but if palloc() is always going to return an aligned chunk of memory
>> (per MemSetAligned in c.h) it makes me wonder.
>
> Well, if it's char(n) for n <~ 126, it's going to have a 1-byte
> varlena header...

There are two points here that kind of cancel each other out :)

If the data is in fact returned from a palloc because it was the
result of some other function call then it will almost certainly have
a 4-byte header and that'll be aligned. There are some exceptions
where functions are just returning copies and copy the whole datum
though, but the point is we normally don't toast or pack varlenas
unless they're being stored on disk.

However that's all irrelevant because there's no guarantee the data
being passed will have been palloced at all. You could get a pointer
to data in a shared buffer, i.e., data on disk. That will be aligned
based on how tuples are packed on disk which is precisely where we go
out of our way to avoid wasting space on alignment.
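To illustrate why the start of the data is rarely aligned even when the datum itself is, here is a simplified model of the little-endian varlena header layout (the one documented in src/include/postgres.h), ignoring TOAST pointers and compressed values. The function names are hypothetical. A 1-byte header (low bit set) stores the total size in 7 bits, so it covers values up to 127 bytes including the header, which fits the rough "n <~ 126" figure above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified little-endian model (TOAST pointers and compression ignored):
 *   first byte has low bit 1   -> 1-byte header, total size in upper 7 bits
 *   first byte has low bits 00 -> 4-byte header, total size in upper 30 bits
 */
size_t
varlena_header_len(const unsigned char *p)
{
    return (p[0] & 0x01) ? 1 : 4;
}

/* Total size in bytes, header included. */
uint32_t
varlena_total_size(const unsigned char *p)
{
    if (p[0] & 0x01)
        return (p[0] >> 1) & 0x7F;
    return ((uint32_t) p[0] | (uint32_t) p[1] << 8 |
            (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24) >> 2;
}

/* Does the first data byte start on a 4-byte boundary? */
int
varlena_data_aligned(const unsigned char *p)
{
    return ((uintptr_t) (p + varlena_header_len(p)) % 4) == 0;
}
```

Even with the base pointer 4-byte aligned, a short value's data starts one byte in, so an index-based "i % 4" test says nothing about real alignment.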

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Stephen Frost
* Greg Stark (gsst...@mit.edu) wrote:
> There are two points here that kind of cancel each other out :)

Thanks for the insight. :)

Stephen




Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Greg Stark
On Tue, Jun 16, 2009 at 1:41 PM, Stephen Frost wrote:
>
> Ah, you may be half right there (see below).  It does appear to be
> assuming that char *s (or s[i == 0]) is aligned, which isn't a
> guarantee (in fact, it might never be right..).  If having it actually
> aligned is an important bit (as opposed to just doing the comparisons in
> larger but possibly unaligned blocks) then that'd make a difference.

On some architectures, like Intel, accessing unaligned ints is just
slow. On others (Alpha and PPC iirc?) it is an immediate bus error.

I would actually be more curious whether we can do the comparison
without having to pre-scan for the trailing spaces at all than in
trying to optimize that pre-scan.
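For equality, at least, the pre-scan can be avoided: two space-padded values are equal after trimming exactly when their common prefix matches and the rest of the longer one is all spaces. A sketch under that assumption, using byte-wise comparison only; ordering comparisons under a non-C collation would need more care, and bpchar_eq_noscan is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: are two space-padded strings equal once trailing spaces
 * are ignored?  No separate trailing-space scan: compare the common
 * prefix, then require the remainder of the longer string to be spaces.
 * Byte-wise equality only; says nothing about collation order.
 */
bool
bpchar_eq_noscan(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      common = alen < blen ? alen : blen;
    const char *longer = alen < blen ? b : a;
    size_t      longlen = alen < blen ? blen : alen;

    if (memcmp(a, b, common) != 0)
        return false;
    for (size_t i = common; i < longlen; i++)
    {
        if (longer[i] != ' ')
            return false;
    }
    return true;
}
```

Only the part of the longer value beyond the shorter one is ever checked for spaces, so equal-length inputs cost a single memcmp().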

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Andres Freund

On 06/16/2009 02:14 PM, Greg Stark wrote:

On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund  wrote:

1710.98
1710.98
72398
4
136.595
136.595
72398
1


XML's not really my thing currently but it sure seems strange to me to
have *everything* be a separate tag like this. Doesn't XML do
attributes too? I would have thought to use child tags like this only
for things that have some further structure.



I would have expected something like:


 
 
 
 
 
  
  
  



This would allow something like a graphical explain plan to still make
sense of a plan even if it finds a node it doesn't recognize. It would
still know generally what to do with a "scan" node or a "join" node
even if it is a new type of scan or join.
While that also looks sensible, the more structured variant makes it 
easier to integrate additional stats that may not fit easily into 
the 'attribute' format. As a quickly contrived example, you could have I/O 
statistics over time like:


   ...
   ...
   ...


Something like that would be harder with your variant.

Structuring it in tags as suggested above:

...
...


...
...


This enables displaying unknown 'scalar' values just like your variant 
does, and also allows more structured values.


It would be interesting to get somebody who has used the old explain 
output in an automated fashion into this discussion...


Andres



Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 8:38 AM, Greg Stark wrote:
> On Tue, Jun 16, 2009 at 1:03 PM, Robert Haas wrote:
>> I see that... but I don't think the test in the first loop is correct.
>>  It's based on the value of i % 4, but I'm not convinced that you know
>> anything about the alignment at the point where i == 0.
>
> That's correct. To check the alignment you would have to look at the
> actual pointer. I would suggest using the existing macros to handle
> alignment. Hm, though the only one I see offhand which is relevant is
> the moderately silly PointerIsAligned(). Still it would make the code
> clearer even if it's pretty simple.
>
> Incidentally, the char foo[4] = {' ',' ',' ',' '} suggestion is, I
> think, bogus. There would be no alignment guarantee on that array.
> Personally I'm fine with 0x20202020 with a comment explaining what it
> is.

Ooh, good point.  I still don't like the 0x20 thing, but using uint32
instead of int or long is the main point, unless we support any
platforms where 0x20 != ' '.

...Robert



Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Andrew Dunstan



Robert Haas wrote:

Ooh, good point.  I still don't like the 0x20 thing, but using uint32
instead of int or long is the main point, unless we support any
platforms where 0x20 != ' '.


  


All our server encodings are strictly ASCII supersets. So 0x20 is always 
the space character.
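One way to address the dislike of a bare 0x20 literal, sketched here as an assumption rather than anything from the patch: derive the word from ' ' and let the compiler confirm that it equals 0x20202020, which per the note above should always hold. If a character-array spelling is preferred, wrapping the array in a union with a uint32_t gives it the alignment that a plain char foo[4] lacks. All names are hypothetical.

```c
#include <stdint.h>

/* Replicate the byte value of ' ' into all four bytes of a uint32_t. */
#define SPACES_WORD ((uint32_t) ' ' * UINT32_C(0x01010101))

/* Server encodings are ASCII supersets, so this should always hold. */
_Static_assert(SPACES_WORD == UINT32_C(0x20202020), "' ' is not 0x20");

/*
 * A plain char[4] has no alignment guarantee; a union with uint32_t
 * gives the array uint32_t alignment.
 */
static const union
{
    char        c[4];
    uint32_t    w;
} spaces = { { ' ', ' ', ' ', ' ' } };

uint32_t
spaces_word_from_array(void)
{
    return spaces.w;    /* reading another union member is permitted in C */
}
```

Since all four bytes are identical, the resulting word is the same on little- and big-endian machines.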


cheers

andrew



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 8:53 AM, Andres Freund wrote:
> On 06/16/2009 02:14 PM, Greg Stark wrote:
>>
>> On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund
>>  wrote:
>>>
>>> 1710.98
>>> 1710.98
>>> 72398
>>> 4
>>> 136.595
>>> 136.595
>>> 72398
>>> 1
>>
>> XML's not really my thing currently but it sure seems strange to me to
>> have *everything* be a separate tag like this. Doesn't XML do
>> attributes too? I would have thought to use child tags like this only
>> for things that have some further structure.
>
>> I would have expected something like:
>>
>> >     
>>         
>>         
>>     
>>     
>>         
>>              
>>          
>>      
>> 
>>
>>
>> This would allow something like a graphical explain plan to still make
>> sense of a plan even if it finds a node it doesn't recognize. It would
>> still know generally what to do with a "scan" node or a "join" node
>> even if it is a new type of scan or join.

As long as you understand how the current code uses  and
, you can do this just as well with the current implementation.
 Each plan node gets a .  If there are any plans "under" it, it
gets a  child which contains those.  Whether you put the
additional details into attributes or other tags is irrelevant.  As to
why I chose to do it this way, I had a couple of reasons:

1. It didn't seem very wise to go with the approach of trying to do
EVERYTHING with attributes.  If I did that, then I'd either get really
long lines that were not easily readable, or I'd have to write some
kind of complicated line wrapping code (which didn't seem to make a
lot of sense for a machine-readable format).  The current format isn't
the most beautiful thing I've ever seen, but you don't need a parser
to make sense of it, just a bit of patience.

2. I wanted the JSON output and the XML output to be similar, and that
seemed much easier with this design.

3. We have existing precedent for this design pattern in, e.g. table_to_xml

http://www.postgresql.org/docs/current/interactive/functions-xml.html

> While that also looks sensible the more structured variant makes it easier
> to integrate additional stats which may not easily be pressed in the
> 'attribute' format. As a quickly contrived example you could have io
> statistics over time like:
> 
>   ...
>   ...
>   ...
> 
>
> Something like that would be harder with your variant.
>
> Structuring it in tags like suggested above:
> 
>    ...
>    ...
> 
> 
>    ...
>    ...
> 
>
> Enables displaying unknown 'scalar' values just like your variant and also
> allows more structured values.
>
> It would be interesting to get somebody having used the old explain in an
> automated fashion into this discussion...

Well, one problem with this is that the actual values are not costs,
but times, and the estimated values are not times, but costs.   The
planner estimates the cost of operations on an arbitrary scale where
the cost of a sequential page fetch is 1.0.  When we measure actual
times, they are in milliseconds.  There is no point that I can see in
making it appear that those are the same thing.  Observe the current
output:

explain analyze select 1;
 QUERY PLAN

 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.007 rows=1 loops=1)
 Total runtime: 0.243 ms
(2 rows)
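To underline that estimates and measurements are different quantities, a consumer of the output might keep them in separate structures. The sketch below parses one line of the current text format (the format machine-readable output is meant to replace); the struct and function names are hypothetical, and it only handles the cost and actual parentheses shown above.

```c
#include <stdio.h>
#include <string.h>

/* Planner estimates: arbitrary cost units (sequential page fetch = 1.0). */
typedef struct PlanEstimate
{
    double  startup_cost;
    double  total_cost;
    double  rows;
    int     width;
} PlanEstimate;

/* Measured values from EXPLAIN ANALYZE: times in milliseconds. */
typedef struct PlanActual
{
    double  startup_ms;
    double  total_ms;
    double  rows;
    double  loops;
} PlanActual;

/* Returns 0 if no estimate was found, 1 if only the estimate was parsed,
 * 2 if the "actual" part was parsed too. */
int
parse_plan_line(const char *line, PlanEstimate *est, PlanActual *act)
{
    const char *p = strstr(line, "(cost=");

    if (p == NULL ||
        sscanf(p, "(cost=%lf..%lf rows=%lf width=%d)",
               &est->startup_cost, &est->total_cost,
               &est->rows, &est->width) != 4)
        return 0;
    p = strstr(p, "(actual time=");
    if (p == NULL ||
        sscanf(p, "(actual time=%lf..%lf rows=%lf loops=%lf)",
               &act->startup_ms, &act->total_ms,
               &act->rows, &act->loops) != 4)
        return 1;
    return 2;
}
```

Keeping the two structs apart means nothing downstream can accidentally compare a cost with a time.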

...Robert



Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Jeremy Kerr
Hi all,

> That's correct. To check the alignment you would have to look at the
> actual pointer. I would suggest using the existing macros to handle
> alignment. Hm, though the only one I see offhand which is relevant is
> the moderately silly PointerIsAligned(). Still it would make the code
> clearer even if it's pretty simple.

Yes, the code (incorrectly) assumes that any multiple-of-4 index into 
the char array is aligned, and so the array itself must be aligned for 
this to work.

I'll rework the patch, testing the pointer alignment directly instead.
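A minimal sketch of the three-loop structure with the alignment test done on the pointer itself, as suggested above. This is a standalone function, not the patch: it takes a plain pointer and length instead of a BpChar, and the name true_len_wordwise is hypothetical. It scans backwards, since bcTruelen looks for trailing spaces; the 4-byte loads go through memcpy, which compilers typically turn into a single load and which also sidesteps strict-aliasing questions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four ASCII spaces (0x20) in one 32-bit word; byte order does not matter. */
#define FOUR_SPACES UINT32_C(0x20202020)

/*
 * Hypothetical standalone variant of bcTruelen(): return the length of
 * s[0..len) with trailing spaces removed, comparing a word at a time where
 * possible.  Alignment is decided from the address, not from the index.
 */
size_t
true_len_wordwise(const char *s, size_t len)
{
    size_t i = len;     /* bytes s[i..len) are known to be spaces */

    /* Loop 1: byte at a time until s + i sits on a 4-byte boundary. */
    while (i > 0 && ((uintptr_t) (s + i) % sizeof(uint32_t)) != 0)
    {
        if (s[i - 1] != ' ')
            return i;
        i--;
    }

    /* Loop 2: whole aligned words while at least one full word remains. */
    while (i >= sizeof(uint32_t))
    {
        uint32_t word;

        memcpy(&word, s + i - sizeof(uint32_t), sizeof(word));
        if (word != FOUR_SPACES)
            break;      /* a non-space is somewhere in this word */
        i -= sizeof(uint32_t);
    }

    /* Loop 3: finish byte at a time (partial word or leftover head). */
    while (i > 0 && s[i - 1] == ' ')
        i--;
    return i;
}
```

Testing at several starting offsets into the same buffer is a cheap way to exercise all three loops with different alignments.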

> Incidentally, the char foo[4] = {' ',' ',' ',' '} suggestion is, I
> think, bogus. There would be no alignment guarantee on that array.
> Personally I'm fine with 0x20202020 with a comment explaining what it
> is.

The variable is called 'spaces', but I can add extra comments if 
preferred.

Cheers,


Jeremy





Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Andres Freund

On 06/16/2009 03:22 PM, Robert Haas wrote:

Well, one problem with this is that the actual values are not costs,
but times, and the estimated values are not times, but costs.   The
planner estimates the cost of operations on an arbitrary scale where
the cost of a sequential page fetch is 1.0.  When we measure actual
times, they are in milliseconds.  There is no point that I can see in
making it appear that those are the same thing.  Observe the current
output:

Well, the aim was not to make it possible to use the same name for
"" and "" but to group them in 
some way, so that you can tell (by prefix or by placement below a distinct 
node) whether they relate to planning or to execution (thus also making it 
easier to handle unknown tags).
That  morphed into  instead of 
 was just a typo.
Another solution would be to rename  into 
 for consistency. But grouping them by some node 
seems to be a bit more future-proof.



Andres




Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Andrew Dunstan



Robert Haas wrote:

3. We have existing precedent for this design pattern in, e.g. table_to_xml

http://www.postgresql.org/docs/current/interactive/functions-xml.html

  


Tables are flat, explain output is not.

If there is a relationship between the items then that needs to be 
expressed in the XML structure, either by use of child nodes or 
attributes. Relying on the sequence of nodes, if that's what you're 
doing, is not a good idea, and will make postprocessing the XML using 
XSLT, for example, quite a bit harder. (Processing a foo that comes 
after a bar is possible but not as natural as processing a foo that is a 
child or attribute of a bar)


Anyway, I think what this discussion points out is that we actually need 
a formal XML Schema for this output.


cheers

andrew



Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Tom Lane
Greg Stark  writes:
> On some architectures like Intel accessing unaligned ints is just
> slow. On others (Alpha and PPC iirc?) it is an immediate bus error.

To a first approximation, Intel is the *only* popular architecture that
doesn't bus-error on unaligned accesses.  (And I'm sure their chip
designers rue the day that their predecessors chose to allow that.)

There are some systems where the kernel trap handler then proceeds to
emulate the unaligned access for you, but that gives new meaning to the
word "slow".  You definitely don't want to be doing it in a patch that's
alleged to give a performance improvement.

Speaking of which, what about some performance numbers?  Like Heikki,
I'm quite suspicious of whether there is any real-world gain to be had
from this approach.
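For code that must also run on strict-alignment platforms, a common pattern (sketched here with hypothetical names) is to test the address the way a PointerIsAligned()-style macro does, and to read possibly misaligned words through memcpy rather than by dereferencing a cast pointer, which is what can bus-error. Whether any of this beats a plain byte loop is exactly what the requested performance numbers would need to show.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Address-modulo-size test, in the style of a PointerIsAligned() macro. */
bool
is_aligned_u32(const void *p)
{
    return ((uintptr_t) p % sizeof(uint32_t)) == 0;
}

/*
 * Portable 4-byte load from any address.  Dereferencing a misaligned
 * uint32_t pointer may bus-error (or trap and be emulated slowly);
 * memcpy lets the compiler choose an instruction sequence that is safe
 * for the target.
 */
uint32_t
load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```

On x86 the memcpy usually compiles to one ordinary load; on strict-alignment targets it becomes byte loads, which avoids the trap-and-emulate path described above.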

regards, tom lane



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Andres Freund

On 06/16/2009 03:45 PM, Andrew Dunstan wrote:
>> 3. We have existing precedent for this design pattern in, e.g.
>> table_to_xml
>> http://www.postgresql.org/docs/current/interactive/functions-xml.html
> Tables are flat, explain output is not.
Comparing Greg's approach with Robert's it seems to me that Robert's 
approach isn't flatter than Greg's - it just relies more on nodes.



If there is a relationship between the items then that needs to be
expressed in the XML structure, either by use of child nodes or
attributes. Relying on the sequence of nodes, if that's what you're
doing, is not a good idea, and will make postprocessing the XML using
XSLT, for example, quite a bit harder. (Processing a foo that comes
after a bar is possible but not as natural as processing a foo that is a
child or attribute of a bar)

How would you model something like:

   ... 
   ... 
  ...

otherwise?

There is a potentially unlimited number of child nodes; AppendNode, for 
example, can have any number of them. Sure, you can give each  node 
an 'offset=' id, but that doesn't buy much.
I don't see how that could be much improved by using child nodes (or, 
even worse, attributes).


As far as I have seen, that is the only place where the format relies on 
the sequence of nodes.




Anyway, I think what this discussion points out is that we actually need
a formal XML Schema for this output.

Agreed.

If helpful I can create a schema for the current format.

Andres



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 9:45 AM, Andrew Dunstan wrote:
> Robert Haas wrote:
>>
>> 3. We have existing precedent for this design pattern in, e.g.
>> table_to_xml
>>
>> http://www.postgresql.org/docs/current/interactive/functions-xml.html
>
> Tables are flat, explain output is not.
>
> If there is a relationship between the items then that needs to be expressed
> in the XML structure, either by use of child nodes or attributes. Relying on
> the sequence of nodes, if that's what you're doing, is not a good idea, and

I'm not doing that.  Period, full stop.  The discussion was only about
attributes vs. child nodes.

> Anyway, I think what this discussion points out is that we actually need a
> formal XML Schema for this output.

Well, I don't know how to write one, and am not terribly interested in
learning.  Perhaps someone else would be interested?

...Robert



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 10:30 AM, Andres Freund wrote:
> How would you model something like:
> 
>   ... 
>   ... 
>  ...
> 
> otherwise?
>
> There are potentially unlimited number of child nodes - AppendNode for
> example can have any number of them. Sure, you can give each  node a
> 'offset=' id, but that doesn't buy much.
> I don't see how that could be much improved by using child-nodes (or even
> worse attributes).

Note that even in this case we DON'T rely on the ordering of the
nodes.  The inner  nodes have child nodes which contain their
relationship to the parent.

...Robert



Re: [HACKERS] Synch Rep: communication between backends and walsender

2009-06-16 Thread Greg Stark
On Tue, Jun 16, 2009 at 12:50 PM, Fujii Masao wrote:
> On some platforms, a signal doesn't interrupt sleep (i.e. poll or select
> system call)

say what?

-- 
Gregory Stark
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Jeremy Kerr
Hi Tom,

> Speaking of which, what about some performance numbers?  Like Heikki,
> I'm quite suspicious of whether there is any real-world gain to be
> had from this approach.

Will send numbers tomorrow, with the reworked patch.

Cheers,


Jeremy



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Andrew Dunstan



Robert Haas wrote:


If there is a relationship between the items then that needs to be expressed
in the XML structure, either by use of child nodes or attributes. Relying on
the sequence of nodes, if that's what you're doing, is not a good idea, and



I'm not doing that.  Period, full stop.  The discussion was only about
attributes vs. child nodes.

  
  
OK, I misread something you wrote, which prompted me to say that. 
Rereading it I realise my error. My apologies.


cheers

andrew



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Andres Freund

Hi,

On 06/16/2009 04:32 PM, Robert Haas wrote:

On Tue, Jun 16, 2009 at 10:30 AM, Andres Freund  wrote:

How would you model something like:

...
...
  ...

otherwise?

There are potentially unlimited number of child nodes - AppendNode for
example can have any number of them. Sure, you can give each  node a
'offset=' id, but that doesn't buy much.
I don't see how that could be much improved by using child-nodes (or even
worse attributes).

Note that even in this case we DON'T rely on the ordering of the
nodes.  The inner  nodes have child nodes which contain their
relationship to the parent.

Not in the case of Append nodes, but I fail to see a problem there, so...

Andres



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 10:59 AM, Andrew Dunstan wrote:
>
>
> Robert Haas wrote:
>>>
>>> If there is a relationship between the items then that needs to be
>>> expressed in the XML structure, either by use of child nodes or
>>> attributes. Relying on the sequence of nodes, if that's what you're
>>> doing, is not a good idea, and
>>>
>>
>> I'm not doing that.  Period, full stop.  The discussion was only about
>> attributes vs. child nodes.
>>
>>
>
> OK, I misread something you wrote, which prompted me to say that. Rereading
> it I realise my error. My apologies.

No problem, no apologies needed.  I guess we do emit nodes like append
plans in the same order that they'd be emitted in text mode.  Right
now we don't emit any additional information beyond putting them in
the same order, but I suppose that could be changed if needs be.

...Robert


Re: [HACKERS] Synch Rep: communication between backends and walsender

2009-06-16 Thread Tom Lane
Greg Stark  writes:
> On Tue, Jun 16, 2009 at 12:50 PM, Fujii Masao wrote:
>> On some platforms, a signal doesn't interrupt sleep (i.e. poll or select
>> system call)

> say what?

Yup, what he said.
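A common portable answer to the platform issue Fujii describes (not something proposed in this thread) is the "self-pipe trick": the signal handler writes a byte into a non-blocking pipe, and the sleeping process includes the pipe's read end in its poll()/select() set. The wakeup is then noticed whether or not the signal interrupts the call, and even if the signal lands just before the process starts waiting. A minimal POSIX C sketch; the names wakeup_pipe, init_wakeup and wait_for_work are hypothetical and not PostgreSQL code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Self-pipe sketch (hypothetical names): [0] is polled, [1] is written by the handler. */
static int wakeup_pipe[2] = {-1, -1};

/* Signal handler: write() is async-signal-safe, so it only queues one byte. */
static void wakeup_handler(int signo)
{
    int     save_errno = errno;
    ssize_t rc;

    (void) signo;
    /* The pipe is non-blocking: if it is full, a wakeup is already pending. */
    rc = write(wakeup_pipe[1], "x", 1);
    (void) rc;
    errno = save_errno;
}

/* Create the non-blocking pipe and install the handler for signo. */
static int init_wakeup(int signo)
{
    struct sigaction sa;

    if (pipe(wakeup_pipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++)
    {
        int flags = fcntl(wakeup_pipe[i], F_GETFL);

        if (flags < 0 || fcntl(wakeup_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wakeup_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/*
 * Wait up to timeout_ms for the signal or for input on sock (-1 for none).
 * Returns 1 if the signal arrived, even before poll() was entered,
 * 0 on timeout or socket input, and -1 on error.
 */
static int wait_for_work(int sock, int timeout_ms)
{
    struct pollfd fds[2];
    nfds_t  nfds = 1;
    int     rc;

    assert(wakeup_pipe[0] >= 0);        /* init_wakeup() must have run */
    fds[0].fd = wakeup_pipe[0];
    fds[0].events = POLLIN;
    if (sock >= 0)
    {
        fds[1].fd = sock;
        fds[1].events = POLLIN;
        nfds = 2;
    }

    do
        rc = poll(fds, nfds, timeout_ms);
    while (rc < 0 && errno == EINTR);   /* the handler's byte makes the retry return */
    if (rc < 0)
        return -1;

    if (fds[0].revents & POLLIN)
    {
        char buf[64];

        /* Drain all pending bytes so that the next wait can block again. */
        while (read(wakeup_pipe[0], buf, sizeof(buf)) > 0)
            ;
        return 1;
    }
    return 0;
}
```

Because the byte stays in the pipe until it is drained, a signal delivered between deciding to sleep and entering poll() is not lost.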

regards, tom lane


Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Tom Lane
Andres Freund  writes:
> On 06/16/2009 04:32 PM, Robert Haas wrote:
>> Note that even in this case we DON'T rely on the ordering of the
>> nodes.  The inner  nodes have child nodes which contain their
>> relationship to the parent.

> Not in the case of Append nodes, but I fail to see a problem there, so...

The order of Append child nodes is in fact significant.  If this
representation loses that information then it needs to be fixed.
However, is it really so bad to be relying on node order for this?

regards, tom lane


[HACKERS] Uninstallation error

2009-06-16 Thread genie.japo
Hi,

I've found an uninstallation error:


# make uninstall

:

n/man7/truncate.7 /usr/local/pgsql/share/man/man7/unlisten.7
/usr/local/pgsql/share/man/man7/update.7
/usr/local/pgsql/share/man/man7/vacuum.7
/usr/local/pgsql/share/man/man7/values.7
rm: cannot remove `/usr/local/pgsql/share/man/man1/': Is a directory
rm: cannot remove `/usr/local/pgsql/share/man/man7/': Is a directory
make[1]: *** [uninstall] Error 1
make[1]: Leaving directory `/usr/local/src/postgresql-8.4rc1/doc'
make: *** [uninstall] Error 2


This may be fixed by the following change (adding the -r option):

doc/Makefile
100c100
<   rm -f $(addprefix $(DESTDIR)$(mandir)/, $(shell gunzip -c $(srcdir)/man.tar.gz | tar tf - | sed -e 's,man7/,man$(sqlmansectnum)/,' -e 's/.7$$/.$(sqlmansect)/'))
---
>   rm -rf $(addprefix $(DESTDIR)$(mandir)/, $(shell gunzip -c $(srcdir)/man.tar.gz | tar tf - | sed -e 's,man7/,man$(sqlmansectnum)/,' -e 's/.7$$/.$(sqlmansect)/'))


Regards,
Genie Japo



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Andrew Dunstan



Tom Lane wrote:
> Andres Freund  writes:
>> On 06/16/2009 04:32 PM, Robert Haas wrote:
>>> Note that even in this case we DON'T rely on the ordering of the
>>> nodes.  The inner  nodes have child nodes which contain their
>>> relationship to the parent.
>>
>> Not in the case of Append nodes, but I fail to see a problem there, so...
>
> The order of Append child nodes is in fact significant.  If this
> representation loses that information then it needs to be fixed.
> However, is it really so bad to be relying on node order for this?

No, if there is a genuine sequence of items then relying on node order 
is just fine. My earlier (mistaken) reference was to possibly relying on 
node order for a non-sequence relationship.


cheers

andrew


Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Andrew Dunstan



Andres Freund wrote:
>> Anyway, I think what this discussion points out is that we actually need
>> a formal XML Schema for this output.
>
> Agreed.
>
> If helpful I can create a schema for the current format.

That will give us a useful starting point.

cheers

andrew


Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Greg Stark
On Tue, Jun 16, 2009 at 1:53 PM, Andres Freund wrote:
> While that also looks sensible the more structured variant makes it easier
> to integrate additional stats which may not easily be pressed in the
> 'attribute' format. As a fastly contrived example you could have io
> statistics over time like:
> 
>   ...
>   ...
>   ...
> 
>
> Something like that would be harder with your variant.

Actually that's exactly the kind of example I had in mind to make easier.

I'm picturing adding a new tag, such as , or actually I was
thinking of . If we have separate tags for all the estimates
and actual timings then any tags which come with the  or
 option would just get mixed up with the estimates and timing
info.

Each new module would provide a single tag which would have some
attributes and some child tags depending on how much structure it
needs. In cases where there's no structure, just a fixed list of
scalars like the existing expected and actual stats, I don't see any
advantage to making each scalar a tag. (There's not much disadvantage
either, except I would have said it was completely unreadable for a
human, given that you would have pages and pages of output for a plan
of significant size.)

So your plan might look like:

[XML example not preserved in the archive]

That would make it easy for a tool like pgadmin which doesn't know
what to do with the iostats to ignore the whole chunk, rather than
have to dig through a list of stats some of which come from iostats
and some from dtrace and some from the instrumentation and have to
figure out which tags are things it can use and which are things it
can't.

-- 
Gregory Stark
http://mit.edu/~gsstark/resume.pdf


Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Stefan Kaltenbrunner

Jeremy Kerr wrote:
> Hi Tom,
>
>> Speaking of which, what about some performance numbers?  Like Heikki,
>> I'm quite suspicious of whether there is any real-world gain to be
>> had from this approach.
>
> Will send numbers tomorrow, with the reworked patch.

I can easily redo my testing as well if required.


Stefan


[HACKERS] GRANT ON ALL IN schema

2009-06-16 Thread Petr Jelinek

Hi all,

I am thinking about implementing the GRANT ON ALL TABLES IN schema TODO
item. I saw many people sending proposals to the list, but nobody seemed
to actually do anything. I have a few questions and problems to iron out
before I can start the implementation. I would also like to note that I
am not going to implement the second part (GRANT ON NEW TABLES) of the
proposed TODO item, as there seems to be a better solution to this,
namely Default ACLs (http://wiki.postgresql.org/wiki/DefaultACL). By the
way, is anybody working on that? If not, I am interested in doing it as
well, as a complementary patch to this one.


Anyway, back to my thoughts about this patch. First of all, I see a
problem with the proposed syntax. For this syntax I think TABLES
(FUNCTIONS, SEQUENCES, etc.) would have to be added to the keywords, which
is problematic because there are views named tables, sequences and views
in information_schema, so we can't really make them keywords. I have no
idea how to get around this, and I don't see a good alternative syntax
either. This is the main and only real problem I have.


The other issues are minor: do we want this only for tables, sequences,
functions and views, or for every object type that has a GRANT command?
Also, standard GRANT makes no distinction between tables and views; I
guess in this case there should be one.
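Until something like GRANT ON ALL TABLES IN schema exists, the usual workaround is to generate one GRANT statement per table, typically from a catalog query on pg_tables or information_schema.tables. A minimal C sketch of the statement-generating half only, assuming the schema, table and role names have already been fetched; build_grant and its helpers are hypothetical names. Every identifier is double-quoted with embedded quotes doubled, which preserves names taken verbatim from the catalog; the pseudo-role PUBLIC must not be quoted, so it is not handled here:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at buf[*used]; returns -1 (leaving *used alone) if it
 * would not fit together with the terminating NUL.  Hypothetical helper. */
static int append_fmt(char *buf, size_t buflen, size_t *used, const char *fmt, ...)
{
    va_list ap;
    int     n;

    assert(*used < buflen);
    va_start(ap, fmt);
    n = vsnprintf(buf + *used, buflen - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t) n >= buflen - *used)
        return -1;
    *used += (size_t) n;
    return 0;
}

/* Append ident as a double-quoted SQL identifier, doubling embedded quotes. */
static int append_ident(char *buf, size_t buflen, size_t *used, const char *ident)
{
    if (append_fmt(buf, buflen, used, "\"") < 0)
        return -1;
    for (const char *p = ident; *p != '\0'; p++)
    {
        int rc = (*p == '"') ? append_fmt(buf, buflen, used, "\"\"")
                             : append_fmt(buf, buflen, used, "%c", *p);

        if (rc < 0)
            return -1;
    }
    return append_fmt(buf, buflen, used, "\"");
}

/*
 * Hypothetical: build one GRANT ... ON TABLE "schema"."table" TO "role"; in buf.
 * privileges is copied verbatim and must come from trusted code; PUBLIC is
 * not supported because it must stay unquoted.
 * Returns 0 on success, -1 if buf is too small.
 */
static int build_grant(char *buf, size_t buflen, const char *privileges,
                       const char *schema, const char *table, const char *role)
{
    size_t used = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';
    if (append_fmt(buf, buflen, &used, "GRANT %s ON TABLE ", privileges) < 0 ||
        append_ident(buf, buflen, &used, schema) < 0 ||
        append_fmt(buf, buflen, &used, ".") < 0 ||
        append_ident(buf, buflen, &used, table) < 0 ||
        append_fmt(buf, buflen, &used, " TO ") < 0 ||
        append_ident(buf, buflen, &used, role) < 0 ||
        append_fmt(buf, buflen, &used, ";") < 0)
        return -1;
    return 0;
}
```

The list of (schema, table) pairs would come from a query such as SELECT schemaname, tablename FROM pg_tables WHERE schemaname = 'public'; that part is omitted.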


--
Regards
Petr Jelinek (PJMODOS)



Re: [HACKERS] postmaster recovery and automatic restart suppression

2009-06-16 Thread Czichy, Thoralf (NSN - FI/Helsinki)


hi,

I am working together with Harald on this issue. Below are some thoughts
on why we think it should be possible to disable the postmaster-internal
recovery attempt and instead have faults in the processes started by the
postmaster escalated to a postmaster exit.



[Our typical "embedded" situation]

* Database is small, 0.1 to 1 GB (e.g. we consider it the safest strategy
  to copy the whole database from the active to the standby before
  reconnecting the standby after switchover or failover).

* Few clients only (10-100)

* There is no shared storage between the two instances (this means no 
  concurrent access to shared resources, no isolation problems for 
  shared resources)

* Switchover is fast, less than a few seconds

* Disk I/O is slow (no RAID, possibly (slow) flash-based)

* The same nodes running database also run lots of other functionality 
  (some dependent on DB, most not)



[Keep recovery decision and recovery action in cluster-HA-middleware]

Actually the problem we're trying to solve is to keep the decision about
the best recovery strategy outside of the DB. In our use case this logic
is expressed in the cluster-HA-middleware, and recovery actions are
initiated by this middleware rather than by each individual piece of
software started by it; software is generally expected to "fail fast and
safe" in case of errors. As long as you trust the hardware and OS kernel,
a process exit is usually such a fail-fast-and-safe operation. It's "safe"
because process exit causes the kernel to release the resources the
process holds. It's also fast, though "fast" is a bit more debatable, as a
simple signal from the postmaster to the cluster middleware would probably
be faster. However, lacking such a signal, a SIGCHLD is the next best
thing.
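The middleware side of "a process exit is the failure signal" fits in a few lines of POSIX C: the supervisor reaps its child with waitpid() and tells a nonzero exit apart from death by a signal. A minimal sketch, assuming the middleware is the postmaster's parent process; reap_and_classify and spawn_demo_child are hypothetical names, not part of any HA product's API:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* How a supervised process ended, from the middleware's point of view (hypothetical). */
typedef enum
{
    CHILD_CLEAN_EXIT,       /* exit status 0 */
    CHILD_ERROR_EXIT,       /* nonzero exit status: the process "failed fast" */
    CHILD_KILLED,           /* terminated by a signal */
    CHILD_WAIT_FAILED       /* waitpid() itself failed */
} ChildOutcome;

/*
 * Reap pid and classify how it terminated; *detail receives the exit code or
 * the signal number.  This blocks for simplicity: a real supervisor would
 * typically call waitpid(..., WNOHANG) after being notified by SIGCHLD.
 */
static ChildOutcome reap_and_classify(pid_t pid, int *detail)
{
    int status;

    assert(detail != NULL);
    *detail = 0;
    while (waitpid(pid, &status, 0) < 0)
    {
        if (errno != EINTR)
            return CHILD_WAIT_FAILED;
    }
    if (WIFEXITED(status))
    {
        *detail = WEXITSTATUS(status);
        return (*detail == 0) ? CHILD_CLEAN_EXIT : CHILD_ERROR_EXIT;
    }
    if (WIFSIGNALED(status))
    {
        *detail = WTERMSIG(status);
        return CHILD_KILLED;
    }
    return CHILD_WAIT_FAILED;   /* stopped/continued states are not requested with options 0 */
}

/* Demo helper (hypothetical): fork a child that is killed by kill_signal if
 * it is nonzero, and otherwise exits with exit_code. */
static pid_t spawn_demo_child(int exit_code, int kill_signal)
{
    pid_t pid = fork();

    if (pid == 0)
    {
        if (kill_signal != 0)
        {
            signal(kill_signal, SIG_DFL);   /* make sure the default action applies */
            raise(kill_signal);
        }
        _exit(exit_code);
    }
    return pid;                 /* -1 if fork() failed */
}
```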

The middleware can make decisions such as the following (all of this is
configurable, and postmaster health is _just_one_input_ of many used to
reach a decision on the correct behavior):

 Policy 1: By default, try to restart the active instance N times; after
   that, do a switchover.
 Policy 2: If the active Postgres fails and the standby is available and
   up-to-date, do an immediate switchover. If the standby is not
   available, restart.
 Policy 3: If the active Postgres fails, escalate the problem to node
   level, isolate the active node and do the switchover to the standby.
 Policy 4: In single-node systems, restart the db instance N times. If it
   fails more often than N times in X seconds, stop it and give an
   indication to the operator (SNMP trap to the management system, text
   message, ...) that something is seriously wrong and manual
   intervention is needed.
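Policy 4 is essentially a sliding-window rate limit on restarts: restart up to N times, but stop once there have been more than N failures within X seconds. A minimal C sketch, assuming failure times arrive in non-decreasing order and that N is small; RestartPolicy and its functions are hypothetical names for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRACKED 64

/* Hypothetical state for "Policy 4": restart at most N times in X seconds. */
typedef struct
{
    int     max_restarts;               /* N */
    double  window_secs;                /* X */
    double  failures[MAX_TRACKED];      /* recent failure times, oldest first */
    size_t  count;
} RestartPolicy;

static void policy_init(RestartPolicy *p, int max_restarts, double window_secs)
{
    assert(max_restarts > 0 && max_restarts < MAX_TRACKED);
    p->max_restarts = max_restarts;
    p->window_secs = window_secs;
    p->count = 0;
}

/*
 * Record a failure at time now (seconds, non-decreasing).  Returns true if the
 * instance should be restarted, false if it failed more than N times within
 * the last X seconds and should be stopped and reported to the operator.
 */
static bool policy_on_failure(RestartPolicy *p, double now)
{
    size_t kept = 0;

    /* Forget failures that have slid out of the X-second window. */
    for (size_t i = 0; i < p->count; i++)
        if (now - p->failures[i] < p->window_secs)
            p->failures[kept++] = p->failures[i];
    p->count = kept;

    /* Only the newest N+1 failures matter for the decision; drop the oldest. */
    if (p->count == (size_t) p->max_restarts + 1)
    {
        memmove(p->failures, p->failures + 1, (p->count - 1) * sizeof(double));
        p->count--;
    }
    p->failures[p->count++] = now;

    return p->count <= (size_t) p->max_restarts;
}
```

For example, with N = 3 and X = 60, failures at t = 0, 10 and 20 s each allow a restart, a fourth at t = 30 s returns false (stop and alert), and a failure at t = 200 s, after the window has emptied, counts as a first failure again.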

In the current setup we want to go for Policy 2. In earlier unrelated 
products (not using PostgreSQL) we actually had policies 1, 3 and 4.

Another typical situation is that recovery behavior during upgrades
differs from the behavior during normal operation. E.g. when the (new)
database instance fails during an automatic schema conversion in an
upgrade, we would want to automatically fall back to the previous
version.



[STONITH is not always the best strategy if failures can be declared as
user-space software problems only; limit STONITH to HW/OS failures]

The isolation of the failing Postgres instance does not require a
STONITH, mainly because there's also other software running on the same
node that we'd not want to automatically switch over (e.g. because it
takes longer to do, or the functionality is more critical or less
critical). Also, we generally trust the HW, OS kernel and cluster
middleware to behave correctly. These functions also follow the principle
of fail-fast-and-safe. This trust might be an assumption that not
everybody agrees with, though. So, if the failure originated from
HW/OS/clusterware it clearly is a STONITH situation, but if it's a
user-space problem, the default assumption is that isolation can be
implemented at OS level, and that's a guarantee that the clusterware
gives (using a separate quorum mechanism to avoid split-brain
situations).




[Example of user-space software failures]

So, what kind of failures would cause a user-space switchover rather
than node-level isolation? This gets a bit philosophical. If you assume
that many software failures are caused by concurrency issues, switching
over to the standby is actually a good strategy, as it's unlikely that
the same concurrency issue happens again on the standby. Another reason
for software failures is entering exceptional situations, such as the
disk getting full, overload on the node (caused by some other process),
a backup being taken, upgrade conversion, etc. So here the idea is that
failover to a standby instance helps as long as there's some hope that
on the standby side the situation is different. If we just had an
internal Postgres restart in such situations, we'd have flapping db
connectivity - without the operator even being aware 

Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Tom Lane
Greg Stark  writes:
> I'm picturing adding a new tag, such as , or actually I was
> thinking of . If we have separate tags for all the estimates
> and actual timings then any tags which come with the  or
>  option would just get mixed up with the estimates and timing
> info.

FWIW, I like Greg's idea of subdividing the available data this way.
I'm no XML guru, so maybe there is a better way to do it --- but a
very large part of the reason for doing this at all is to have an
extensible format, and part of that IMHO is that client programs should
be able to have some rough idea of what things are even when they
don't know it exactly.

But I'd be just as happy with a naming convention, like
 versus , etc.  I don't know
enough about XML usage to understand the benefits and costs of
different ways of providing that kind of structure.

regards, tom lane


Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Andrew Dunstan



Tom Lane wrote:
> But I'd be just as happy with a naming convention, like
>  versus , etc.  I don't know
> enough about XML usage to understand the benefits and costs of
> different ways of providing that kind of structure.

FYI, you probably don't want this. The ':' is not just another character;
it separates the namespace prefix from the local name. We probably only
want one namespace. You can use '-' or '_' or '.' inside names to give
them some structure beyond XML semantics.
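To make Andrew's point concrete: under the Namespaces in XML rules a qualified name is either a bare local name or prefix:local, with no further colons, and a document that uses a prefix without a matching xmlns:prefix declaration is not namespace-well-formed. A minimal C sketch of that split, using a hypothetical name such as measurement:foo (the tag names in Tom's message did not survive archiving); split_qname is a hypothetical helper, and it checks only the colon rule, not the full NCName character rules:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: split a QName into namespace prefix and local part.
 * Under Namespaces in XML a QName is "local" or "prefix:local", and neither
 * part may contain ':'.  Only the colon rule is checked here.
 * Returns false if the name is malformed or a buffer is too small.
 */
static bool split_qname(const char *qname,
                        char *prefix, size_t prefix_len,
                        char *local, size_t local_len)
{
    const char *colon;
    const char *lp;
    size_t      plen;
    size_t      llen;

    assert(qname != NULL && prefix_len > 0 && local_len > 0);
    colon = strchr(qname, ':');
    lp = (colon != NULL) ? colon + 1 : qname;      /* start of the local part */
    plen = (colon != NULL) ? (size_t) (colon - qname) : 0;
    llen = strlen(lp);

    if (colon != NULL && (plen == 0 || strchr(lp, ':') != NULL))
        return false;           /* empty prefix, or a second ':' */
    if (llen == 0 || plen >= prefix_len || llen >= local_len)
        return false;

    memcpy(prefix, qname, plen);
    prefix[plen] = '\0';
    memcpy(local, lp, llen + 1);  /* includes the terminating NUL */
    return true;
}
```

A name like measurement-foo, by contrast, is a single unprefixed local name, which is why '-', '_' or '.' are the safe separators.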


cheers

andrew


Re: [HACKERS] postmaster recovery and automatic restart suppression

2009-06-16 Thread Tom Lane
"Czichy, Thoralf (NSN - FI/Helsinki)"  writes:
> I am working together with Harald on this issue. Below some thoughts on 
> why we think it should be possible to disable the postmaster-internal 
> recovery attempt and instead have faults in the processes started 
> by postmaster escalated to postmaster-exit.

I'll tell you what the fundamental problem with this is: it's converting
Postgres into a piece of software that is completely dependent on some
hypothetical outside management code in order to meet one of its basic
design goals.  That isn't going to go over very well to start with.
Until you have written such management code, made it freely available,
and demonstrated that this type of recovery approach is *actually*, not
hypothetically, useful in a real-world environment, it's unlikely
that anyone is going to want to consider it.

I'd recommend just carrying a private patch to make Postgres do what
you want ... it's unlikely to be the only such patch you need anyway.
One obvious example is that nothing you describe is sensible without
exposing more information than "something failed" to the outside
management code.  You'll want some kind of API in there to pass on
whatever the postmaster knows to the outside code.

We might consider adopting a set of patches like that once it's been
demonstrated to be useful for a live project, but I don't think we'll
accept it on speculation.

regards, tom lane


[HACKERS] concurrent COPY performance

2009-06-16 Thread Stefan Kaltenbrunner

Hi!

I have been doing some bulk loading testing recently, mostly with a
focus on answering why parallel restore gives us a speedup of "only"
about cores/2 at most (up to around 8 cores, and even less with more).
What I found is that on a fast I/O subsystem, concurrent COPY is CPU
bottlenecked when it is able to use WAL bypass (scaling up to around
cores/2), and performance without WAL bypass is very bad.
In the WAL-logged case, adding a second process already yields only a
50% speedup, we are never able to scale better than 3x (up to 8 cores),
and performance even degrades beyond that point.


The profile for that case (loading the lineitem table from the DBT3
benchmark) looks fairly similar to what I have seen in the past for
I/O-intensive concurrent workloads:



Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a 
unit mask of 0x00 (Unhalted core cycles) count 10

samples  %        symbol name
47546    15.4562  XLogInsert
39598    12.8725  DoCopy
33798    10.9870  CopyReadLine
14191     4.6132  DecodeNumber
12986     4.2215  heap_fill_tuple
12092     3.9308  pg_verify_mbstr_len
9553      3.1055  DecodeDate
9289      3.0197  InputFunctionCall
7972      2.5915  ParseDateTime
7324      2.3809  DecodeDateTime
7290      2.3698  pg_next_dst_boundary
7218      2.3464  heap_form_tuple
5385      1.7505  AllocSetAlloc
4779      1.5536  heap_compute_data_size
4367      1.4196  float4in
3903      1.2688  DetermineTimeZoneOffset
3603      1.1713  pg_mblen
3494      1.1358  pg_atoi
3461      1.1251  .plt
3428      1.1144  date2j
3416      1.1105  pg_mbstrlen_with_len

This is for 8 connections on an 8-core/16-thread box. At higher
connection counts the server is >70% idle and shows even worse throughput.


In the WAL bypass case I can actually get a performance improvement up
to 16 parallel connections; however, here too I only get cores/2 maximum
throughput. We are able to max out the CPU on the server in that case,
though (no idle time, and iowait only in the single-digit range).
Profiling that workload (loading around 140 rows/s), I get:


samples  %        symbol name
2462358  15.3353  DoCopy
2319677  14.4467  CopyReadLine
806269    5.0214  pg_verify_mbstr_len
782594    4.8739  DecodeNumber
712426    4.4369  heap_fill_tuple
642893    4.0039  DecodeDate
609313    3.7947  InputFunctionCall
476887    2.9700  ParseDateTime
456569    2.8435  pg_next_dst_boundary
435812    2.7142  DecodeDateTime
429733    2.6763  heap_form_tuple
361750    2.2529  heap_compute_data_size
301193    1.8758  AllocSetAlloc
268830    1.6742  float4in
238119    1.4830  DetermineTimeZoneOffset
229598    1.4299  pg_atoi
225169    1.4023  .plt
217817    1.3565  pg_mbstrlen_with_len
207041    1.2894  PageAddItem
200024    1.2457  pg_mblen
192237    1.1972  bpchar_input
181552    1.1307  date2j


For those interested, I have some additional information up at:

http://www.kaltenbrunner.cc/blog/index.php?/archives/27-Benchmarking-8.4-Chapter-2bulk-loading.html


Stefan


Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 12:04 PM, Tom Lane wrote:
> Greg Stark  writes:
>> I'm picturing adding a new tag, such as , or actually I was
>> thinking of . If we have separate tags for all the estimates
>> and actual timings then any tags which come with the  or
>>  option would just get mixed up with the estimates and timing
>> info.
>
> FWIW, I like Greg's idea of subdividing the available data this way.
> I'm no XML guru, so maybe there is a better way to do it --- but a
> very large part of the reason for doing this at all is to have an
> extensible format, and part of that IMHO is that client programs should
> be able to have some rough idea of what things are even when they
> don't know it exactly.

I like it too, but I'd like to see us come up with a design that
allows it to be used for all of the output formats (text, XML, and
JSON).  I think we should be looking for a way to allow modules to
publish abstract objects like property-value mappings, or lists of
strings, rather than thinking strictly in terms of XML.  If we have a
module called foo that emits property bar with value baz and property
bletch with value quux, then in text format we can print:

Module Foo:
  Bar: Bletch
  Baz: Quux

In XML we can print:


  
Foo
Bletch
Quux
  


(or any of about 10 reasonable alternatives that are functionally identical)

In JSON we can print

"Modules" : [
  {
    "Module Name" : "Foo",
    "Bar": "Bletch",
    "Baz": "Quux"
  }
]

(or any of about 2 reasonable alternatives that are functionally identical)

If we start thinking in terms of "provide an API to insert XML into
the XML-format output", we get back to my original complaint: if the
only way of getting additional data is to pick through the XML
output, then we'll quickly reach the point where users need XSLT and
stylesheets to extract the data they care about.  I think that's an
annoyance that is easily avoidable.
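Robert's proposal amounts to keeping one abstract representation (a module name plus property/value pairs) and writing a renderer per output format. A minimal C sketch, assuming plain string values that need no escaping; ModuleOutput, render_module and the other names are hypothetical, not the explain.c API. It reproduces the text layout above and a compact form of the JSON object inside the "Modules" array; an XML renderer would be a third branch of the same function:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical abstract output of one add-on module: name/value pairs. */
typedef struct
{
    const char *name;
    const char *value;
} Property;

typedef struct
{
    const char     *module;
    const Property *props;
    int             nprops;
} ModuleOutput;

typedef enum { FORMAT_TEXT, FORMAT_JSON } OutputFormat;

/* Append formatted text at buf[*used]; returns -1 if it would not fit. */
static int append(char *buf, size_t buflen, size_t *used, const char *fmt, ...)
{
    va_list ap;
    int     n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *used, buflen - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t) n >= buflen - *used)
        return -1;
    *used += (size_t) n;
    return 0;
}

/*
 * Render one module in the requested format.  Values are assumed to need no
 * JSON escaping.  Returns the output length, or -1 if buf is too small.
 */
static int render_module(const ModuleOutput *m, OutputFormat fmt,
                         char *buf, size_t buflen)
{
    size_t used = 0;

    assert(m != NULL && buf != NULL && buflen > 0);
    buf[0] = '\0';
    if (fmt == FORMAT_TEXT)
    {
        if (append(buf, buflen, &used, "Module %s:\n", m->module) < 0)
            return -1;
        for (int i = 0; i < m->nprops; i++)
            if (append(buf, buflen, &used, "  %s: %s\n",
                       m->props[i].name, m->props[i].value) < 0)
                return -1;
    }
    else
    {
        if (append(buf, buflen, &used, "{\"Module Name\": \"%s\"", m->module) < 0)
            return -1;
        for (int i = 0; i < m->nprops; i++)
            if (append(buf, buflen, &used, ", \"%s\": \"%s\"",
                       m->props[i].name, m->props[i].value) < 0)
                return -1;
        if (append(buf, buflen, &used, "}") < 0)
            return -1;
    }
    return (int) used;
}
```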

...Robert


Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 16, 2009 at 12:04 PM, Tom Lane wrote:
>> FWIW, I like Greg's idea of subdividing the available data this way.

> I like it too, but I'd like to see us come up with a design that
> allows it to be used for all of the output formats (text, XML, and
> JSON).  I think we should be looking for a way to allow modules to
> publish abstract objects like property-value mappings, or lists of
> strings, rather than thinking strictly in terms of XML.  If we have a
> module called foo that emits property bar with value baz and property
> bletch with value quux, then ...

This seems to be missing the point I was trying to make, which is that
a design like that actually offers no leverage at all: if you don't know
all about foo to start with, you have no idea what to do with either bar
or bletch.  You can *parse* the data, since it's in XML or JSON or
whatever, but you don't know what it is.

The EXPLAIN problem is a fairly constrained universe: there is going to
be a tree of plan nodes, there are going to be some static properties of
each plan node, and there may or may not be various sorts of estimates
and/or measurements attached to each one.  What I'm after is that code
examining the output can know "oh, this is a measurement" even if it
hasn't heard of the particular kind of measurement.

As a concrete example of what I'm thinking about, I'd hope that PgAdmin
would be able to display a graphical summary of a plan tree, and then
pop up measurements associated with one of the nodes when you
right-click on that node.  To do this, it doesn't necessarily have to
know all about each specific measurement that a particular backend
version might emit; but it needs to be able to tell which things are
measurements.
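Tom's requirement can be illustrated with a tiny data model in which every datum carries a category alongside its name, so a generic client can pick out "things that are measurements" without recognizing the names themselves. A minimal C sketch under that assumption; the enum, struct, function and property names (including "Blocks Read From Flash") are hypothetical and not part of any proposed format:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical category attached to every datum in the EXPLAIN output. */
typedef enum
{
    CAT_STRUCTURE,      /* part of the plan tree itself */
    CAT_ESTIMATE,       /* planner estimate */
    CAT_MEASUREMENT     /* measured at execution time */
} DatumCategory;

typedef struct
{
    const char   *name;     /* the client need not recognize this */
    const char   *value;
    DatumCategory category;
} PlanDatum;

/*
 * Collect pointers to the measurements attached to one plan node, using only
 * the category.  Returns the number stored, which is at most out_cap.
 */
static size_t collect_measurements(const PlanDatum *data, size_t n,
                                   const PlanDatum **out, size_t out_cap)
{
    size_t found = 0;

    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (data[i].category == CAT_MEASUREMENT && found < out_cap)
            out[found++] = &data[i];
    }
    return found;
}
```

A pop-up of the kind Tom describes could then list every collected name/value pair verbatim, including ones the client has never seen before.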

regards, tom lane


Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 1:21 PM, Tom Lane wrote:
> Robert Haas  writes:
>> On Tue, Jun 16, 2009 at 12:04 PM, Tom Lane wrote:
>>> FWIW, I like Greg's idea of subdividing the available data this way.
>
>> I like it too, but I'd like to see us come up with a design that
>> allows it to be used for all of the output formats (text, XML, and
>> JSON).  I think we should be looking for a way to allow modules to
>> publish abstract objects like property-value mappings, or lists of
>> strings, rather than thinking strictly in terms of XML.  If we have a
>> module called foo that emits property bar with value baz and property
>> bletch with value quux, then ...
>
> This seems to be missing the point I was trying to make, which is that
> a design like that actually offers no leverage at all: if you don't know
> all about foo to start with, you have no idea what to do with either bar
> or bletch.  You can *parse* the data, since it's in XML or JSON or
> whatever, but you don't know what it is.
>
> The EXPLAIN problem is a fairly constrained universe: there is going to
> be a tree of plan nodes, there are going to be some static properties of
> each plan node, and there may or may not be various sorts of estimates
> and/or measurements attached to each one.  What I'm after is that code
> examining the output can know "oh, this is a measurement" even if it
> hasn't heard of the particular kind of measurement.
>
> As a concrete example of what I'm thinking about, I'd hope that PgAdmin
> would be able to display a graphical summary of a plan tree, and then
> pop up measurements associated with one of the nodes when you
> right-click on that node.  To do this, it doesn't necessarily have to
> know all about each specific measurement that a particular backend
> version might emit; but it needs to be able to tell which things are
> measurements.

*scratches head*

So you're looking for a way to categorize the data that appear in the
output by type, like any given piece of data is either a measurement,
an estimate, or a part of the plan structure?

It seems to me that with a sufficiently powerful API, add-on modules
could emit arbitrary stuff that might not fall into the categories
that you've mentioned.  For example, there was a previous EXPLAIN XML
patch which contained a bunch of code that spit out plans that were
considered but not chosen.  And there could easily be other kinds of
less invasive add-ons that would still want to emit properties that
are formatted as text or lists rather than measurements per se.

I think it's kind of hopeless to think that a third-party module is
going to be able to do much better than to display any unexpected
properties whose value is just text and punt any unexpected properties
whose value is a complex object (nested tags in XML-parlance, hash in
JSON).

I have a feeling I'm still missing the point here...

...Robert


Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen

2009-06-16 Thread Chuck McDevitt
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
> ow...@postgresql.org] On Behalf Of Stephen Frost
> Sent: Tuesday, June 16, 2009 5:47 AM
> To: Greg Stark
> Cc: Robert Haas; Jeremy Kerr; ; Alvaro
> Herrera; Stefan Kaltenbrunner; Gurjeet Singh
> Subject: Re: [HACKERS] [PATCH] backend: compare word-at-a-time in
> bcTruelen
> 

On 64-bit machines, the native word size is 64 bits (obviously), and comparing
32 bits at a time is much slower than comparing 64 bits at a time.

You might want to consider this.
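For reference, bcTruelen() computes the length of a bpchar value without its trailing spaces. A minimal standalone C sketch of the word-at-a-time idea using 64-bit words, as suggested above; it is not Jeremy's patch, and true_len_bytewise/true_len_wordwise are hypothetical names. It trims byte-wise until the end of the remaining data is 8-byte aligned, compares whole words against a word of eight spaces, then finishes byte-wise. memcpy() is used for the load to stay within C's aliasing rules, and because all eight bytes of the pattern are equal, byte order does not matter:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-at-a-time reference: length of s[0..len) minus trailing spaces. */
static size_t true_len_bytewise(const char *s, size_t len)
{
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return len;
}

/* Hypothetical word-at-a-time variant comparing 8 bytes at once. */
static size_t true_len_wordwise(const char *s, size_t len)
{
    const uint64_t spaces = 0x2020202020202020ULL;     /* eight ' ' bytes */

    assert(s != NULL || len == 0);

    /* Trim byte-wise until the end of the remaining data is 8-byte aligned. */
    while (len > 0 && ((uintptr_t) (s + len) % sizeof(uint64_t)) != 0)
    {
        if (s[len - 1] != ' ')
            return len;
        len--;
    }

    /* Compare whole aligned words; stop at the first one that is not all spaces. */
    while (len >= sizeof(uint64_t))
    {
        uint64_t w;

        memcpy(&w, s + len - sizeof(uint64_t), sizeof(w));
        if (w != spaces)
            break;
        len -= sizeof(uint64_t);
    }

    /* Finish byte-wise inside the first non-blank word (or the short head). */
    return true_len_bytewise(s, len);
}
```

Whether this actually beats the simple byte loop on real data is exactly what the requested benchmark numbers need to show.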


Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 16, 2009 at 1:21 PM, Tom Lane wrote:
>> As a concrete example of what I'm thinking about, I'd hope that PgAdmin
>> would be able to display a graphical summary of a plan tree, and then
>> pop up measurements associated with one of the nodes when you
>> right-click on that node.

> It seems to me that with a sufficiently powerful API, add-on modules
> could emit arbitrary stuff that might not fall into the categories
> that you've mentioned.

I don't have a problem with inventing new categories when we need to.
What I'm objecting to is using the above to justify flattening the
design completely, so that the only way to know anything about
a particular datum is to know that type of datum specifically.
There is way more structure in EXPLAIN than that, and we should
design it accordingly.

(Note that any information about rejected plans could not usefully be
attached to the plan tree anyway; it'd have to be put in some other
child of the topmost node.)

> And there could easily be other kinds of
> less invasive add-ons that would still want to emit properties that
> are formatted as text or lists rather than measurements per se.

By "measurement" I did not mean to imply "single number".  Text strings
or lists could be handled very easily, I think, especially since there
are explicit ways to represent those in XML.

The main point here is that we have a pretty good idea of what
general-purpose client code is likely to want to do with the data, and
in a lot of cases that does not translate to having to know each node
type explicitly, so long as it can be categorized.

regards, tom lane


Re: [HACKERS] GRANT ON ALL IN schema

2009-06-16 Thread Petr Jelinek

Petr Jelinek wrote:
> Anyway, back to my thoughts about this patch. First of all, I see a
> problem with the proposed syntax. For this syntax I think TABLES
> (FUNCTIONS, SEQUENCES, etc.) would have to be added to the keywords, which
> is problematic because there are views named tables, sequences and views
> in information_schema, so we can't really make them keywords. I have no
> idea how to get around this, and I don't see a good alternative syntax
> either. This is the main and only real problem I have.

Erm, it seems the problem was just me overlooking something in gram.y
(I forgot to add those keywords to unreserved_keyword), so there are no
real problems; but I'd still like to hear answers to the other questions
in my previous email.



--
Regards
Petr Jelinek (PJMODOS)


Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Robert Haas
On Tue, Jun 16, 2009 at 2:12 PM, Tom Lane wrote:
> The main point here is that we have a pretty good idea of what
> general-purpose client code is likely to want to do with the data, and
> in a lot of cases that does not translate to having to know each node
> type explicitly, so long as it can be categorized.

I agree.  I'm just not seeing the need for an *explicit*
categorization contained within the data itself.  For one thing, AIUI,
that's the job of things like an XML Schema, which Andres Freund has
already agreed to write, and I would expect that would be of some
value to tool-writers, else why are we creating it?  I also think
scalars and lists are recognizable without any particular additional
markup at all, just by introspection of the contents.
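
To make the introspection idea concrete, here is a rough Python heuristic. It assumes a simplified, hypothetical plan fragment; the tag names are illustrative, not the patch's actual output. An element with no child elements is treated as a scalar, one whose children all share a tag as a list, and anything else as a structured group.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified plan fragment; the tag names are illustrative.
SAMPLE = """
<Plan>
  <Node-Type>Sort</Node-Type>
  <Sort-Key>
    <Item>a</Item>
    <Item>b</Item>
  </Sort-Key>
  <Plans>
    <Plan>
      <Node-Type>Seq Scan</Node-Type>
    </Plan>
  </Plans>
</Plan>
"""


def classify(elem: ET.Element) -> str:
    """Guess an element's shape from its contents alone (a heuristic)."""
    children = list(elem)
    if not children:
        return "scalar"  # text only, e.g. <Node-Type>Sort</Node-Type>
    if len({child.tag for child in children}) == 1:
        return "list"  # every child has the same tag
    return "group"  # mixed child tags: a structured record


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    print("Plan", classify(root))
    for path in ["Node-Type", "Sort-Key", "Plans", "Plans/Plan"]:
        print(path, classify(root.find(path)))
```

Note the last case: a plan node that happens to have a single property comes out as a "list", and an empty list would come out as a "scalar". Those edge cases are where introspection alone becomes ambiguous.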

Even if we do need some kind of additional markup, I'm reluctant to
try to design it without some feedback from people writing actual
tools about what they find inadequate in the current output.  The good
news is that if this patch gets committed fairly quickly after the
release of 8.4, tool authors should have enough time to discover where
any bodies are buried in time to fix them before 8.5.  But I'm really
unconvinced that any of this minor formatting stuff is going to rise
to the level of a real problem.

...Robert



Re: [HACKERS] machine-readable explain output

2009-06-16 Thread Andres Freund

On 06/16/2009 09:51 PM, Robert Haas wrote:
> On Tue, Jun 16, 2009 at 2:12 PM, Tom Lane wrote:
>> The main point here is that we have a pretty good idea of what
>> general-purpose client code is likely to want to do with the data,
>> and in a lot of cases that does not translate to having to know
>> each node type explicitly, so long as it can be categorized.
>
> I agree.  I'm just not seeing the need for an *explicit*
> categorization contained within the data itself.  For one thing,
> AIUI, that's the job of things like an XML Schema, which Andres
> Freund has already agreed to write, and I would expect that would be
> of some value to tool-writers, else why are we creating it?

It defines exactly how the output has to look, which is not easily
readable out of explain.c. So anything that could be created and
validated with that schema should be acceptable to $tool, even if
EXPLAIN may never create it.
Just like EBNF or similar for other languages.

It does not help with automatically sorting values into
planner/execution/whatever categories in some tool, though.
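
A rough illustration of the "schema as contract" idea in Python. Real RELAX NG validation needs a third-party library, so this sketch substitutes a toy content model: a dict of allowed child elements, with hypothetical names, that checks only which children may appear, not their order or cardinality. A document that EXPLAIN would never produce, such as one with the properties reordered, still validates, while an unknown element is rejected.

```python
import xml.etree.ElementTree as ET

# Toy content model standing in for the RELAX NG schema: for each element,
# the set of child elements it may contain. All names are hypothetical.
CONTENT_MODEL = {
    "explain": {"Plan"},
    "Plan": {"Node-Type", "Total-Cost", "Plans"},
    "Plans": {"Plan"},
    "Node-Type": set(),
    "Total-Cost": set(),
}


def validate(elem: ET.Element, path: str = "") -> list:
    """Return a list of violations; an empty list means the tree conforms."""
    here = f"{path}/{elem.tag}"
    if elem.tag not in CONTENT_MODEL:
        return [f"{here}: unknown element"]
    errors = []
    for child in elem:
        if child.tag in CONTENT_MODEL[elem.tag]:
            errors.extend(validate(child, here))
        else:
            errors.append(f"{here}: <{child.tag}> not allowed here")
    return errors


if __name__ == "__main__":
    # Property order differs from what EXPLAIN might emit, but the content
    # model allows it, so a tool written against the schema should accept it.
    reordered = ("<explain><Plan><Total-Cost>1.0</Total-Cost>"
                 "<Node-Type>Result</Node-Type></Plan></explain>")
    bogus = "<explain><Plan><Bogus/></Plan></explain>"
    print(validate(ET.fromstring(reordered)))
    print(validate(ET.fromstring(bogus)))
```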


I attached a simple RELAX NG schema; if somebody prefers another schema
format, it can be generated from this one (using trang). It could surely
use some more work, but I think it's detailed enough for now.



> I also think scalars and lists are recognizable without any
> particular additional markup at all, just by introspection of the
> contents.

Doesn't that somewhat defeat the point of a strictly structured format?
Perhaps I am misunderstanding you, though.


On another note, it may be interesting to emit the options passed to
EXPLAIN in the XML/JSON output, although that depends on whether the new
option syntax is accepted.
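
A sketch of echoing the options back in both formats, in Python. The layout (an `Options` object/element with one entry per option) and the option names are assumptions for illustration only, loosely modeled on an option list like `EXPLAIN (ANALYZE, FORMAT XML)`; nothing here reflects an agreed format.

```python
import json
import xml.etree.ElementTree as ET


def options_as_json(options: dict) -> str:
    """Echo the EXPLAIN options as a JSON object (hypothetical layout)."""
    return json.dumps({"Options": options}, sort_keys=True)


def options_as_xml(options: dict) -> str:
    """Echo the EXPLAIN options as XML elements (hypothetical layout)."""
    root = ET.Element("Options")
    for name, value in sorted(options.items()):
        child = ET.SubElement(root, name.capitalize())
        # Booleans are written as lowercase true/false, as in JSON.
        child.text = str(value).lower() if isinstance(value, bool) else str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical options for something like EXPLAIN (ANALYZE, FORMAT XML).
    opts = {"analyze": True, "verbose": False, "format": "xml"}
    print(options_as_json(opts))
    print(options_as_xml(opts))
```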



Writing the schema, I noticed something else I did not like about the
current format:

Name
or:
ConstraintName

The double usage of "" seems somewhat ugly. Renaming it
to / seems like a good idea, at least
while staying with the current tag-oriented style.


Andres
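[Attachment: RELAX NG schema (http://relaxng.org/ns/structure/1.0) for the EXPLAIN XML output, namespace http://www.postgresql.org/2009/explain; the schema body was not preserved in the archive.]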

Re: [HACKERS] concurrent COPY performance

2009-06-16 Thread Merlin Moncure
On Tue, Jun 16, 2009 at 12:47 PM, Stefan
Kaltenbrunner wrote:
> Hi!
>
> I have been doing some bulk loading testing recently, mostly with a focus
> on answering why we are "only" getting a (max of) cores/2 speedup (up to
> around 8 cores, even less with more) using parallel restore.
> What I found is that on some fast IO subsystems we are CPU-bottlenecked on
> concurrent COPY, which is able to utilize WAL bypass (and scale up to around
> cores/2), and performance without WAL bypass is very bad.
> In the WAL-logged case the second process only adds a 50% speedup,
> we are never able to scale better than 3x (up to 8 cores), and
> performance degrades even beyond that point.

How are you bypassing WAL? Do I read this properly that on your 8-core
system you are getting a 4x speedup with WAL bypass and a 3x speedup
without?

merlin



Re: [HACKERS] concurrent COPY performance

2009-06-16 Thread Andrew Dunstan



Merlin Moncure wrote:
> On Tue, Jun 16, 2009 at 12:47 PM, Stefan
> Kaltenbrunner wrote:
>> Hi!
>>
>> I have been doing some bulk loading testing recently, mostly with a focus
>> on answering why we are "only" getting a (max of) cores/2 speedup (up to
>> around 8 cores, even less with more) using parallel restore.
>> What I found is that on some fast IO subsystems we are CPU-bottlenecked on
>> concurrent COPY, which is able to utilize WAL bypass (and scale up to around
>> cores/2), and performance without WAL bypass is very bad.
>> In the WAL-logged case the second process only adds a 50% speedup,
>> we are never able to scale better than 3x (up to 8 cores), and
>> performance degrades even beyond that point.
>
> How are you bypassing WAL? Do I read this properly that on your 8-core
> system you are getting a 4x speedup with WAL bypass and a 3x speedup
> without?


If a table is created or truncated in the same transaction that does the 
load, and archiving is not on, the COPY is not WALed. That is why 
parallel restore wraps the COPY in a transaction and precedes it with a 
TRUNCATE if it created the table.
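
The rule as stated here, plus the statement sequence described, sketched in Python. This models the behavior only as described in this message: the function names are hypothetical, the real check lives in the backend, and `COPY ... FROM stdin` is an assumption about how the data is fed in.

```python
def copy_skips_wal(created_or_truncated_in_txn: bool, archiving_on: bool) -> bool:
    """COPY skips WAL only when both conditions from this message hold."""
    return created_or_truncated_in_txn and not archiving_on


def load_statements(table: str, restore_created_table: bool) -> list:
    """Statements for loading one table (hypothetical helper)."""
    stmts = ["BEGIN;"]
    if restore_created_table:
        # Truncating inside the load transaction makes the table count as
        # "truncated in the same transaction", so COPY can skip WAL.
        stmts.append(f"TRUNCATE {table};")
    stmts.append(f"COPY {table} FROM stdin;")
    stmts.append("COMMIT;")
    return stmts


if __name__ == "__main__":
    for stmt in load_statements("mytable", restore_created_table=True):
        print(stmt)
    print("WAL skipped:", copy_skips_wal(True, archiving_on=False))
```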


cheers

andrew





Re: [HACKERS] concurrent COPY performance

2009-06-16 Thread Kevin Grittner
Andrew Dunstan wrote:

> If a table is created or truncated in the same transaction that does
> the load, and archiving is not on, the COPY is not WALed.

Slightly off topic, but possibly relevant to the overall process:
those are the same conditions under which I would love to see the
rows inserted with the hint bits showing successful commit and the
transaction ID showing frozen.  We currently do a VACUUM FREEZE
ANALYZE after such a load, to avoid burdening random users with the
writes.  It would be nice not to have to write all the pages again
right after a load.
 
-Kevin



Re: [HACKERS] concurrent COPY performance

2009-06-16 Thread Stefan Kaltenbrunner

Merlin Moncure wrote:
> On Tue, Jun 16, 2009 at 12:47 PM, Stefan
> Kaltenbrunner wrote:
>> Hi!
>>
>> I have been doing some bulk loading testing recently, mostly with a focus
>> on answering why we are "only" getting a (max of) cores/2 speedup (up to
>> around 8 cores, even less with more) using parallel restore.
>> What I found is that on some fast IO subsystems we are CPU-bottlenecked on
>> concurrent COPY, which is able to utilize WAL bypass (and scale up to around
>> cores/2), and performance without WAL bypass is very bad.
>> In the WAL-logged case the second process only adds a 50% speedup,
>> we are never able to scale better than 3x (up to 8 cores), and
>> performance degrades even beyond that point.
>
> How are you bypassing WAL? Do I read this properly that on your 8-core
> system you are getting a 4x speedup with WAL bypass and a 3x speedup
> without?


The test simply executes something like
psql -c "BEGIN;TRUNCATE lineitem1;COPY lineitem1 FROM ;COMMIT;"
in parallel, with the source file hosted on a separate array and primed
into the OS buffer cache.
The box actually has 8 cores/16 threads. Without WAL bypass I get up to a
3x speedup using 8 processes, but at higher connection counts performance
degraded.
With WAL bypass I get near-perfect scalability up to 4 connections and a
maximum speedup of ~8x using 16 connections (i.e. all threads).
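
To compare these figures, here is a small Python sketch of parallel efficiency (achieved speedup divided by the number of workers), using only the approximate numbers reported in this thread.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Achieved speedup as a fraction of ideal (linear) speedup."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Approximate figures reported above: (description, speedup, workers).
REPORTED = [
    ("without WAL bypass, 8 processes", 3.0, 8),
    ("with WAL bypass, 16 connections", 8.0, 16),
]

if __name__ == "__main__":
    for label, speedup, workers in REPORTED:
        print(f"{label}: {parallel_efficiency(speedup, workers):.0%} efficiency")
```

Measured against the 16 hardware threads, the WAL-bypass run works out to 50%; measured against the 8 physical cores it would be close to 100%, so the choice of denominator matters when reading these results.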



Stefan
