Re: [HACKERS] plperlu problem with utf8

2010-12-18 Thread Alex Hunsaker
On Sat, Dec 18, 2010 at 20:29, David E. Wheeler  wrote:
> On Dec 17, 2010, at 9:32 PM, David Christensen wrote:
>    latin=# SELECT * FROM perlgets('“hello”');
>     length │ is_utf8
>    ────────┼─────────
>         11 │ f
>
> (Yes I used Latin-1 curly quotes in that last example).

Erm, latin1 does not have curly quotes; Windows-1252 does.  Those are
utf8 quotes AFAICT, so 11 is actually right (that's 3 bytes per quote, so
that's where 11 comes from).  If latin1 did have those quotes and you used
them, you would have gotten the same answer, as latin1 is a single-byte
encoding. :) I think your terminal is tripping you up here.

Postgres also gives the same length:
latin1=# select length('“hello”');
 length
────────
 11
(1 row)
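The byte arithmetic above can be checked directly; a small Python illustration (U+201C/U+201D are the curly double quotes):

```python
s = "\u201chello\u201d"            # “hello” with curly double quotes

print(len(s))                      # 7 characters
print(len(s.encode("utf-8")))      # 11 bytes: 5 ASCII + 2 quotes * 3 bytes each

# Windows-1252 does have curly quotes (0x93/0x94); Latin-1 does not:
print(len(s.encode("cp1252")))     # 7 bytes, one per character
try:
    s.encode("latin-1")
except UnicodeEncodeError:
    print("latin-1 cannot encode curly quotes")
```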

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Can postgres create a file with physically contiguous blocks.

2010-12-18 Thread flyusa2010 fly
folks,

Does postgres make an effort to create a file with physically contiguous
blocks?

Thanks!


Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)

2010-12-18 Thread Craig Ringer

On 18/12/2010 1:13 AM, Magnus Hagander wrote:

On Fri, Dec 17, 2010 at 17:42, Magnus Hagander  wrote:

On Fri, Dec 17, 2010 at 17:24, Craig Ringer  wrote:

On 17/12/2010 7:17 PM, Magnus Hagander wrote:

Now, that's annoying. So clearly we can't use that function to
determine which version we're on. Seems it only works for "image help
api", and not the general thing.

According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
we could look for:

SysEnumLines - if present, we have at least 6.1.

However, I don't see any function that appeared in 6.0 only..


Actually, I'm wrong - there are functions enough to determine the
version. So here's a patch that tries that.


Great. I pulled the latest from your git tree, tested that, and got much 
better results. Crashdump size is back to what I expected. In my test 
code, fcinfo->args and fcinfo->argnull can be examined without problems. 
Backtraces look good; see below. It seems to be including backend 
private memory again now. Thanks _very_ much for your work on this.


fcinfo->flinfo is still inaccessible, but I suspect it's in shared 
memory, as it's at 0x0135 . Ditto fcinfo->resultinfo and 
fcinfo->context.


This has me wondering - is it going to be necessary to dump shared 
memory to make many backtraces useful? I just responded to Tom 
mentioning that the patch doesn't currently dump shared memory, but I 
hadn't realized the extent to which it's used for _lots_ more than just 
disk buffers. I'm not sure how to handle dumping shared_buffers when 
someone might be using multi-gigabyte shared_buffers, though. Dumping 
the whole lot would risk sudden out-of-disk-space issues, slowdowns as 
dumps are written, and the backend being "frozen" as it's being dumped 
could delay the system coming back up again. Trying to selectively dump 
critical parts could cause dumps to fail if the system is in early 
startup or a bad state.


The same concern applies to writing backend private memory; it's fine 
most of the time, but if you're doing data warehousing queries with 2GB 
of work_mem, it's going to be nasty having all that extra disk I/O and 
disk space use, not to mention the hold-up while the dump is written. If 
this is something we want to have people running in production "just in 
case" or to track down rare / hard to reproduce faults, that'll be a 
problem.


OTOH, we can't really go poking around in palloc contexts to decide what 
to dump.


I guess we could always do a small, minimalist minidump, then write 
_another_ dump that attempts to include select parts of shm and backend 
private memory.


I just thought of two other things, too:

- Is it possible for this handler to be called recursively if it fails 
during the handler call? If so, do we need to uninstall the handler 
before attempting a dump to avoid such recursion? I need to do some 
testing and dig around MSDN to find out more about this.


- Can asynchronous events like signals (or their win32 emulation) 
interrupt an executing crash handler, or are they blocked before the 
crash handler is called? If they're not blocked, do we need to try to 
block them before attempting a dump? Again, I need to do some reading on 
this.



Anyway, here's an example of the backtraces I'm currently getting. 
They're clearly missing some parameters (in shm? Unsure) but provide 
source file+line, argument values where resolvable, and the call stack 
itself. Locals are accessible at all levels of the stack when you go 
poking around in windbg.



This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(930.12e8): Access violation - code c005 (first/second chance not available)
eax=00bce2c0 ebx=72d0e800 ecx=02e4 edx=72cb81c8 esi=00f0 edi=0930
eip=771464f4 esp=00bce294 ebp=00bce2a4 iopl=0 nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs= efl=0246
ntdll!KiFastSystemCallRet:
771464f4 c3  ret
0:000> .ecxr
eax= ebx= ecx=015fd7d8 edx=7362100f esi=015fd7c8 edi=015fd804
eip=73621052 esp=00bcf284 ebp=015fd7c8 iopl=0 nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs= efl=00010246
crashme!crashdump_crashme+0x2:
73621052 c7000100mov dword ptr [eax],1ds:0023:=
0:000> kp
  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr
00bcf280 0031c797 crashme!crashdump_crashme(struct FunctionCallInfoData * 
fcinfo = 0x015e3318)+0x2 
[c:\users\craig\developer\postgres\contrib\crashme\crashme.c @ 14]
00bcf2e4 0031c804 postgres!ExecMakeFunctionResult(struct FuncExprState * fcache = 
0x015e3318, struct ExprContext * econtext = 0x00319410, char * isNull = 0x 
"", ExprDoneCond * isDone = 0x7362100f)+0x427 
[c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 1824]
00bcf30c 0031b760 postgres!ExecEvalFunc(struct FuncExprState * fcache = 0x00

[HACKERS] can shared cache be swapped to disk?

2010-12-18 Thread flyusa2010 fly
hi, folks!

I see that the shared cache is implemented with System V shared memory. I wonder
whether data in this area can be swapped out to disk.
Isn't it bad that we read data from disk, put it in the shared cache, and
then have the shared cache swapped back out to disk?

Why not use shmctl(..SHM_LOCK..) to pin data in main memory?

Thanks!


Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)

2010-12-18 Thread Craig Ringer

On 17/12/2010 11:17 PM, Tom Lane wrote:


So I'd be happy to have it enabled - given that we want a default.


There is absolutely nothing more frustrating than having a crash dump
that hasn't got the information you need.  Please turn it on by default.


There have to be limits to that, though. dbghelp.dll can dump shared 
memory, too - but it's not going to be practical to write or work with 
many-gigabyte dumps most of the time, and neither my initial patch nor 
Magnus's recent reworking of it asks dbghelp.dll to include shared memory 
in the dump.


It's possible to tell dbghelp.dll to include specific additional memory 
ranges, so it may well be worth including specific critical areas within 
the shared memory block while omitting the bulk of the buffers. Doing 
that safely in an unsafe, we're-crashing environment might not be simple, 
though, especially if you're not even sure how far into startup you got 
before crashing.


--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/



Re: [HACKERS] serializable lock consistency

2010-12-18 Thread Robert Haas
On Wed, Dec 15, 2010 at 9:20 AM, Florian Pflug  wrote:
> On Dec14, 2010, at 15:01 , Robert Haas wrote:
>> On Tue, Dec 14, 2010 at 7:51 AM, Florian Pflug  wrote:
 - serializable lock consistency - I am fairly certain this needs
 rebasing.  I don't have time to deal with it right away.  That sucks,
 because I think this is a really important change.
>>> I can try to find some time to update the patch if it suffers from bit-rot. 
>>> Would that help?
>>
>> Yes!
>
> I've rebased the patch to the current HEAD, and re-run my FK concurrency test 
> suite,
> available from https://github.com/fgp/fk_concurrency, to verify that things 
> still work.
>
> I've also added asserts to the callers of heap_{update,delete,lock_tuple} to verify 
> (and document)
> that update_xmax may only be InvalidTransactionId if a lockcheck_snapshot is 
> passed to
> heap_{update,delete,lock_tuple}.
>
> Finally, I've improved the explanation in src/backend/executor/README of how 
> row locks and
> REPEATABLE READ transactions interact, and tried to state the guarantees 
> provided by
> FOR SHARE and FOR UPDATE locks more precisely.
>
> I've published my work to 
> https://github.com/fgp/postgres/tree/serializable_lock_consistency,
> and attached an updated patch. I'd be happy to give write access to that GIT 
> repository
> to anyone who wants to help getting this committed.

I am having a hard time understanding this patch.  I decided to start
with the README, and I'm still lost. :-(

My understanding of the problem is as follows.  Acquiring a lock on a
tuple prevents the tuple from being concurrently updated.  You might
take such a lock on a tuple T, make some other modification to the
database M, and commit, in the hope that your lock will prevent a
concurrent transaction from updating T without seeing M.  However, it
doesn't work.  When your lock on T is released, a concurrent
serializable transaction will still be looking at the database using
its original snapshot, and therefore won't see M.  In effect, you
might as well not have bothered obtaining a lock at all.  (You'd get
the same wrong answer that way, without unnecessary blocking!)
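To make that concrete, here is a hypothetical interleaving (schema invented for illustration); under current behavior, B's UPDATE succeeds even though B never saw A's other modification:

```sql
-- Setup (hypothetical): t holds the tuple T that gets locked.
CREATE TABLE t (id int PRIMARY KEY, v int);
INSERT INTO t VALUES (1, 0);

-- Session B starts first and takes its snapshot:
--   B: BEGIN ISOLATION LEVEL SERIALIZABLE;
--   B: SELECT count(*) FROM t;                  -- snapshot taken; sees 1 row

-- Session A locks T, makes the other modification M, and commits:
--   A: BEGIN;
--   A: SELECT v FROM t WHERE id = 1 FOR SHARE;  -- lock on T
--   A: INSERT INTO t VALUES (2, 0);             -- modification M
--   A: COMMIT;                                  -- releases the lock on T

-- B now updates T. Its snapshot still cannot see M, yet today the
-- update succeeds; with the fix described above, B would roll back:
--   B: UPDATE t SET v = v + 1 WHERE id = 1;
--   B: COMMIT;
```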

To fix this, one must ensure that if a tuple T is locked - either for
update or for share - by transaction A, and if a serializable
transaction B whose snapshot doesn't permit it to see the effects of A
subsequently attempts to update T, it must roll back.

Questions:
1. Is my understanding of the problem correct?
2. Is my understanding of the fix correct?
3. Is that what this patch implements?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SQL/MED - file_fdw

2010-12-18 Thread Robert Haas
On Sat, Dec 18, 2010 at 10:43 PM, Itagaki Takahiro
 wrote:
> On Sun, Dec 19, 2010 at 12:18, Robert Haas  wrote:
>> I'm sort of suspicious of the fact that BeginCopyTo() is a shell
>> around BeginCopy() while BeginCopyFrom() does a whole bunch of other
>> stuff.  I haven't grokked what the code is doing here well enough to
>> have a concrete proposal though...
>
> I added Begin/EndCopyTo() just because the internal code looks
> symmetric. The proposal doesn't change behaviors of COPY commands
> at all. It just exports a part of COPY FROM codes as "File Reader"
> so that the file_fdw external module can reuse the code. I believe
> we reached the conclusion in the prior discussion that we should
> avoid duplicating the code that reads files.

I'm not questioning any of that.  But I'd like the resulting code to
be as maintainable as we can make it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Extensions, patch v20 (bitrot fixes) (was: [HACKERS] Extensions, patch v19 (encoding brainfart fix))

2010-12-18 Thread Robert Haas
On Sat, Dec 18, 2010 at 10:04 PM, Robert Haas  wrote:
> - Has the issue of changing custom_variable_classes from PGC_SIGHUP to
> PGC_SUSET been discussed?  I am not sure whether that's an OK thing to
> do.  If it is OK, then the documentation also needs updating.
>
> - This comment looks like leftovers:
>
> +       /* FIXME: add PGC_EXTENSION so that we don't abuse PGC_SIGHUP here? */
> +       SetConfigOption("custom_variable_classes",
> +                                       newval, PGC_SIGHUP, PGC_S_EXTENSION);
>
> Apologies if I missed the previous discussion of this, but why are we
> adding a new GUC context?

Looking at this a little more, I am inclined to think that
ExtensionSetCVC() is entirely unacceptable.  Our backend startup cost is
high enough already.  Sequentially scanning the pg_extension catalog on
every startup to spare the DBA the trouble of setting up
postgresql.conf strikes me as a bad trade-off.  If you were to come
back and say that custom_variable_classes is a vile hack from the
darkest reaches of wherever vile hacks originate from, I would agree
with you.  Having a GUC that controls what names can be used as GUCs
is probably a bad design, and I'd like to come up with something
better.  But I don't think this is it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SQL/MED - file_fdw

2010-12-18 Thread Itagaki Takahiro
On Sun, Dec 19, 2010 at 12:18, Robert Haas  wrote:
> I'm sort of suspicious of the fact that BeginCopyTo() is a shell
> around BeginCopy() while BeginCopyFrom() does a whole bunch of other
> stuff.  I haven't grokked what the code is doing here well enough to
> have a concrete proposal though...

I added Begin/EndCopyTo() just because the internal code looks
symmetric. The proposal doesn't change behaviors of COPY commands
at all. It just exports a part of COPY FROM codes as "File Reader"
so that the file_fdw external module can reuse the code. I believe
we reached the conclusion in the prior discussion that we should
avoid duplicating the code that reads files.

We could arrange the COPY TO code like the COPY FROM APIs, but
I haven't, and I won't do that at this time, because it is not required
by SQL/MED at all. If we do, it would be a "File Writer" API, like:

cstate = BeginCopyTo(...);
while (tuple = ReadTupleFromSomewhere()) {
  /* write the tuple into a TSV/CSV file */
  NextCopyTo(cstate, tuple);
}
EndCopyTo(cstate);
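For symmetry, the "File Reader" side that file_fdw would consume looks roughly like this (a sketch in the same style; argument lists are elided and not the final signatures):

```c
cstate = BeginCopyFrom(...);
while (NextCopyFrom(cstate, values, nulls))
{
  /* hand the row's values/nulls arrays to the FDW scan */
}
EndCopyFrom(cstate);
```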

-- 
Itagaki Takahiro



Re: Extensions, patch v20 (bitrot fixes) (was: [HACKERS] Extensions, patch v19 (encoding brainfart fix))

2010-12-18 Thread David E. Wheeler
On Dec 18, 2010, at 7:04 PM, Robert Haas wrote:

> - Did we decide to ditch the encoding parameter for extension scripts
> and mandate UTF-8?

+1

It was certainly suggested. I think it's a good idea, at least for a first cut.

Best,

David


Re: [HACKERS] plperlu problem with utf8

2010-12-18 Thread David E. Wheeler
On Dec 17, 2010, at 9:32 PM, David Christensen wrote:

> +1 on the original sentiment, but only for the case that we're dealing with 
> data that is passed in/out as arguments.  In the case that the 
> server_encoding is UTF-8, this is as trivial as a few macros on the 
> underlying SVs for text-like types.  If the server_encoding is SQL_ASCII (= 
> byte soup), this is a trivial case of doing nothing with the conversion 
> regardless of data type.  For any other server_encoding, the data would need 
> to be converted from the server_encoding to UTF-8, presumably using the 
> built-in conversions before passing it off to the first code path.  A similar 
> handling would need to be done for the return values, again 
> datatype-dependent.

+1

> Recent upgrades of the Encode module included with perl 5.10+ have caused 
> issues wherein circular dependencies between Encode and Encode::Alias have 
> made it impossible to load in a Safe container without major pain.  (There 
> may be some better options than I'd had on a previous project, given that 
> we're embedding our own interpreters and accessing more through the XS guts, 
> so I'm not ruling out this possibility completely).

Fortunately, thanks to Tim Bunce, PL/Perl no longer relies on Safe.pm.

>> Well that works for me. I always use UTF8. Oleg, what was the encoding of 
>> your database where you saw the issue?
> 
> I'm not sure what the current plperl runtime does as far as marshaling this, 
> but it would be fairly easy to ensure the parameters came in in perl's 
> internal format given a server_encoding of UTF8 and some type introspection 
> to identify the string-like types/text data.  (Perhaps any type which had a 
> binary cast to text would be a sufficient definition here.  Do domains 
> automatically inherit binary casts from their originating types?) 

Their labels are TEXT. I believe that the only type that should not be treated 
as text is bytea.

>>> 2) its not utf8, so we just leave it as octets.
>> 
>> Which means Perl will assume that it's Latin-1, IIUC.
> 
> This is sub-optimal for non-UTF-8-encoded databases, for reasons I pointed 
> out earlier.  This would produce bogus results for any non-UTF-8, non-ASCII, 
> non-Latin-1 encoding, even if it would not bite most people in general 
> usage.

Agreed.

> This example seems bogus; wouldn't length be 3 if this is the example text 
> this was run with?  Additionally, since all ASCII is trivially UTF-8, I think 
> a better example would be using a string with hi-bit characters so if this 
> was improperly handled the lengths wouldn't match; length($all_ascii) == 
> length(encode_utf8($all_ascii)) vs length($hi_bit) < 
> length(encode_utf8($hi_bit)).  I don't see that this test shows us much with 
> the test case as given.  The is_utf8() function merely returns the state of 
> the SV_utf8 flag, which doesn't speak to UTF-8 validity (i.e., this need not 
> be set on ascii-only strings, which are still valid in the UTF-8 encoding), 
> nor does it indicate that there are no hi-bit characters in the string (i.e., 
> with encode_utf8($hi_bit_string)), the source string $hi_bit_string (in 
> perl's internal format) with hi-bit characters will have the utf8 flag set, 
> but the return value of encode_utf8 will not, even though the underlying 
> data, as represented in perl will be identical).


Sorry, I probably had a pasto there. How about this?

CREATE OR REPLACE FUNCTION perlgets(
TEXT
) RETURNS TABLE(length INT, is_utf8 BOOL) LANGUAGE plperl AS $$
   my $text = shift;
   return_next {
   length  => length $text,
   is_utf8 => utf8::is_utf8($text) ? 1 : 0
   };
$$;

utf8=# SELECT * FROM perlgets('“hello”');
 length │ is_utf8 
────────┼─────────
      7 │ t

latin=# SELECT * FROM perlgets('“hello”');
 length │ is_utf8 
────────┼─────────
     11 │ f

(Yes I used Latin-1 curly quotes in that last example). I would argue that it 
should output the same as the first example. That is, PL/Perl should have 
decoded the latin-1 before passing the text to the Perl function.

> 
>> In a latin-1 database:
>> 
>>   latin=# select * from perlgets('foo');
>>length │ is_utf8 
>>   ────────┼─────────
>>         8 │ f
>>   (1 row)
>> 
>> I would argue that in the latter case, is_utf8 should be true, too. That is, 
>> PL/Perl should decode from Latin-1 to Perl's internal form.
> 
> See above for discussion of the is_utf8 flag; if we're dealing with latin-1 
> data or (more precisely in this case) data that has not been decoded from the 
> server_encoding to perl's internal format, this would exactly be the 
> expectation for the state of that flag.

Right. I think that it *should* be decoded.

>> Interestingly, when I created a function that takes a bytea argument, utf8 
>> was *still* enabled in the utf-8 database. That doesn't seem right to me.
> 
> I'm not sure what you mean here, but I do think that if bytea is identi

Re: [HACKERS] SQL/MED - file_fdw

2010-12-18 Thread Robert Haas
On Fri, Dec 17, 2010 at 11:01 PM, Itagaki Takahiro
 wrote:
> On Fri, Dec 17, 2010 at 11:49, Shigeru HANADA  
> wrote:
>> I've just moved permission check and read-only check from BeginCopy()
>> to DoCopy().  Please see attached patch.
>
> Thanks!
>
> Are there any objections for the change? If acceptable,
> I'd like to apply it prior to SQL/MED and file_fdw patches.

I think at a bare minimum the functions you're adding should have a
comment explaining what they do, what their arguments mean, etc.

I'm sort of suspicious of the fact that BeginCopyTo() is a shell
around BeginCopy() while BeginCopyFrom() does a whole bunch of other
stuff.  I haven't grokked what the code is doing here well enough to
have a concrete proposal though...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Extensions, patch v20 (bitrot fixes) (was: [HACKERS] Extensions, patch v19 (encoding brainfart fix))

2010-12-18 Thread Robert Haas
On Sat, Dec 18, 2010 at 2:39 PM, Dimitri Fontaine
 wrote:
> Here it is, attached. The problem was of course due to git branches and
> missing local pulling and merging. Going gradually from 7 to 2 branches
> causes that, sometimes.

I spent a little time looking at this tonight.  I'm going to give you
the same general advice that I've given other people who have
submitted very large patches of this type: it'll be a lot easier to
get this committed if we can break it into smaller pieces.  A pretty
easy thing to do would be to separate the changes that turn the
existing contrib modules into extensions into their own patch.  A
possibly more controversial idea is to actually remove the notion of a
relocatable extension from this patch, and all the supporting
machinery, and make that a follow-on patch.  I think it was arranged
like that before and somehow got collapsed, but maybe the notion is
worth revisiting, because this is a lot of code:

 224 files changed, 4713 insertions(+), 293 deletions(-)

That issue aside, I think there is still a fair amount of cleanup and
code review that needs to happen here before this can be considered
for commit.  Here are a few things I noticed on a first read-through:

- pg_execute_sql_string() and pg_execute_sql_file() are documented,
but are not actually part of the patch.

- The description of the NO USER DATA option is almost content-free.
It basically says "this option is here because it wouldn't work if we
didn't have this option".

- listDependentObjects() appears to be largely cut-and-pasted from
some other function.  It calls AcquireDeletionLock() and the comments
mention constructing a list of objects to delete.  It sure doesn't
seem right to be acquiring AccessExclusiveLocks on lots of things in a
function that's there to support the pg_extension_objects SRF.

- This bit certainly violates our message style guidelines:

+   elog(NOTICE,
+        "Installing extension '%s' from '%s', in schema %s, %s user data",
+        stmt->extname,
+        get_extension_absolute_path(control->script),
+        schema ? schema : "public",
+        create_extension_with_user_data ? "with" : "without");

User-facing messages need to be ereport, not elog, so they can be
translated, and mustn't be assembled from pieces, as you've done with
"%s user data".
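A sketch of one way to bring that message in line with the guidelines (pseudocode-level: it reuses the surrounding patch's variables, and splits the spliced "%s user data" into two complete, separately translatable messages):

```c
if (create_extension_with_user_data)
    ereport(NOTICE,
            (errmsg("installing extension \"%s\" from \"%s\" in schema \"%s\", with user data",
                    stmt->extname,
                    get_extension_absolute_path(control->script),
                    schema ? schema : "public")));
else
    ereport(NOTICE,
            (errmsg("installing extension \"%s\" from \"%s\" in schema \"%s\", without user data",
                    stmt->extname,
                    get_extension_absolute_path(control->script),
                    schema ? schema : "public")));
```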

- Has the issue of changing custom_variable_classes from PGC_SIGHUP to
PGC_SUSET been discussed?  I am not sure whether that's an OK thing to
do.  If it is OK, then the documentation also needs updating.

- This comment looks like leftovers:

+   /* FIXME: add PGC_EXTENSION so that we don't abuse PGC_SIGHUP here? */
+   SetConfigOption("custom_variable_classes",
+   newval, PGC_SIGHUP, PGC_S_EXTENSION);

Apologies if I missed the previous discussion of this, but why are we
adding a new GUC context?

- Did we decide to ditch the encoding parameter for extension scripts
and mandate UTF-8?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-18 Thread Tom Lane
Tomas Vondra  writes:
> I've done several small changes to the patch, namely

> - added docs for the functions (in SGML)
> - added the same thing for background writer

> So I think now it's 'complete' and I'll add it to the commit fest in a
> few minutes.

Please split this into separate patches for database-level and
table-level tracking, because I think it's highly likely that the latter
will get rejected.  We have had multiple complaints already about the
size of the stats file for databases with many tables.  I don't believe
that it's worth bloating the per-table entries even further with this
information.  Tracking reset time per-database doesn't seem like a
problem though.

regards, tom lane



Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2010-12-18 Thread Tomas Vondra
On 12.12.2010 03:47, Robert Haas wrote:
> On Sat, Dec 11, 2010 at 4:40 PM,   wrote:
>>> Hello
>>>
>>> you have to respect pg coding style:
>>>
>>> a) not too long lines
>>> b) not C++ line comments
>>
>> OK, thanks for the notice. I've fixed those two problems.
> 
> Please add this to the currently-open commitfest.
> 
> https://commitfest.postgresql.org/action/commitfest_view/open

I've done several small changes to the patch, namely

- added docs for the functions (in SGML)
- added the same thing for background writer

So I think now it's 'complete' and I'll add it to the commit fest in a
few minutes.

regards
Tomas
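For reference, a sketch of how the columns added by the attached diff would be consulted once the patch is applied:

```sql
-- Per-database reset time:
SELECT datname, last_stat_reset FROM pg_stat_database;

-- Reset and observe the new timestamp:
SELECT pg_stat_reset();
SELECT last_stat_reset
  FROM pg_stat_database
 WHERE datname = current_database();
```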
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5fd0213..40c6c6b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -600,6 +600,15 @@ postgres: user database 
host 
 
  
+  
pg_stat_get_db_last_stat_reset_time(oid)
+  timestamptz
+  
+   Time of the last reset of statistics for this database (as a result of 
executing
+  pg_stat_reset function)
+  
+ 
+
+ 
   
pg_stat_get_numscans(oid)
   bigint
   
@@ -725,6 +734,15 @@ postgres: user database 
host 
 
  
+  
pg_stat_get_last_stat_reset_time(oid)
+  timestamptz
+  
+   Time of the last reset of statistics on this table (as a result of 
executing
+  pg_stat_reset or 
pg_stat_reset_single_table_counters)
+  
+ 
+
+ 
   
pg_stat_get_vacuum_count(oid)
   bigint
   
@@ -879,6 +897,15 @@ postgres: user database 
host 
 
  
+  
pg_stat_get_function_last_stat_reset_time(oid)
+  timestamptz
+  
+   Time of the last reset of statistics for this function (as a result of 
executing
+  pg_stat_reset or 
pg_stat_reset_single_function_counters)
+  
+ 
+
+ 
   
pg_stat_get_xact_function_calls(oid)
   bigint
   
@@ -1064,6 +1091,15 @@ postgres: user database 
host 
 
  
+  
pg_stat_get_bgwriter_last_stat_reset_time()
+  timestamptz
+  
+   Time of the last reset of statistics for the background writer (as a 
result of executing
+  pg_stat_reset_shared('bgwriter'))
+  
+ 
+
+ 
   
pg_stat_get_buf_written_backend()
   bigint
   
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 346eaaf..3e1cca3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -310,6 +310,7 @@ CREATE VIEW pg_stat_all_tables AS
 pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
 pg_stat_get_last_analyze_time(C.oid) as last_analyze,
 pg_stat_get_last_autoanalyze_time(C.oid) as last_autoanalyze,
+pg_stat_get_last_stat_reset_time(C.oid) as last_stat_reset,
 pg_stat_get_vacuum_count(C.oid) AS vacuum_count,
 pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count,
 pg_stat_get_analyze_count(C.oid) AS analyze_count,
@@ -502,7 +503,8 @@ CREATE VIEW pg_stat_database AS
 pg_stat_get_db_tuples_fetched(D.oid) AS tup_fetched,
 pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
 pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
-pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted
+pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted,
+pg_stat_get_db_last_stat_reset_time(D.oid) AS last_stat_reset
 FROM pg_database D;
 
 CREATE VIEW pg_stat_user_functions AS
@@ -512,7 +514,8 @@ CREATE VIEW pg_stat_user_functions AS
 P.proname AS funcname,
 pg_stat_get_function_calls(P.oid) AS calls,
 pg_stat_get_function_time(P.oid) / 1000 AS total_time,
-pg_stat_get_function_self_time(P.oid) / 1000 AS self_time
+pg_stat_get_function_self_time(P.oid) / 1000 AS self_time,
+pg_stat_get_function_last_stat_reset_time(P.oid) AS last_stat_reset
 FROM pg_proc P LEFT JOIN pg_namespace N ON (N.oid = P.pronamespace)
 WHERE P.prolang != 12  -- fast check to eliminate built-in functions
   AND pg_stat_get_function_calls(P.oid) IS NOT NULL;
@@ -538,7 +541,8 @@ CREATE VIEW pg_stat_bgwriter AS
 pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
 pg_stat_get_buf_written_backend() AS buffers_backend,
 pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
-pg_stat_get_buf_alloc() AS buffers_alloc;
+pg_stat_get_buf_alloc() AS buffers_alloc,
+pg_stat_get_bgwriter_last_stat_reset_time() AS last_stat_reset;
 
 CREATE VIEW pg_user_mappings AS
 SELECT
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c3c136a..1fd5e6a 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -249,6 +249,8 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool cr

Re: [HACKERS] plperlu problem with utf8

2010-12-18 Thread David E. Wheeler
On Dec 17, 2010, at 10:46 PM, Alex Hunsaker wrote:

>> But that's a separate issue from the, erm, inconsistency with which PL/Perl 
>> treats encoding and decoding of its inputs and outputs.
> 
> Yay! So I think we can finally agree that for Oleg's original test
> case postgres was getting it right.  I hope? :)

Yes, and so was Perl.

David




Re: [HACKERS] pg_ctl and port number detection

2010-12-18 Thread Bruce Momjian
Andrew Dunstan wrote:
> 
> 
> On 12/18/2010 06:23 PM, Bruce Momjian wrote:
> >
> >> If you really think that pulling a port number out of the pid file is an
> >> improvement over what pg_ctl does now, then you need to start by storing
> >> the port number, as such, in the pid file.  Not something that might or
> >> might not be related to the port number.  But what we have to discuss
> >> before that is whether we mind having a significant postmaster version
> >> dependency in pg_ctl.
> > OK, good point on the version issue.  Let's see if we get more
> > complaints before changing this.  Thanks.
> >
> 
> Wasn't there a proposal to provide an explicit port parameter to pg_ctl, 
> instead of relying on PGPORT? That would probably be a small advance.

I do not remember that suggestion.

I wonder if we should write the port number as the 4th line in
postmaster.pid, and then in a few major releases start using that.  We
could fall back to our existing code if there is no 4th line.
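That scheme can be sketched as follows (Python for illustration; the 4-line file format is hypothetical, and the fallback is the shmem-key heuristic discussed elsewhere in this thread):

```python
def postmaster_port(lines):
    """Prefer an explicit port on a 4th line of postmaster.pid; otherwise
    fall back to taking the first number on the last line (the sysv shmem
    key) and dividing by 1000, as proposed earlier in the thread."""
    if len(lines) >= 4 and lines[3].strip().isdigit():
        return int(lines[3])
    return int(lines[-1].split()[0]) // 1000

# postmaster.pid circa 2010: pid, data directory, "shmemkey shmemid"
legacy = ["12345", "/var/lib/pgsql/data", "5432001 163840"]
print(postmaster_port(legacy))              # 5432, via the fallback
print(postmaster_port(legacy + ["5433"]))   # 5433, from the 4th line
```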

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_ctl and port number detection

2010-12-18 Thread Andrew Dunstan



On 12/18/2010 06:23 PM, Bruce Momjian wrote:
>
>> If you really think that pulling a port number out of the pid file is an
>> improvement over what pg_ctl does now, then you need to start by storing
>> the port number, as such, in the pid file.  Not something that might or
>> might not be related to the port number.  But what we have to discuss
>> before that is whether we mind having a significant postmaster version
>> dependency in pg_ctl.
>
> OK, good point on the version issue.  Let's see if we get more
> complaints before changing this.  Thanks.
>



Wasn't there a proposal to provide an explicit port parameter to pg_ctl, 
instead of relying on PGPORT? That would probably be a small advance.


cheers

andrew



Re: [HACKERS] pg_ctl and port number detection

2010-12-18 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian  writes:
> > Tom Lane wrote:
> >> Bruce Momjian  writes:
> >>> pg_ctl already knows the data directory.  If the file is missing, the
> >>> server is not running.  If the file exists, the first number on the last
> >>> line, divided by 1000, is the port number.
> 
> >> That's somewhere between fragile and outright wrong.
> 
> > Please explain why my idea is not an improvement.
> 
> Because it's assuming that those numbers are sysv shmem keys derived in
> a particular way.  We have platforms on which that is wrong, Windows
> being the most obvious example.  Reading the shmem key assignment code
> closely will suggest to you other ways that this could fail.  Not to
> mention that people propose getting rid of sysv shmem approximately
> every other month, and perhaps someday that will actually happen;
> whereupon whatever might get logged in postmaster.pid could be something
> completely different.

Yeah, I was afraid of Windows.

> If you really think that pulling a port number out of the pid file is an
> improvement over what pg_ctl does now, then you need to start by storing
> the port number, as such, in the pid file.  Not something that might or
> might not be related to the port number.  But what we have to discuss
> before that is whether we mind having a significant postmaster version
> dependency in pg_ctl.

OK, good point on the version issue.  Let's see if we get more
complaints before changing this.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_ctl and port number detection

2010-12-18 Thread Tom Lane
Bruce Momjian  writes:
> Tom Lane wrote:
>> Bruce Momjian  writes:
>>> pg_ctl already knows the data directory.  If the file is missing, the
>>> server is not running.  If the file exists, the first number on the last
>>> line, divided by 1000, is the port number.

>> That's somewhere between fragile and outright wrong.

> Please explain why my idea is not an improvement.

Because it's assuming that those numbers are sysv shmem keys derived in
a particular way.  We have platforms on which that is wrong, Windows
being the most obvious example.  Reading the shmem key assignment code
closely will suggest to you other ways that this could fail.  Not to
mention that people propose getting rid of sysv shmem approximately
every other month, and perhaps someday that will actually happen;
whereupon whatever might get logged in postmaster.pid could be something
completely different.

If you really think that pulling a port number out of the pid file is an
improvement over what pg_ctl does now, then you need to start by storing
the port number, as such, in the pid file.  Not something that might or
might not be related to the port number.  But what we have to discuss
before that is whether we mind having a significant postmaster version
dependency in pg_ctl.

regards, tom lane



Re: [HACKERS] pg_ctl and port number detection

2010-12-18 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian  writes:
> > pg_ctl.c::test_postmaster_connection() has some fragile code that tries
> > to detect the server port number by looking in the pg_ctl -o string,
> 
> It may be fragile, but it works; or at least I've not heard complaints
> about it lately.

True.

> > I think a simpler solution would be to look in postmaster.pid:
> > pg_ctl already knows the data directory.  If the file is missing, the
> > server is not running.  If the file exists, the first number on the last
> > line, divided by 1000, is the port number.
> 
> That's somewhere between fragile and outright wrong.

Please explain why my idea is not an improvement.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_ctl and port number detection

2010-12-18 Thread Tom Lane
Bruce Momjian  writes:
> pg_ctl.c::test_postmaster_connection() has some fragile code that tries
> to detect the server port number by looking in the pg_ctl -o string,

It may be fragile, but it works; or at least I've not heard complaints
about it lately.

> I think a simpler solution would be to look in postmaster.pid:
> pg_ctl already knows the data directory.  If the file is missing, the
> server is not running.  If the file exists, the first number on the last
> line, divided by 1000, is the port number.

That's somewhere between fragile and outright wrong.

regards, tom lane



Re: [HACKERS] unlogged tables

2010-12-18 Thread Robert Haas
On Sat, Dec 18, 2010 at 12:27 PM, Kevin Grittner
 wrote:
> Robert Haas  wrote:
>
>> If there's any third-party code out there that is checking
>> rd_istemp, it likely also needs to be revised to check whether
>> WAL-logging is needed, not whether the relation is temp. The way
>> I've coded it, such code will fail to compile, and can be very
>> easily fixed by substituting a call to RelationNeedsWAL() or
>> RelationUsesLocalBuffers() or RelationUsesTempNamespace(),
>> depending on which property the caller actually cares about.
>
> Hmm...  This broke the SSI patch, which was using rd_istemp to omit
> conflict checking where it was set to true.  The property I care
> about is whether tuples in one backend can be read by a transaction
> in a different backend, which I assumed would not be true for
> temporary tables.  Which of the above would be appropriate for that
> use?

RelationUsesLocalBuffers().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] exceptions not present in plpy with Python 3

2010-12-18 Thread Jan Urbański
On 18/12/10 18:56, Jan Urbański wrote:
> I'm not a Python 3 expert, but I nicked some code from the Internet and
> came up with this patch (passes regression tests on both Python 2 and 3).

I tried to be too cute with the regression test; it fails with Python
2.3.7 (the latest 2.3 release).

Attached is a fixed test that should work in Python 2.3+

I don't want to open that can of worms just yet, but do we have any
formal policy on the oldest Python version we support? I tested on 2.3
just because looking at http://python.org/download/releases/ I saw that
2.2 was last released in 2003, so I thought 2.3 was as far as I'd go...

Cheers,
Jan
*** src/pl/plpython/expected/plpython_test.out
--- src/pl/plpython/expected/plpython_test.out	2010-12-18 19:29:48.0 +0100
***
*** 35,40 
--- 35,53 
   willem doe => {fname: willem, lname: doe, userid: 3, username: w_doe}
  (3 rows)
  
+ -- check module contents
+ CREATE FUNCTION module_contents() RETURNS text AS
+ $$
+ contents = list(filter(lambda x: not x.startswith("__"), dir(plpy)))
+ contents.sort()
+ return ", ".join(contents)
+ $$ LANGUAGE plpythonu;
+ select module_contents();
+   module_contents  
+ ---
+  Error, Fatal, SPIError, debug, error, execute, fatal, info, log, notice, prepare, warning
+ (1 row)
+ 
  CREATE FUNCTION elog_test() RETURNS void
  AS $$
  plpy.debug('debug')
*** src/pl/plpython/plpython.c
--- src/pl/plpython/plpython.c	2010-12-18 19:21:35.0 +0100
***
*** 3204,3209 
--- 3204,3238 
  
  
  /*
+  * Exception initialization
+  */
+ 
+ static void
+ PLy_initialize_exceptions(PyObject *plpy)
+ {
+ #if PY_MAJOR_VERSION < 3
+ 	PyObject *plpy_dict = PyModule_GetDict(plpy);
+ #endif
+ 
+ 	PLy_exc_error = PyErr_NewException("plpy.Error", NULL, NULL);
+ 	PLy_exc_fatal = PyErr_NewException("plpy.Fatal", NULL, NULL);
+ 	PLy_exc_spi_error = PyErr_NewException("plpy.SPIError", NULL, NULL);
+ 
+ #if PY_MAJOR_VERSION >= 3
+ 	Py_INCREF(PLy_exc_error);
+ 	PyModule_AddObject(plpy, "Error", PLy_exc_error);
+ 	Py_INCREF(PLy_exc_fatal);
+ 	PyModule_AddObject(plpy, "Fatal", PLy_exc_fatal);
+ 	Py_INCREF(PLy_exc_spi_error);
+ 	PyModule_AddObject(plpy, "SPIError", PLy_exc_spi_error);
+ #else
+ 	PyDict_SetItemString(plpy_dict, "Error", PLy_exc_error);
+ 	PyDict_SetItemString(plpy_dict, "Fatal", PLy_exc_fatal);
+ 	PyDict_SetItemString(plpy_dict, "SPIError", PLy_exc_spi_error);
+ #endif
+ }
+ 
+ /*
   * language handler and interpreter initialization
   */
  
***
*** 3211,3217 
  static PyMODINIT_FUNC
  PyInit_plpy(void)
  {
! 	return PyModule_Create(&PLy_module);
  }
  #endif
  
--- 3240,3254 
  static PyMODINIT_FUNC
  PyInit_plpy(void)
  {
! 	PyObject	*m;
! 
! 	m = PyModule_Create(&PLy_module);
! 	if (m == NULL)
! 		return NULL;
! 
! 	PLy_initialize_exceptions(m);
! 
! 	return m;
  }
  #endif
  
***
*** 3290,3297 
  	PyObject   *main_mod,
  			   *main_dict,
  			   *plpy_mod;
! 	PyObject   *plpy,
! 			   *plpy_dict;
  
  	/*
  	 * initialize plpy module
--- 3327, 
  	PyObject   *main_mod,
  			   *main_dict,
  			   *plpy_mod;
! 	PyObject   *plpy;
  
  	/*
  	 * initialize plpy module
***
*** 3305,3322 
  	plpy = PyModule_Create(&PLy_module);
  #else
  	plpy = Py_InitModule("plpy", PLy_methods);
  #endif
- 	plpy_dict = PyModule_GetDict(plpy);
  
  	/* PyDict_SetItemString(plpy, "PlanType", (PyObject *) &PLy_PlanType); */
  
- 	PLy_exc_error = PyErr_NewException("plpy.Error", NULL, NULL);
- 	PLy_exc_fatal = PyErr_NewException("plpy.Fatal", NULL, NULL);
- 	PLy_exc_spi_error = PyErr_NewException("plpy.SPIError", NULL, NULL);
- 	PyDict_SetItemString(plpy_dict, "Error", PLy_exc_error);
- 	PyDict_SetItemString(plpy_dict, "Fatal", PLy_exc_fatal);
- 	PyDict_SetItemString(plpy_dict, "SPIError", PLy_exc_spi_error);
- 
  	/*
  	 * initialize main module, and add plpy
  	 */
--- 3341,3352 
  	plpy = PyModule_Create(&PLy_module);
  #else
  	plpy = Py_InitModule("plpy", PLy_methods);
+ 	/* for Python 3 we initialized the exceptions in PyInit_plpy */
+ 	PLy_initialize_exceptions(plpy);
  #endif
  
  	/* PyDict_SetItemString(plpy, "PlanType", (PyObject *) &PLy_PlanType); */
  
  	/*
  	 * initialize main module, and add plpy
  	 */
*** src/pl/plpython/sql/plpython_test.sql
--- src/pl/plpython/sql/plpython_test.sql	2010-12-18 19:27:58.0 +0100
***
*** 25,30 
--- 25,39 
  
  select argument_test_one(users, fname, lname) from users where lname = 'doe' order by 1;
  
+ -- check module contents
+ CREATE FUNCTION module_contents() RETURNS text AS
+ $$
+ contents = list(filter(lambda x: not x.startswith("__"), dir(plpy)))
+ contents.sort()
+ return ", ".join(contents)
+ $$ LANGUAGE plpythonu;
+ 
+ select module_contents();
  
  CREATE FUNCTION elog_test() RETURNS void
  AS $$


[HACKERS] pg_ctl and port number detection

2010-12-18 Thread Bruce Momjian
pg_ctl.c::test_postmaster_connection() has some fragile code that tries
to detect the server port number by looking in the pg_ctl -o string,
postgresql.conf, the PGPORT environment variable, and finally using the
default port number.

I think a simpler solution would be to look in postmaster.pid:

10231
/u/pgsql/data
  5432001  45481984

pg_ctl already knows the data directory.  If the file is missing, the
server is not running.  If the file exists, the first number on the last
line, divided by 1000, is the port number.  We can then use this port
number for libpq to check for connectivity.

Comments?
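For illustration, the proposed heuristic could look like the following
(a hypothetical Python helper, not actual pg_ctl code; it assumes the sysv
shmem key on the last line was derived as port * 1000 + counter, which is
not true on every platform):

```python
def port_from_pidfile(text):
    """Guess the server port from postmaster.pid contents, or None if empty."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return None                        # missing/empty file: server not running
    shmem_key = int(lines[-1].split()[0])  # first number on the last line
    return shmem_key // 1000               # e.g. 5432001 -> 5432

example = "10231\n/u/pgsql/data\n  5432001  45481984\n"
print(port_from_pidfile(example))          # -> 5432
```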

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] exceptions not present in plpy with Python 3

2010-12-18 Thread Jan Urbański
Hi,

there seems to be a problem in the way we add exceptions to the plpy
module in PL/Python compiled with Python 3k.

Try this: DO $$ plpy.SPIError $$ language plpython3u;

I'm not a Python 3 expert, but I nicked some code from the Internet and
came up with this patch (passes regression tests on both Python 2 and 3).

The funny thing is that it never blew up earlier, because we only used
plpy.SPIError in except: blocks that weren't even executed, as errors in
plpy.execute just terminate the function.

With my changes they turn into catchable exceptions, and so accessing
plpy.SPIError in Python 3 becomes essential.
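In pure-Python terms, what the patch's PLy_initialize_exceptions() does
amounts to creating the three exception classes once and attaching them to
the plpy module object, on both module-initialization paths (a sketch only;
the real work is done in C against the Python C API):

```python
import types

def initialize_exceptions(plpy_module):
    """Attach Error, Fatal and SPIError classes to the given module object."""
    for name in ("Error", "Fatal", "SPIError"):
        # roughly what PyErr_NewException("plpy.Error", NULL, NULL) produces
        exc = type(name, (Exception,), {"__module__": "plpy"})
        setattr(plpy_module, name, exc)

plpy = types.ModuleType("plpy")   # stand-in for the real plpy module
initialize_exceptions(plpy)

try:
    raise plpy.SPIError("query failed")
except plpy.SPIError as e:
    print(e)                      # -> query failed
```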

BTW: do we have any buildfarm animal that uses Python 3?

Cheers,
Jan
*** src/pl/plpython/expected/plpython_test.out
--- src/pl/plpython/expected/plpython_test.out	2010-12-18 18:46:17.0 +0100
***
*** 35,40 
--- 35,53 
   willem doe => {fname: willem, lname: doe, userid: 3, username: w_doe}
  (3 rows)
  
+ -- check module contents
+ CREATE FUNCTION module_contents() RETURNS text AS
+ $$
+ contents = dir(plpy)
+ contents.sort()
+ return ", ".join(contents)
+ $$ LANGUAGE plpythonu;
+ select module_contents();
+   module_contents  
+ ---
+  Error, Fatal, SPIError, __doc__, __name__, __package__, debug, error, execute, fatal, info, log, notice, prepare, warning
+ (1 row)
+ 
  CREATE FUNCTION elog_test() RETURNS void
  AS $$
  plpy.debug('debug')
*** src/pl/plpython/plpython.c
--- src/pl/plpython/plpython.c	2010-12-18 18:39:00.0 +0100
***
*** 3204,3209 
--- 3204,3238 
  
  
  /*
+  * Exception initialization
+  */
+ 
+ static void
+ PLy_initialize_exceptions(PyObject *plpy)
+ {
+ #if PY_MAJOR_VERSION < 3
+ 	PyObject *plpy_dict = PyModule_GetDict(plpy);
+ #endif
+ 
+ 	PLy_exc_error = PyErr_NewException("plpy.Error", NULL, NULL);
+ 	PLy_exc_fatal = PyErr_NewException("plpy.Fatal", NULL, NULL);
+ 	PLy_exc_spi_error = PyErr_NewException("plpy.SPIError", NULL, NULL);
+ 
+ #if PY_MAJOR_VERSION >= 3
+ 	Py_INCREF(PLy_exc_error);
+ 	PyModule_AddObject(plpy, "Error", PLy_exc_error);
+ 	Py_INCREF(PLy_exc_fatal);
+ 	PyModule_AddObject(plpy, "Fatal", PLy_exc_fatal);
+ 	Py_INCREF(PLy_exc_spi_error);
+ 	PyModule_AddObject(plpy, "SPIError", PLy_exc_spi_error);
+ #else
+ 	PyDict_SetItemString(plpy_dict, "Error", PLy_exc_error);
+ 	PyDict_SetItemString(plpy_dict, "Fatal", PLy_exc_fatal);
+ 	PyDict_SetItemString(plpy_dict, "SPIError", PLy_exc_spi_error);
+ #endif
+ }
+ 
+ /*
   * language handler and interpreter initialization
   */
  
***
*** 3211,3217 
  static PyMODINIT_FUNC
  PyInit_plpy(void)
  {
! 	return PyModule_Create(&PLy_module);
  }
  #endif
  
--- 3240,3254 
  static PyMODINIT_FUNC
  PyInit_plpy(void)
  {
! 	PyObject	*m;
! 
! 	m = PyModule_Create(&PLy_module);
! 	if (m == NULL)
! 		return NULL;
! 
! 	PLy_initialize_exceptions(m);
! 
! 	return m;
  }
  #endif
  
***
*** 3290,3297 
  	PyObject   *main_mod,
  			   *main_dict,
  			   *plpy_mod;
! 	PyObject   *plpy,
! 			   *plpy_dict;
  
  	/*
  	 * initialize plpy module
--- 3327, 
  	PyObject   *main_mod,
  			   *main_dict,
  			   *plpy_mod;
! 	PyObject   *plpy;
  
  	/*
  	 * initialize plpy module
***
*** 3305,3322 
  	plpy = PyModule_Create(&PLy_module);
  #else
  	plpy = Py_InitModule("plpy", PLy_methods);
  #endif
- 	plpy_dict = PyModule_GetDict(plpy);
  
  	/* PyDict_SetItemString(plpy, "PlanType", (PyObject *) &PLy_PlanType); */
  
- 	PLy_exc_error = PyErr_NewException("plpy.Error", NULL, NULL);
- 	PLy_exc_fatal = PyErr_NewException("plpy.Fatal", NULL, NULL);
- 	PLy_exc_spi_error = PyErr_NewException("plpy.SPIError", NULL, NULL);
- 	PyDict_SetItemString(plpy_dict, "Error", PLy_exc_error);
- 	PyDict_SetItemString(plpy_dict, "Fatal", PLy_exc_fatal);
- 	PyDict_SetItemString(plpy_dict, "SPIError", PLy_exc_spi_error);
- 
  	/*
  	 * initialize main module, and add plpy
  	 */
--- 3341,3352 
  	plpy = PyModule_Create(&PLy_module);
  #else
  	plpy = Py_InitModule("plpy", PLy_methods);
+ 	/* for Python 3 we initialized the exceptions in PyInit_plpy */
+ 	PLy_initialize_exceptions(plpy);
  #endif
  
  	/* PyDict_SetItemString(plpy, "PlanType", (PyObject *) &PLy_PlanType); */
  
  	/*
  	 * initialize main module, and add plpy
  	 */
*** src/pl/plpython/sql/plpython_test.sql
--- src/pl/plpython/sql/plpython_test.sql	2010-12-18 18:44:19.0 +0100
***
*** 25,30 
--- 25,39 
  
  select argument_test_one(users, fname, lname) from users where lname = 'doe' order by 1;
  
+ -- check module contents
+ CREATE FUNCTION module_contents() RETURNS text AS
+ $$
+ contents = dir(plpy)
+ contents.sort()
+ return ", ".join(contents)
+ $$ LANGUAGE plpythonu;
+ 
+ select module_contents();
  
  CREATE FUNCTION 

Re: [HACKERS] Extensions, patch v19 (encoding brainfart fix)

2010-12-18 Thread Dimitri Fontaine
Itagaki Takahiro  writes:
> You probably compared wrong versions of source trees. The patch contains
> many diffs not related to EXTENSION. It cannot be applied cleanly.

Ouch, sorry about that, will revise soon'ish.  Meanwhile the git repo is
still available here:

  
http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension

> BTW, only earthdistance.sql.in has @extschema@.
> Didn't you forget to remove it?

No. It so happens that earthdistance depends on cube, and we agreed
not to implement dependencies between extensions for the first patch.
The problem here is that ALTER EXTENSION earthdistance SET SCHEMA foo;
would relocate cube objects too, because there's nothing to stop the
recursive walking at the right spot.  So earthdistance.control states
that earthdistance is not relocatable for now.

And a non relocatable extension will use @extschema@ so that users still
get a say in where to install it…
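As a concrete illustration, a non-relocatable control file of the kind
discussed here might look roughly like this (field names and values are
illustrative, drawn from this thread, not the final syntax):

```
# earthdistance.control (illustrative)
comment = 'great-circle distances on the surface of the Earth'
version = '1.0'
relocatable = false   # depends on cube; SET SCHEMA must not relocate it
```

Because the extension is not relocatable, its install script can qualify
object names with @extschema@, which CREATE EXTENSION substitutes with the
schema the user chose at install time.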

> I've not read the patch well yet, but I'd like to make sure of
> two specs in the patch:
>
> #1. I found the patch specifies "version" in each control file. We will
> need to maintain them manually, but I assume it was a conclusion in the
> discussion for v18 patch. So, no more changes are required here.

Exactly, the consensus seems to be that we *want* to maintain separate
versions number for each contrib.

> #2. The patch replaces "\i XXX.sql" to "CREATE EXTENSION XXX". They are a
> bit different because CREATE EXTENSION uses the installed sql files instead
> of the source directory. But I think this is the correct behavior. We should
> have used only installed files because they are used in "make *installcheck*".

Yeah.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] unlogged tables

2010-12-18 Thread Kevin Grittner
Robert Haas  wrote:
 
> If there's any third-party code out there that is checking
> rd_istemp, it likely also needs to be revised to check whether
> WAL-logging is needed, not whether the relation is temp. The way
> I've coded it, such code will fail to compile, and can be very
> easily fixed by substituting a call to RelationNeedsWAL() or
> RelationUsesLocalBuffers() or RelationUsesTempNamespace(),
> depending on which property the caller actually cares about.
 
Hmm...  This broke the SSI patch, which was using rd_istemp to omit
conflict checking where it was set to true.  The property I care
about is whether tuples in one backend can be read by a transaction
in a different backend, which I assumed would not be true for
temporary tables.  Which of the above would be appropriate for that
use?
 
-Kevin



Re: [HACKERS] proposal : cross-column stats

2010-12-18 Thread Tomas Vondra
On 18.12.2010 16:59, Florian Pflug wrote:
> On Dec18, 2010, at 08:10 , t...@fuzzy.cz wrote:
> 
>>> On Dec17, 2010, at 23:12 , Tomas Vondra wrote:
>>>> Well, not really - I haven't done any experiments with it. For two
>>>> columns selectivity equation is
>>>>
>>>> (dist(A) * sel(A) + dist(B) * sel(B)) / (2 * dist(A,B))
>>>>
>>>> where A and B are columns, dist(X) is number of distinct values in
>>>> column X and sel(X) is selectivity of column X.
>>>
>>> Huh? This is the selectivity estimate for "A = x AND B = y"? Surely,
>>> if A and B are independent, the formula must reduce to sel(A) * sel(B),
>>> and I cannot see how that'd work with the formula above.
>>
>> Yes, it's a selectivity estimate for P(A=a and B=b). It's based on
>> conditional probability, as
>>
>>   P(A=a and B=b) = P(A=a|B=b)*P(B=b) = P(B=b|A=a)*P(A=a)
>>
>> and "uniform correlation" assumption so that it's possible to replace the
>> conditional probabilities with constants. And those constants are then
>> estimated as dist(A)/dist(A,B) or dist(B)/dist(A,B).
> 
> Ok, I think I understand this now. The selectivity equation actually
> *does* reduce to sel(A) * sel(B), *if* we pick a very simple estimate
> for sel(A).
> 
> Take the clause "A = x AND B = y" for example. Without knowing anything
> about x and y, reasonable guesses for sel(A=x) and sel(B=y) are
> 
>   sel(A=x) = 1 / dist(A)
>   sel(B=y) = 1 / dist(B).
> 
> This is also what we do currently, according to var_eq_non_const() in
> src/backend/utils/adt/selfuncs.c, if we don't have any additional
> knowledge about x (Actually, we also factor the probability of A being
> NULL into this).
> 
> With these estimates, your formula becomes
> 
>   sel(A=x,B=y) = 1 / dist(A,B).
> 
> and if A and B are uncorrelated, dist(A,B) ~= dist(A) * dist(B), thus
> 
>   sel(A=x,B=y) = sel(A=x) * sel(B=y).
> 
> If, however, y is a constant, then we use the MKVs to estimate sel(B=y)
> (var_eq_const() in src/backend/utils/adt/selfuncs.c). If
> 
>   sel(B=y) ~= 0,
> 
> we'd currently also conclude that
> 
>   sel(A=x,B=y) ~= 0.
> 
> With the "uniform correlation" approach, we'd instead estimate
> 
>   sel(A=x,B=y) ~= sel(A=x) / dist(B)
> 
> assuming that dist(A,B) ~= dist(A)*dist(B), meaning A,B are uncorrelated.
> If dist(B) is small, this estimate is much worse than what we'd currently
> get, since we've effectively ignored the information that the restriction
> B=y alone guarantees that only very few rows will match.

Well, I guess you're right. Let's see two examples - uncorrelated and
highly correlated data (and then a few comments on what I think you're
missing).

uncorrelated

Let there be 10 distinct values in A, 100 distinct values in B, and
there are all possible combinations (10x100 = 1000). All the values (and
combinations) are uniformly distributed.

The current estimate is correct, i.e.

  sel(A=a) = 1/10 = 1/dist(A)
  sel(B=b) = 1/100 = 1/dist(B)
  sel(A=a, B=b) = sel(A) * sel(B) = 1/1000  /* due to AVI */

the proposed estimate is

  sel(A=a, B=b) = (dist(A)*sel(A=a) + dist(B)*sel(B=b)) / (2*dist(A,B))
= (10 * 1/10 + 100 * 1/100) / 2000 = 1/1000

so actually it produces the same estimate, but that's due to the uniform
distribution assumption.

Let's say the selectivities for A and B are very different - A is 10x
less common, B is 10x more common (let's say we know this). Then the
current estimate is still correct, i.e. it gives

  sel(A=a) = 1/100
  sel(B=b) = 1/10

  => sel(A=a,B=b) = 1/1000

but the new estimate is

  (10 * 1/100 + 100 * 1/10) / 2000 = (101/10) / 2000 ~ 1/200

So yes, in case of uncorrelated data this overestimates the result.


highly correlated
-
Again, let dist(A)=10, dist(B)=100. But this case let B=>A i.e. for each
value in B, there is a unique value in A. Say B=mod(A,10) or something
like that. So now there is dist(A,B)=100.

Assuming uniform distribution, the correct estimate is

  sel(A=a,B=b) = sel(B=b) = 1/100

and the current estimate is

  sel(A=a,B=b) = sel(A=a) * sel(B=b) = 1/10 * 1/100 = 1/1000

The proposed estimate is

  (10 * 1/10 + 100 * 1/100) / 200 = 2/200 = 1/100

which is correct.

Without the uniform distribution (again, sel(A)=1/100, sel(B)=1/10), we
currently get (just as before)

  sel(A=a,B=b) = 1/10 * 1/100 = 1/1000

which is 100x less than the correct value (sel(B)=1/10). The new
estimate yields

  (10*1/100 + 100*1/10) / 200 = (1010/100) / 200 ~ 1/20

which is not as accurate as with uniform distribution, but it's
considerably more accurate than the current estimate (1/1000).
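The two estimators and the numbers worked out above can be checked with a
few lines of Python (a sketch of the formulas only, not planner code; exact
fractions are used to avoid float noise):

```python
from fractions import Fraction as F

def sel_current(sel_a, sel_b):
    """Current planner estimate: attribute value independence."""
    return sel_a * sel_b

def sel_uniform_corr(dist_a, sel_a, dist_b, sel_b, dist_ab):
    """Proposed 'uniform correlation' estimate."""
    return (dist_a * sel_a + dist_b * sel_b) / (2 * dist_ab)

# uncorrelated, uniform: dist(A)=10, dist(B)=100, dist(A,B)=1000
print(sel_uniform_corr(10, F(1, 10), 100, F(1, 100), 1000))  # -> 1/1000
# uncorrelated, skewed selectivities: overestimate, ~1/200 instead of 1/1000
print(sel_uniform_corr(10, F(1, 100), 100, F(1, 10), 1000))  # -> 101/20000
# highly correlated (B determines A): dist(A,B)=100
print(sel_uniform_corr(10, F(1, 10), 100, F(1, 100), 100))   # -> 1/100
```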


comments

I really don't think this a perfect solution giving 100% accurate
results in all cases. But the examples above illustrate that it gives
much better estimates in case of correlated data.

It seems to me you're missing one very important thing - this was not
meant as a new default way to do estimates. It was meant as an option
when the user (DBA, developer, ...) realizes the current solution gives
really bad estimates.

Re: [HACKERS] Crash on attempt to connect to nonstarted server

2010-12-18 Thread Bruce Momjian
bruce wrote:
> Magnus Hagander wrote:
> > I get a crash on win32 when connecting to a server that's not started.
> > In fe-connect.c, we have:
> > 
> > display_host_addr = (conn->pghostaddr == NULL) &&
> > (strcmp(conn->pghost, host_addr) != 0);
> > 
> > In my case, conn->pghost is NULL at this point, as is
> > conn->pghostaddr. Thus, it crashes in strcmp().
> 
> I have researched this with Magnus, and was able to reproduce the
> failure.  It happens only on Win32 because that is missing unix-domain
> sockets so "" maps to localhost, which is an IP address.  I have applied
> the attached patch.  The new output is:
> 
>   $ psql test
>   psql: could not connect to server: Connection refused
> Is the server running on host "???" and accepting
> TCP/IP connections on port 5432?
> 
> Note the "???".  This happens because the mapping of "" to localhost
> happens below the libpq library variable level.

I made a mistake in the fix from yesterday.  I added code to test for
NULL, but in fact what I should have done was to allow a NULL hostname
to trigger the printing of the IP address because it will never match an
IP number.

The attached (applied) patch fixes this, and uses the same logic as in
connectDBStart() to print 'localhost' for connection failures.  I also
added code to use DefaultHost consistently in that file.  With the patch
the new Win32 output for a down server is:

$ psql test
psql: could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?

$ psql -h "" test
psql: could not connect to server: Connection refused
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?

I am amazed we were printing "???" all these years on this very popular
platform.
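Restated in Python, the message logic after this patch behaves roughly like
the following (an illustration of the C predicate only, not actual libpq
code; the function name is hypothetical):

```python
DEFAULT_HOST = "localhost"

def failure_host(pghostaddr, pghost, host_addr):
    """Return (host name to print, whether to also print the looked-up IP)."""
    # empty strings are treated like missing values, as in the patch
    shown = pghostaddr or pghost or DEFAULT_HOST
    # print the IP only when no hostaddr was given and the host string is
    # absent or does not already equal the resolved address
    show_ip = pghostaddr is None and (pghost is None or pghost != host_addr)
    return shown, show_ip

# Windows-style default connection: no host given, "" maps to localhost
print(failure_host(None, None, "127.0.0.1"))  # -> ('localhost', True)
```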

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 8d9400b..bf8beb7 100644
*** /tmp/rgI5Ne_fe-connect.c	Sat Dec 18 11:21:07 2010
--- src/interfaces/libpq/fe-connect.c	Sat Dec 18 11:10:09 2010
*** connectFailureMessage(PGconn *conn, int 
*** 1030,1049 
  		else
  			strcpy(host_addr, "???");
  
  		display_host_addr = (conn->pghostaddr == NULL) &&
! 			(conn->pghost != NULL) &&
! 			(strcmp(conn->pghost, host_addr) != 0);
  
  		appendPQExpBuffer(&conn->errorMessage,
  		  libpq_gettext("could not connect to server: %s\n"
  	 "\tIs the server running on host \"%s\"%s%s%s and accepting\n"
  		"\tTCP/IP connections on port %s?\n"),
  		  SOCK_STRERROR(errorno, sebuf, sizeof(sebuf)),
! 		  conn->pghostaddr
  		  ? conn->pghostaddr
! 		  : (conn->pghost
  			 ? conn->pghost
! 			 : "???"),
  		  /* display the IP address only if not already output */
  		  display_host_addr ? " (" : "",
  		  display_host_addr ? host_addr : "",
--- 1030,1054 
  		else
  			strcpy(host_addr, "???");
  
+ 		/*
+ 		 *	If the user did not supply an IP address using 'hostaddr', and
+ 		 *	'host' was missing or does not match our lookup, display the
+ 		 *	looked-up IP address.
+ 		 */
  		display_host_addr = (conn->pghostaddr == NULL) &&
! 			((conn->pghost == NULL) ||
! 			 (strcmp(conn->pghost, host_addr) != 0));
  
  		appendPQExpBuffer(&conn->errorMessage,
  		  libpq_gettext("could not connect to server: %s\n"
  	 "\tIs the server running on host \"%s\"%s%s%s and accepting\n"
  		"\tTCP/IP connections on port %s?\n"),
  		  SOCK_STRERROR(errorno, sebuf, sizeof(sebuf)),
! 		  (conn->pghostaddr && conn->pghostaddr[0] != '\0')
  		  ? conn->pghostaddr
! 		  : (conn->pghost && conn->pghost[0] != '\0')
  			 ? conn->pghost
! 			 : DefaultHost,
  		  /* display the IP address only if not already output */
  		  display_host_addr ? " (" : "",
  		  display_host_addr ? host_addr : "",
*** connectDBStart(PGconn *conn)
*** 1304,1310 
  		UNIXSOCK_PATH(portstr, portnum, conn->pgunixsocket);
  #else
  		/* Without Unix sockets, default to localhost instead */
! 		node = "localhost";
  		hint.ai_family = AF_UNSPEC;
  #endif   /* HAVE_UNIX_SOCKETS */
  	}
--- 1309,1315 
  		UNIXSOCK_PATH(portstr, portnum, conn->pgunixsocket);
  #else
  		/* Without Unix sockets, default to localhost instead */
! 		node = DefaultHost;
  		hint.ai_family = AF_UNSPEC;
  #endif   /* HAVE_UNIX_SOCKETS */
  	}
*** ldapServiceLookup(const char *purl, PQco
*** 3388,3394 
  	/* hostname */
  	hostname = url + strlen(LDAP_URL);
  	if (*hostname == '/')		/* no hostname? */
! 		hostname = "localhost"; /* the default */
  
  	/* dn, "distinguished name" */
 

Re: [HACKERS] proposal : cross-column stats

2010-12-18 Thread Florian Pflug
On Dec18, 2010, at 08:10 , t...@fuzzy.cz wrote:

>> On Dec17, 2010, at 23:12 , Tomas Vondra wrote:
>>> Well, not really - I haven't done any experiments with it. For two
>>> columns selectivity equation is
>>> 
>>> (dist(A) * sel(A) + dist(B) * sel(B)) / (2 * dist(A,B))
>>> 
>>> where A and B are columns, dist(X) is number of distinct values in
>>> column X and sel(X) is selectivity of column X.
>> 
>> Huh? This is the selectivity estimate for "A = x AND B = y"? Surely,
>> if A and B are independent, the formula must reduce to sel(A) * sel(B),
>> and I cannot see how that'd work with the formula above.
> 
> Yes, it's a selectivity estimate for P(A=a and B=b). It's based on
> conditional probability, as
> 
>   P(A=a and B=b) = P(A=a|B=b)*P(B=b) = P(B=b|A=a)*P(A=a)
> 
> and "uniform correlation" assumption so that it's possible to replace the
> conditional probabilities with constants. And those constants are then
> estimated as dist(A)/dist(A,B) or dist(B)/dist(A,B).

Ok, I think I understand this now. The selectivity equation actually
*does* reduce to sel(A) * sel(B), *if* we pick a very simple estimate
for sel(A).

Take the clause "A = x AND B = y" for example. Without knowing anything
about x and y, reasonable guesses for sel(A=x) and sel(B=y) are

  sel(A=x) = 1 / dist(A)
  sel(B=y) = 1 / dist(B).

This is also what we do currently, according to var_eq_non_const() in
src/backend/utils/adt/selfuncs.c, if we don't have any additional
knowledge about x (Actually, we also factor the probability of A being
NULL into this).

With these estimates, your formula becomes

  sel(A=x,B=y) = 1 / dist(A,B).

and if A and B are uncorrelated, dist(A,B) ~= dist(A) * dist(B), thus

  sel(A=x,B=y) = sel(A=x) * sel(B=y).

If, however, y is a constant, then we use the MKVs to estimate sel(B=y)
(var_eq_const() in src/backend/utils/adt/selfuncs.c). If

  sel(B=y) ~= 0,

we'd currently also conclude that

  sel(A=x,B=y) ~= 0.

With the "uniform correlation" approach, we'd instead estimate

  sel(A=x,B=y) ~= sel(A=x) / dist(B)

assuming that dist(A,B) ~= dist(A)*dist(B), meaning A,B are uncorrelated.
If dist(B) is small, this estimate is much worse than what we'd currently
get, since we've effectively ignored the information that the restriction
B=y alone guarantees that only very few rows will match.

best regards,
Florian Pflug

