Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread NISHIYAMA Tomoaki

I don't believe it is valid to ignore CJK characters above U+20000.
If they are used for names, they will be stored in the database.
If the behaviour is different from characters below U+FFFF, you will
get a bug report sooner or later.

see
CJK Extension B, C, and D
from
http://www.unicode.org/charts/

Also, there are some code points that could be regarded as alphabetic characters and numbers
http://en.wikipedia.org/wiki/Mathematical_alphanumeric_symbols

On the other hand, it is ok if processing of characters above U+10000 is very
slow, as long as they are properly processed, because they are considered rare.


On 2012/02/17, at 23:56, Andrew Dunstan wrote:

 
 
 On 02/17/2012 09:39 AM, Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Here's a wild idea: keep the class of each codepoint in a hash table.
 Initialize it with all codepoints up to 0xFFFF. After that, whenever a
 string contains a character that's not in the hash table yet, query the
 class of that character, and add it to the hash table. Then recompile
 the whole regex and restart the matching engine.
 Recompiling is expensive, but if you cache the results for the session,
 it would probably be acceptable.
 Dunno ... recompiling is so expensive that I can't see this being a win;
 not to mention that it would require fundamental surgery on the regex
 code.
 
 In the Tcl implementation, no codepoints above U+FFFF have any locale
 properties (alpha/digit/punct/etc), period.  Personally I'd not have a
 problem imposing the same limitation, so that dealing with stuff above
 that range isn't really a consideration anyway.
 
 
 up to U+FFFF is the BMP, which is described as containing characters for 
 almost all modern languages, and a large number of special characters. It 
 seems very likely to be acceptable not to bother about the locale of code 
 points in the supplementary planes.
 
 See http://en.wikipedia.org/wiki/Plane_%28Unicode%29 for descriptions of 
 which sets of characters are involved.
 
 
 cheers
 
 andrew
 
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 




[HACKERS] pg_regress application_name

2012-02-18 Thread Peter Eisentraut
I figured it would be good if pg_regress reported its application_name
as "pg_regress" rather than "psql".  Any objections to the attached
patch?

diff --git i/src/test/regress/pg_regress.c w/src/test/regress/pg_regress.c
index 2f6b37b..1384223 100644
--- i/src/test/regress/pg_regress.c
+++ w/src/test/regress/pg_regress.c
@@ -691,6 +691,8 @@ initialize_environment(void)
 {
 	char	   *tmp;
 
+	putenv("PGAPPNAME=pg_regress");
+
 	if (nolocale)
 	{
 		/*

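A minimal standalone sketch of what the patch does: set PGAPPNAME in the environment before any connection is made, so libpq reports it as application_name. The function name mirrors the one in the diff; this is an illustration, not the actual pg_regress code.

```c
#include <stdlib.h>

/* Sketch of the patched pg_regress initialization: export PGAPPNAME
 * so every libpq connection made afterwards reports "pg_regress"
 * as its application_name instead of defaulting to the client name. */
static void
initialize_environment(void)
{
    /* putenv() keeps a pointer to the string it is given, so a
     * string literal (static storage) is safe; a stack buffer
     * would not be. */
    putenv("PGAPPNAME=pg_regress");
}
```

The effect can be observed server-side in pg_stat_activity's application_name column once a connection is opened.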


Re: [HACKERS] pg_regress application_name

2012-02-18 Thread Magnus Hagander
On Sat, Feb 18, 2012 at 11:47, Peter Eisentraut pete...@gmx.net wrote:
 I figured it would be good if pg_regress reported its application_name
 as "pg_regress" rather than "psql".  Any objections to the attached
 patch?

Sounds like a good idea to me, +1.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] mul_size() overflow check broken in win64 builds?

2012-02-18 Thread Tom Lane
Can anybody with a win64 build reproduce the misbehavior reported in bug
#6460?  I'm not currently interested in the question of whether my_log2
ought to be changed --- the question rather is why is the system not
noticing that his shared_buffers value overflows size_t.
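The overflow check under discussion is the classic division-based guard for unsigned multiplication. A simplified standalone sketch of the idiom (with an error flag standing in for PostgreSQL's elog(ERROR) path):

```c
#include <stddef.h>

/* Overflow-checked size_t multiplication in the style of
 * PostgreSQL's mul_size().  Simplified sketch: sets *overflow
 * instead of raising an error. */
static size_t
mul_size_checked(size_t s1, size_t s2, int *overflow)
{
    size_t result;

    *overflow = 0;
    if (s1 == 0 || s2 == 0)
        return 0;
    result = s1 * s2;
    /* size_t is unsigned, so wraparound is well-defined; if the
     * product wrapped, dividing it back cannot recover s1. */
    if (result / s2 != s1)
        *overflow = 1;
    return result;
}
```

One plausible Win64 hazard (an assumption, not a confirmed diagnosis of bug #6460): on that platform `long` is 32 bits while `size_t` is 64 bits, so any arithmetic done in `long` before reaching a check like this truncates silently.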

regards, tom lane



Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-02-18 Thread Greg Sabino Mullane



 The time I got bitten by this was actually with LPAD(), rather than LIKE.

+1. This is one of the functions that gave some of our clients 
real trouble when 8.3 came out.

 If we really believed that implicit casts in any form were evil, we 
 would have removed them entirely instead of trimming them back. 
 I don't see why it's heretical to suggest that the 8.3 casting 
 changes brought us to exactly that point in the universe where 
 everything is perfect and nothing can be further improved; does 
 anyone seriously believe that?

Agreed (although the last bit is a bit of a straw man). The idea 
in this thread of putting some implicit casts into an extension 
or other external package is not a very good one, either. Let's 
apply some common sense instead, and stick to our guns on the ones 
where we feel there could honestly be serious app consequences and 
thus we encourage^H^Hforce people to change their code (or write all 
sorts of custom casts and functions). I think the actual number of 
such app circumstances is rather small, but my clients are not your* 
clients, so who knows? In other words, I'll concede int==text, but 
really need a strong argument for conceding things like LPAD.

* Your = everyone else, not just M. Haas.

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201202181145
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





[HACKERS] pg_restore ignores PGDATABASE

2012-02-18 Thread Erik Rijkers
pg_restore ignores environment variable PGDATABASE.

Is this intentional?  (perhaps because of the risk of restoring into the wrong 
db?)

I would prefer if it would honor the PGDATABASE variable, but if it does ignore 
it intentionally,
the following (from 9.2devel docs) is obviously incorrect:

This utility, like most other PostgreSQL utilities, also uses the environment 
variables supported
by libpq (see Section 31.13).

I could look into fixing one (binary) or the other (docs), but what /is/ the 
preferred behavior?


thanks,


Erik Rijkers






Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread Tom Lane
NISHIYAMA Tomoaki tomoa...@staff.kanazawa-u.ac.jp writes:
 I don't believe it is valid to ignore CJK characters above U+20000.
 If they are used for names, they will be stored in the database.
 If the behaviour is different from characters below U+FFFF, you will
 get a bug report sooner or later.

I am skeptical that there is enough usage of such things to justify
slowing regexp operations down for everybody.  Note that it's not only
the initial probe of libc behavior that's at stake here --- the more
character codes are treated as letters, the larger the DFA transition
maps get and the more time it takes to build them.  So I'm unexcited
about just cranking up the loop limit in pg_ctype_get_cache.

 On the other hand, it is ok if processing of characters above U+10000
 is very slow, as long as they are properly processed, because they are
 considered rare.

Yeah, it's conceivable that we could implement something whereby
characters with codes above some cutoff point are handled via runtime
calls to iswalpha() and friends, rather than being included in the
statically-constructed DFA maps.  The cutoff point could likely be a lot
less than U+FFFF, too, thereby saving storage and map build time all
round.
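A sketch of that cutoff scheme: code points below the cutoff are resolved from the statically built DFA maps, while rarer high code points fall back to a per-character libc call at runtime. The names and the cutoff value here are illustrative assumptions, not the regex engine's actual API.

```c
#include <wctype.h>

/* Hypothetical cutoff: classes of code points below it live in the
 * precomputed DFA transition maps; at or above it we pay for a
 * runtime libc call.  0x10000 is just an example value; as noted,
 * the cutoff could be well below U+FFFF. */
#define PG_REGEX_CUTOFF 0x10000u

/* Stand-in for a lookup in the statically constructed maps (a real
 * implementation would index a table built at compile time). */
static int
static_map_is_alpha(unsigned int cp)
{
    return iswalpha((wint_t) cp) != 0;
}

static int
regex_is_alpha(unsigned int cp)
{
    if (cp < PG_REGEX_CUTOFF)
        return static_map_is_alpha(cp);
    /* Rare high code points: accept the cost of iswalpha() here,
     * keeping the DFA maps small and cheap to build. */
    return iswalpha((wint_t) cp) != 0;
}
```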

However, that "we" above is the editorial "we".  *I* am not going to
do this.  Somebody who actually has a need for it should step up.

regards, tom lane



Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-02-18 Thread Jeff Janes
On Fri, Feb 17, 2012 at 7:36 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 17.02.2012 07:27, Fujii Masao wrote:

 Got another problem: when I ran pg_stop_backup to take an online backup,
 it got stuck until I had generated new WAL record. This happens because,
 in the patch, when pg_stop_backup forces a switch to new WAL file, old
 WAL file is not marked as archivable until next new WAL record has been
 inserted, but pg_stop_backup keeps waiting for that WAL file to be
 archived.
 OTOH, without the patch, WAL file is marked as archivable as soon as WAL
 file switch occurs.

 So, in short, the patch seems to handle the WAL file switch logic
 incorrectly.


 Yep. For a WAL-switch record, XLogInsert returns the location of the end of
 the record, not the end of the empty padding space. So when the caller
 flushed up to that point, it didn't flush the empty space and therefore
 didn't notify the archiver.

 Attached is a new version, fixing that, and the off-by-one bug you pointed out
 in the slot wraparound handling. I also moved code around a bit, I think
 this new division of labor between the XLogInsert subroutines is more
 readable.

 Thanks for the testing!

Hi Heikki,

Sorry for the week long radio silence, I haven't been able to find
much time during the week.  I'll try to extract my test case from its
quite messy testing harness and get a self-contained version, but it
will probably take a week or two to do it.  I can probably refactor it
to rely just on Perl and the modules DBI, DBD::Pg, IO::Pipe and
Storable.  Some of those are not core Perl modules, but they are all
common ones.  Would that be a good option?

I've tested your v9 patch.  I no longer see any inconsistencies or
lost transactions in the recovered database.  But occasionally I get
databases that fail to recover at all.
It has always been with the exact same failed assertion, at xlog.c line 2154.

I've only seen this 4 times out of 2202 cycles of crash and recovery,
so it must be some rather obscure situation.

LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/180001B0
LOG:  unexpected pageaddr 0/15084000 in log file 0, segment 25, offset 540672
LOG:  redo done at 0/19083FD0
LOG:  last completed transaction was at log time 2012-02-17 11:13:50.369488-08
LOG:  checkpoint starting: end-of-recovery immediate
TRAP: FailedAssertion(!(((uint64) (NewPageEndPtr).xlogid *
(uint64) (((uint32) 0xffffffff) / ((uint32) (16 * 1024 * 1024))) *
((uint32) (16 * 1024 * 1024))) + (NewPageEndPtr).xrecoff - 1)) / 8192)
% (XLogCtl->XLogCacheBlck + 1)) == nextidx), File: xlog.c, Line:
2154)
LOG:  startup process (PID 5390) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure

Cheers,

Jeff



[HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
As those who've been paying attention to it know, our regular expression
library is based on code originally developed by Henry Spencer and
contributed by him to the Tcl project.  We adopted it out of Tcl in
2003.  Henry intended to package the code as a standalone library as
well, but that never happened --- AFAICT, Henry dropped off the net
around 2002, and I have no idea what happened to him.

Since then, we've been acting as though the Tcl guys are upstream
maintainers for the regex code, but in point of fact there does not
appear to be anybody there with more than the first clue about that
code.  This was brought home to me a few days ago when I started talking
to them about possible ways to fix the quantified-backrefs problem that
depesz recently complained of (which turns out to have been an open bug
in their tracker since 2005).  As soon as I betrayed any indication of
knowing the difference between a DFA and an NFA, they offered me commit
privileges :-(.  And they haven't fixed any other significant bugs in
the engine in years, either.

So I think it's time to face facts and accept that Tcl are not a useful
upstream for the regex code.  And we can't just let it sit quietly,
because we have bugs to fix (at least the one) as well as enhancement
requests such as the nearby discussion about recognizing high Unicode
code points as letters.

A radical response to this would be to drop the Spencer regex engine and
use something else instead --- probably PCRE, because there are not all
that many alternatives out there.  I do not care much for this idea
though.  It would be a significant amount of work in itself, and there's
no real guarantee that PCRE will continue to be maintained either, and
there would be some user-visible compatibility issues because the regex
flavor is a bit different.  A larger point is that it'd be a real shame
for the Spencer regex engine to die off, because it is in fact one of
the best pieces of regex technology on the planet.  See Jeffrey Friedl's
Mastering Regular Expressions (O'Reilly) --- at least, that's what he
thought in the 2002 edition I have, and it's unlikely that things have
changed much.

So I'm feeling that we gotta suck it up and start acting like we are
the lead maintainers for this code, not just consumers.

Another possible long-term answer is to finish the work Henry never did,
that is make the code into a standalone library.  That would make it
available to more projects and perhaps attract other people to help
maintain it.  However, that looks like a lot of work too, with distant
and uncertain payoff.

Comments, other ideas?

regards, tom lane



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Simon Riggs
On Sat, Feb 18, 2012 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 So I'm feeling that we gotta suck it up and start acting like we are
 the lead maintainers for this code, not just consumers.

By "we", I take it you mean you personally?

There are many requests I might make for allocations of your time and
that wouldn't even be a lower priority item on such a list. Of course,
your time allocation is not my affair, so please take my words as a
suggestion and a compliment.

Do we have volunteers that might save Tom from taking on this task?
It's not something that requires too much knowledge and experience of
PostgreSQL, so is an easier task for a newcomer.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
 On Sat, Feb 18, 2012 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  So I'm feeling that we gotta suck it up and start acting like we are
  the lead maintainers for this code, not just consumers.
 
 By "we", I take it you mean you personally?

I'm pretty sure he meant the PG project, and I'd agree with him- we're
going to have to do it if no one else is.  I suspect the Tcl folks will
be happy to look at incorporating anything we fix, if they can, but it
doesn't sound like they'll be able to help with fixing things much.

 Do we have volunteers that might save Tom from taking on this task?
 It's not something that requires too much knowledge and experience of
 PostgreSQL, so is an easier task for a newcomer.

Sure, it doesn't require knowledge of PG, but I dare say there aren't
very many newcomers who are going to walk in knowing how to manage
complex regex code..  I haven't seen too many who can update gram.y,
much less make our regex code handle Unicode better.  I'm all for
getting other people to help with the code, of course, but I wouldn't
hold my breath and leave existing bugs open on the hopes that someone's
gonna show up.

Thanks,

Stephen




[HACKERS] Potential reference miscounts and segfaults in plpython.c

2012-02-18 Thread Tom Lane
Dave Malcolm at Red Hat has been working on a static code analysis tool
for Python-related C code.  He reports here on some preliminary results
for plpython.c:
https://bugzilla.redhat.com/show_bug.cgi?id=795011

I'm not enough of a Python hacker to evaluate the significance of these
issues, but somebody who does work on that code ought to take a look and
see what we ought to patch.

regards, tom lane



Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-18 Thread Robert Haas
On Tue, Feb 14, 2012 at 3:25 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 02/14/2012 01:45 PM, Greg Smith wrote:

 scale=1000, db is 94% of RAM; clients=4
 Version TPS
 9.0  535
 9.1  491 (-8.4% relative to 9.0)
 9.2  338 (-31.2% relative to 9.1)

 A second pass through this data noted that the maximum number of buffers
 cleaned by the background writer is <=2785 in 9.0/9.1, while it goes as high
 as 17345 times in 9.2.  The background writer is so busy now it hits the
 max_clean limit around 147 times in the slower[1] of the 9.2 runs.  That's
 an average of once every 4 seconds, quite frequent.  Whereas max_clean
 rarely happens in the comparable 9.0/9.1 results.  This is starting to point
 my finger more toward this being an unintended consequence of the background
 writer/checkpointer split.

I guess the question that occurs to me is: why is it busier?

It may be that the changes we've made to reduce lock contention are
allowing foreground processes to get work done faster.  When they get
work done faster, they dirty more buffers, and therefore the
background writer gets busier.  Also, if the background writer is more
reliably cleaning pages even during checkpoints, that could have the
same effect.  Backends write fewer of their own pages, therefore they
get more real work done, which of course means dirtying more pages.
But I'm just speculating here.

 Thinking out loud, about solutions before the problem is even nailed down, I
 wonder if we should consider lowering bgwriter_lru_maxpages now in the
 default config?  In older versions, the page cleaning work had at most a 50%
 duty cycle; it was only running when checkpoints were not.

Is this really true?  I see CheckpointWriteDelay calling BgBufferSync
in 9.1.  Background writing would stop during the sync phase and
perhaps slow down a bit during checkpoint writing, but I don't think
it was stopped completely.

I'm curious what vmstat output looks like during your test.  I've
found that's a good way to know whether the system is being limited by
I/O, CPU, or locks.  It'd also be interesting to know what the %
utilization figures for the disks looked like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c

2012-02-18 Thread Jan Urbański
On 18/02/12 20:30, Tom Lane wrote:
 Dave Malcolm at Red Hat has been working on a static code analysis tool
 for Python-related C code.  He reports here on some preliminary results
 for plpython.c:
 https://bugzilla.redhat.com/show_bug.cgi?id=795011
 
 I'm not enough of a Python hacker to evaluate the significance of these
 issues, but somebody who does work on that code ought to take a look and
 see what we ought to patch.

Very cool!

Some of them look like legitimate bugs, some seem to stem from the fact
that the tool does not know that PLy_elog(ERROR) does not return. I
wonder if it could be taught that.
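One standard way to teach both compilers and static analyzers that an error function never returns is a noreturn annotation on its declaration. A generic sketch (the function names here are stand-ins, not the actual plpython code):

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(__GNUC__)
#define pg_attribute_noreturn __attribute__((noreturn))
#else
#define pg_attribute_noreturn
#endif

/* Declaring the fatal-error helper noreturn lets an analyzer prune
 * the impossible paths after each call site. */
static void plpy_fatal(const char *fmt, ...) pg_attribute_noreturn;

static void
plpy_fatal(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    exit(1);                    /* never returns, matching the annotation */
}

/* Example caller: with the annotation, the analyzer can see that
 * the division below is reached only when b != 0. */
static int
safe_div(int a, int b)
{
    if (b == 0)
        plpy_fatal("division by zero\n");
    return a / b;
}
```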

I'll try to fix these while reworking the I/O functions memory leak
patch I sent earlier.

Cheers,
Jan



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Simon Riggs (si...@2ndquadrant.com) wrote:
 Do we have volunteers that might save Tom from taking on this task?
 It's not something that requires too much knowledge and experience of
 PostgreSQL, so is an easier task for a newcomer.

 Sure, it doesn't require knowledge of PG, but I dare say there aren't
 very many newcomers who are going to walk in knowing how to manage
 complex regex code..  I haven't seen too many who can update gram.y,
 much less make our regex code handle Unicode better.  I'm all for
 getting other people to help with the code, of course, but I wouldn't
 hold my breath and leave existing bugs open on the hopes that someone's
 gonna show up.

Yeah ... if you *don't* know the difference between a DFA and an NFA,
you're likely to find yourself in over your head.  Having said that,
this is eminently learnable stuff and pretty self-contained, so somebody
who had the time and interest could make themselves into an expert in
a reasonable amount of time.  I'm not really eager to become the
project's regex guru, but only because I have ninety-nine other things
to do not because I don't find it interesting.  Right at the moment I'm
probably far enough up the learning curve that I can fix the backref
problem faster than anyone else, so I'm kind of inclined to go do that.
But I'd be entirely happy to let someone else become the lead hacker in
this area going forward.  What we can't do is just pretend that it
doesn't need attention.

In the long run I do wish that Spencer's code would become a standalone
package and have more users than just us and Tcl, but that is definitely
work I don't have time for now.  I think somebody would need to commit
significant amounts of time over multiple years to give it any real hope
of success.

One immediate consequence of deciding that we are lead maintainers and
not just consumers is that we should put in some regression tests,
instead of taking the attitude that the Tcl guys are in charge of that.
I have a head cold today and am not firing on enough cylinders to do
anything actually complicated, so I was thinking of spending the
afternoon transliterating the Tcl regex test cases into SQL as a
starting point.

regards, tom lane



Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-18 Thread Simon Riggs
On Sat, Feb 18, 2012 at 7:35 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Feb 14, 2012 at 3:25 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 02/14/2012 01:45 PM, Greg Smith wrote:

 scale=1000, db is 94% of RAM; clients=4
 Version TPS
 9.0  535
 9.1  491 (-8.4% relative to 9.0)
 9.2  338 (-31.2% relative to 9.1)

 A second pass through this data noted that the maximum number of buffers
 cleaned by the background writer is <=2785 in 9.0/9.1, while it goes as high
 as 17345 times in 9.2.  The background writer is so busy now it hits the
 max_clean limit around 147 times in the slower[1] of the 9.2 runs.  That's
 an average of once every 4 seconds, quite frequent.  Whereas max_clean
 rarely happens in the comparable 9.0/9.1 results.  This is starting to point
 my finger more toward this being an unintended consequence of the background
 writer/checkpointer split.

 I guess the question that occurs to me is: why is it busier?

 It may be that the changes we've made to reduce lock contention are
 allowing foreground processes to get work done faster.  When they get
 work done faster, they dirty more buffers, and therefore the
 background writer gets busier.  Also, if the background writer is more
 reliably cleaning pages even during checkpoints, that could have the
 same effect.  Backends write fewer of their own pages, therefore they
 get more real work done, which of course means dirtying more pages.

The checkpointer/bgwriter split allows the bgwriter to do more work,
which is the desired outcome, not an unintended consequence.

The general increase in performance means there is more work to do. So
both things mean there is more bgwriter activity.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Simon Riggs
On Sat, Feb 18, 2012 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 One immediate consequence of deciding that we are lead maintainers and
 not just consumers is that we should put in some regression tests,
 instead of taking the attitude that the Tcl guys are in charge of that.
 I have a head cold today and am not firing on enough cylinders to do
 anything actually complicated, so I was thinking of spending the
 afternoon transliterating the Tcl regex test cases into SQL as a
 starting point.

Having just had that brand of virus, I'd skip it and take the time
off, like I should have.

Translating the test cases is a great way in for a volunteer, so
please leave a few easy things to get people started on the road to
maintaining that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Vik Reykja
On Sat, Feb 18, 2012 at 21:04, Simon Riggs si...@2ndquadrant.com wrote:

 On Sat, Feb 18, 2012 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:

  One immediate consequence of deciding that we are lead maintainers and
  not just consumers is that we should put in some regression tests,
  instead of taking the attitude that the Tcl guys are in charge of that.
  I have a head cold today and am not firing on enough cylinders to do
  anything actually complicated, so I was thinking of spending the
  afternoon transliterating the Tcl regex test cases into SQL as a
  starting point.

 Having just had that brand of virus, I'd skip it and take the time
 off, like I should have.

 Translating the test cases is a great way in for a volunteer, so
 please leave a few easy things to get people started on the road to
 maintaining that.


I would be willing to have a go at translating test cases.  I do not (yet)
have the C knowledge to maintain the regex code, though.


Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c

2012-02-18 Thread Tom Lane
Jan Urbański wulc...@wulczer.org writes:
 On 18/02/12 20:30, Tom Lane wrote:
 Dave Malcolm at Red Hat has been working on a static code analysis tool
 for Python-related C code.  He reports here on some preliminary results
 for plpython.c:
 https://bugzilla.redhat.com/show_bug.cgi?id=795011

 I'll try to fix these while reworking the I/O functions memory leak
 patch I sent earlier.

If you find any live bugs, it'd likely be better to deal with them as
a separate patch so that we can back-patch ...

regards, tom lane



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Andrew Dunstan



On 02/18/2012 02:25 PM, Stephen Frost wrote:

Do we have volunteers that might save Tom from taking on this task?
It's not something that requires too much knowledge and experience of
PostgreSQL, so is an easier task for a newcomer.

Sure, it doesn't require knowledge of PG, but I dare say there aren't
very many newcomers who are going to walk in knowing how to manage
complex regex code..  I haven't seen too many who can update gram.y,
much less make our regex code handle Unicode better.  I'm all for
getting other people to help with the code, of course, but I wouldn't
hold my breath and leave existing bugs open on the hopes that someone's
gonna show up.



Indeed, the number of people in the community who can hit the ground 
running with this is probably vanishingly small, sadly. (I haven't 
touched any formal DFA/NFA code in a couple of decades.)


cheers

andrew



Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c

2012-02-18 Thread Jan Urbański
On 18/02/12 21:17, Tom Lane wrote:
 Jan Urbański wulc...@wulczer.org writes:
 On 18/02/12 20:30, Tom Lane wrote:
 Dave Malcolm at Red Hat has been working on a static code analysis tool
 for Python-related C code.  He reports here on some preliminary results
 for plpython.c:
 https://bugzilla.redhat.com/show_bug.cgi?id=795011
 
  I'll try to fix these while reworking the I/O functions memory leak
 patch I sent earlier.
 
 If you find any live bugs, it'd likely be better to deal with them as
 a separate patch so that we can back-patch ...

Sure, I meant to say I'll look at these as well, but will make them into
a separate patch.

Cheers,
Jan



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Vik Reykja vikrey...@gmail.com writes:
 On Sat, Feb 18, 2012 at 21:04, Simon Riggs si...@2ndquadrant.com wrote:
 Translating the test cases is a great way in for a volunteer, so
 please leave a few easy things to get people started on the road to
 maintaining that.

 I would be willing to have a go at translating test cases.  I do not (yet)
 have the C knowledge to maintain the regex code, though.

Sure, have at it.  I was thinking that we should make a new regex.sql
test file for any cases that are locale-independent.  If they have any
that are dependent on recognizing non-ASCII characters as letters,
we could perhaps drop those into collate.linux.utf8.sql --- note that
we might need my draft patch from yesterday before anything outside the
LATIN1 character set would pass.

regards, tom lane



Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-02-18 Thread Rob Wultsch
On Fri, Feb 17, 2012 at 4:12 PM, Josh Berkus j...@agliodbs.com wrote:
 On 2/17/12 12:04 PM, Robert Haas wrote:
 The argument isn't about whether the user made the right design
 choices; it's about whether he should be forced to insert an explicit
 type cast to get the query to do what it is unambiguously intended to
 do.

 I don't find INTEGER LIKE '1%' to be unambiguous.

 Prior to this discussion, if I had run across such a piece of code, I
 couldn't have told you what it would do in MySQL without testing.

 What *does* it do in MySQL?


IIRC it casts each INTEGER (without any left padding) to text and then
does the comparison as per normal. Comparisons of dissimilar types are
a recipe for full table scans and unexpected results.  A really good
example is
select * from employees where first_name=5;
vs
select * from employees where first_name='5';

Where first_name is a string, the queries above have very different
behaviour in MySQL. The first does a full table scan and coerces
first_name to an integer (so '5adfs' -> 5) while the second can use an
index as it is normal string comparison. I have seen this sort of
thing cause significant production issues several times.
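To make the coercion Rob describes concrete, here is a minimal Python sketch of the idea (an approximation for illustration only, not MySQL's actual conversion routine): the string's longest leading numeric prefix becomes the number, and anything without such a prefix becomes 0.

```python
import re

def mysql_numeric_coercion(s):
    """Approximate MySQL's string-to-number cast: take the longest
    leading numeric prefix, defaulting to 0 (illustrative sketch only)."""
    m = re.match(r'\s*[+-]?\d+(\.\d+)?', s)
    return float(m.group(0)) if m else 0.0

# Under this coercion '5adfs' compares equal to 5, so a predicate like
# WHERE first_name = 5 matches the row -- and cannot use a plain index.
assert mysql_numeric_coercion('5adfs') == 5.0
assert mysql_numeric_coercion('abc') == 0.0
```

This is why the untyped comparison forces a full scan: every stored string must be coerced before it can be compared.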

I have seen several companies use comparisons of dissimilar data types
as part of their "stump the prospective DBA" test, and they stump lots
of folks.

-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-02-18 Thread Marko Kreen
Demos/tests of the new API:

  https://github.com/markokr/libpq-rowproc-demos

Comments resulting from that:

- PQsetRowProcessorErrMsg() should take const char*

- callback API should be (void *, PGresult *, PQrowValue*)
  or void* at the end, but not in the middle

I have not yet looked at what needs to be done to make ErrMsg
callable outside of the callback; if it requires a PGconn,
then we should add PGconn to the callback args as well.

 On Thu, Feb 16, 2012 at 05:49:34PM +0900, Kyotaro HORIGUCHI wrote:
  I added the function PQskipRemainingResult() and use it in
 dblink. This reduces the number of executed try-catch blocks from
 the number of rows to one per query in dblink.

I still think we don't need an extra skipping function.

Yes, the callback function needs to have a flag to know that
rows need to be skipped, but for such a low-level API that does
not seem to be too hard a requirement.

If this really needs to be made easier, then getRowProcessor
might be a better approach, to allow easy implementation
of a generic skipping function for users.

-- 
marko



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-02-18 Thread Kohei KaiGai
On 2012/02/17 6:08, Shigeru Hanada shigeru.han...@gmail.com wrote:
 (2012/02/17 2:02), Kohei KaiGai wrote:
 I found a strange behavior with v10. Can you reproduce it?
 snip
 I tried to raise an error on remote side.

postgres=# select * FROM ftbl WHERE 100 / (a - 3) > 0;
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!  \q

 I could reproduce the error by omitting CFLAGS=-O0 from the configure
 options.  I usually use this in my coding environment so that gdb
 debugging works correctly, so I hadn't noticed this issue.  I should
 test the optimized environment too...

 Expected result in that case is:

 postgres=# select * from pgbench_accounts where 100 / (aid - 3) > 0;
 ERROR:  could not fetch rows from foreign server
 DETAIL:  ERROR:  division by zero

 HINT:  FETCH 1 FROM pgsql_fdw_cursor_0
 postgres=#

 This is the PG_CATCH block at execute_query(). fetch_result() raises
 an error, which shall then be caught to release the PGresult.
 Although res should be NULL at this point, PQclear was called with
 a non-zero value according to the call trace.

 More strangely, I tried to inject elog(INFO, ...) to show the value of res
 at this point. Then, it became impossible to reproduce when I tried to
 show the pointer of res with elog(INFO, "res = %p", res);

 Why does res have a non-zero value, even though it was cleared prior
 to fetch_result() and an error was raised within this function?

 I've found that the problem is uninitialized PGresult variables.
 An uninitialized PGresult pointer is used in some places, so its value is
 garbage in the PG_CATCH block when the assignment code has been interrupted
 by longjmp.

 Probably recommended style would be like this:

 <pseudo_code>
    PGresult *res = NULL;    /* must be NULL in PG_CATCH */

    PG_TRY();
    {
        res = func_might_throw_exception();
        if (PQresultStatus(res) != PGRES_xxx_OK)
        {
            /* error handling, pass message to caller */
            ereport(ERROR, ...);
        }

        /* success case, use result of query and release it */
        ...
        PQclear(res);
    }
    PG_CATCH();
    {
        PQclear(res);
        PG_RE_THROW();
        /* caller should catch this exception. */
    }
 </pseudo_code>

 I misunderstood that PGresult pointer always has valid value after that
 line, because I had wrote assignment of PGresult pointer before PG_TRY
 block.  Fixes for this issue are:

 (1) Initialize PGresult pointer with NULL, if it is used in PG_CATCH.
 (2) Move the PGresult assignment into the PG_TRY block so that we can
 get a compiler warning about an uninitialized variable, just in case.
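The initialize-before-try discipline described above carries over to any language with exceptions. Here is a rough Python analogue (hypothetical stand-in names, not dblink's actual code): setting the handle to None before the try block makes the cleanup path well-defined even when the fetch itself throws before assigning.

```python
class Result:
    """Stand-in for a PGresult handle (illustrative only)."""
    def __init__(self):
        self.cleared = False
    def clear(self):
        self.cleared = True

def fetch_result(fail=False):
    # Stand-in for fetch_result(): may raise before returning a handle.
    if fail:
        raise RuntimeError("ERROR: division by zero")
    return Result()

def execute_query(fail=False):
    res = None                   # like "PGresult *res = NULL": safe to clean up
    try:
        res = fetch_result(fail)
        return "ok"
    except RuntimeError:
        if res is not None:      # cleanup never touches a garbage pointer
            res.clear()
        raise                    # like PG_RE_THROW()

assert execute_query() == "ok"
```

When fetch_result() raises, res is still None in the handler, which is exactly what the NULL initialization guarantees in the C version.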

 Please find attached a patch including fixes for this issue.

I marked this patch as Ready for Committer, since I have nothing more
to comment on.

I'd appreciate a committer's help in reviewing this patch and getting
it merged.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-02-18 Thread Don Baccus

On Feb 18, 2012, at 12:57 PM, Rob Wultsch wrote:
 
 Where first_name is string the queries above have very different
 behaviour in MySQL. The first does a full table scan and coerces
 first_name to an integer (so '5adfs' -> 5) 

Oh my, I can't wait to see someone rise to the defense of *this* behavior!


Don Baccus
http://donb.photo.net
http://birdnotes.net
http://openacs.org









Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-02-18 Thread Christopher Browne
On Sat, Feb 18, 2012 at 4:12 PM, Don Baccus dhog...@pacifier.com wrote:

 On Feb 18, 2012, at 12:57 PM, Rob Wultsch wrote:

 Where first_name is string the queries above have very different
 behaviour in MySQL. The first does a full table scan and coerces
 first_name to an integer (so '5adfs' -> 5)

 Oh my, I can't wait to see someone rise to the defense of *this* behavior!

I can see a use, albeit a clumsy one, in the notion of looking for values
   WHERE integer_id_column like '1%'

It's entirely common for companies to organize general ledger account
numbers by having numeric prefixes that are somewhat meaningful.

A hierarchy like the following is perfectly logical:
 - 0000 to 0999 :: Cash accounts [1]
 - 1000 to 1999 :: Short Term Assets
 - 2000 to 2999 :: Long Term Assets
 - 3000 to 3999 :: Incomes
 - 4000 to 4999 :: Costs of Goods Sold
 - 5000 to 5999 :: Other Expenses
 - 6000 to 6999 :: Share Capital
 - 7000 to 7999 :: Retained Earnings and such

And back in the pre-computer days, accountants got very comfortable
with the shorthands that, for instance, Income is in the 3000
series.

We are much smarter today (well, not necessarily!) and can use other
ways to indicate hierarchy, so that there's no reason to *care* what
that account number is.

But if old-school accountants who think in terms of the 3000 series
*demand* that (and, being likely senior enough to assert their way,
they're likely to succeed in that demand), then it's pretty easy for this
to lead to a somewhat clumsy account_id like '3%' as a search for income.

If I put my purist hat on, then the *right* answer is a range query, thus
  WHERE account_id between 3000 and 3999

The new RANGE stuff that Jeff Davis has been adding into 9.2 should,
in principle, be the even better way to represent this kind of thing.

I'd think it nearly insane if someone was expecting '3%' to match not
only the '3000 thru 3999' series, but also '300 to 399' and 30 to 39
and 3.  A situation where that is the right set of results requires
a mighty strangely designed numbering system.   I imagine a designer
would want to rule out the range 0-999, in such a design.
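The leakage Christopher describes is easy to demonstrate outside SQL. A small Python sketch (illustrative only) of what a cast-to-text LIKE '3%' selects versus a proper range predicate:

```python
accounts = [3, 35, 342, 3100, 3999, 4000, 299]

# cast-to-text prefix match, i.e. account_id::text LIKE '3%'
like_3 = [a for a in accounts if str(a).startswith('3')]

# range predicate, i.e. account_id BETWEEN 3000 AND 3999
range_3 = [a for a in accounts if 3000 <= a <= 3999]

assert like_3 == [3, 35, 342, 3100, 3999]   # 3, 30-39, 300-399 all leak in
assert range_3 == [3100, 3999]              # only the intended series
```

Without fixed-width numbers, the prefix match and the range query only agree by accident.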

Nonetheless, the need for where account_id like '1%' comes from a
system designed with the above kind of thinking about account numbers,
and that approach fits mighty well with the way people thought back
when a computer was a person whose job it was to work out sums.

Notes:
[1]  A careful observer will notice that the prefix notion doesn't
work for the first range without forcing leading zeroes onto
numbers...
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-02-18 Thread Don Baccus

On Feb 18, 2012, at 1:43 PM, Christopher Browne wrote:

 On Sat, Feb 18, 2012 at 4:12 PM, Don Baccus dhog...@pacifier.com wrote:
 
 On Feb 18, 2012, at 12:57 PM, Rob Wultsch wrote:
 
 Where first_name is string the queries above have very different
 behaviour in MySQL. The first does a full table scan and coerces
 first_name to an integer (so '5adfs' -> 5)
 
 Oh my, I can't wait to see someone rise to the defense of *this* behavior!
 
 I can see a use, albeit a clumsy one, to the notion of looking for values
   WHERE integer_id_column like '1%'
 
 It's entirely common for companies to organize general ledger account
 numbers by having numeric prefixes that are somewhat meaningful.
 
 A hierarchy like the following is perfectly logical:
 - 0000 to 0999 :: Cash accounts [1]

I asked earlier if anyone would expect 0001 like '0%' to match …

Apparently so!

Your example is actually a good argument for storing account ids as text,
because '0001' like '0%' *will* match.

I'd think it nearly insane if someone was expecting '3%' to match not
only the '3000 thru 3999' series, but also '300 to 399' and 30 to 39
and 3.

How is PG supposed to know that integers compared to strings are always to be 
padded out to precisely 4 digits?


Don Baccus
http://donb.photo.net
http://birdnotes.net
http://openacs.org









Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-02-18 Thread Dimitri Fontaine
Don Baccus dhog...@pacifier.com writes:
 A hierarchy like the following is perfectly logical:
 - 0000 to 0999 :: Cash accounts [1]

 Your example is actually a good argument for storing account ids as
 text, because '' like '0%' *will* match.

FWIW, I too think that if you want to process your integers as text for
some operations (LIKE) and as integer for some others, you'd better do
the casting explicitly.

In the worked-out example Christopher has been proposing, just alter the
column type to text and be done with it; I can't see summing or any other
integer arithmetic being done on those general ledger account numbers. Use
a domain (well, a CHECK constraint really) to tighten things down.

As for lpad(), that's a function working on text that returns text, so
having a variant that accepts integers would not be confusing.  Then
again, why aren't you using to_char() if processing integers?
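Zero-padding to a fixed width is what makes the text representation prefix-safe in the first place. A quick sketch (hypothetical helper name; the SQL-side analogue would be along the lines of to_char(n, 'FM0000')):

```python
def pad_account(n, width=4):
    """Zero-pad an integer account number, e.g. 1 -> '0001'
    (width of 4 matches the example hierarchy in this thread)."""
    return f"{n:0{width}d}"

assert pad_account(1) == "0001"
assert pad_account(1).startswith("0")        # '0%' now matches the cash accounts
assert not pad_account(342).startswith("3")  # '0342' no longer leaks into '3%'
```

With a fixed width, the prefix search and the numeric range query finally agree.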

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

PS: having worked on telephone number prefix indexing and processing
them as text, I might have a biased opinion.  You don't add up phone
numbers, though, do you?



Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Yeah, it's conceivable that we could implement something whereby
 characters with codes above some cutoff point are handled via runtime
 calls to iswalpha() and friends, rather than being included in the
 statically-constructed DFA maps.  The cutoff point could likely be a lot
 less than U+FFFF, too, thereby saving storage and map build time all
 round.

It's been proposed to build a “regexp” type in PostgreSQL which would
store the DFA directly and provide some way to run that DFA out of its
“storage” without recompiling.

Would such a mechanism be useful here?  Would it be useful only when
storing the regexp in a column somewhere then applying it in the query
from there (so most probably adding a join or subquery somewhere)?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-02-18 Thread Andrew Dunstan



On 02/18/2012 05:34 PM, Don Baccus wrote:

On Feb 18, 2012, at 1:43 PM, Christopher Browne wrote:


On Sat, Feb 18, 2012 at 4:12 PM, Don Baccusdhog...@pacifier.com  wrote:

On Feb 18, 2012, at 12:57 PM, Rob Wultsch wrote:

Where first_name is string the queries above have very different
behaviour in MySQL. The first does a full table scan and coerces
first_name to an integer (so '5adfs' -> 5)

Oh my, I can't wait to see someone rise to the defense of *this* behavior!

I can see a use, albeit a clumsy one, to the notion of looking for values
   WHERE integer_id_column like '1%'

It's entirely common for companies to organize general ledger account
numbers by having numeric prefixes that are somewhat meaningful.

A hierarchy like the following is perfectly logical:
- 0000 to 0999 :: Cash accounts [1]

I asked earlier if anyone would expect 0001 like '0%' to match …

Apparently so!

Your example is actually a good argument for storing account ids as text,
because '0001' like '0%' *will* match.

I'd think it nearly insane if someone was expecting '3%' to match not
only the '3000 thru 3999' series, but also '300 to 399' and 30 to 39
and 3.

How is PG supposed to know that integers compared to strings are always to be 
padded out to precisely 4 digits?




By this point the Lone Ranger has committed suicide.

cheers

andrew



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Yeah ... if you *don't* know the difference between a DFA and an NFA,
 you're likely to find yourself in over your head.  Having said that,

So, here's a paper I found very nice to get started into this subject:

  http://swtch.com/~rsc/regexp/regexp1.html

If anyone's interested into becoming our PostgreSQL regexp hero and
still needs a good kicker, I would recommend starting here :)

I see this paper mentions the regexp code from Plan 9, which supports
both UTF-8 and other multi-byte encodings, and is released as a library
under the MIT licence:

  http://swtch.com/plan9port/unix/

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 Yeah, it's conceivable that we could implement something whereby
 characters with codes above some cutoff point are handled via runtime
 calls to iswalpha() and friends, rather than being included in the
 statically-constructed DFA maps.  The cutoff point could likely be a lot
 less than U+FFFF, too, thereby saving storage and map build time all
 round.

 It's been proposed to build a “regexp” type in PostgreSQL which would
 store the DFA directly and provide some way to run that DFA out of its
 “storage” without recompiling.

 Would such a mechanism be useful here?

No, this is about what goes into the DFA representation in the first
place, not about how we store it and reuse it.

regards, tom lane



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 Yeah ... if you *don't* know the difference between a DFA and an NFA,
 you're likely to find yourself in over your head.  Having said that,

 So, here's a paper I found very nice to get started into this subject:
   http://swtch.com/~rsc/regexp/regexp1.html

Yeah, I just found that this afternoon myself; it's a great intro.

If you follow the whole sequence of papers (there are 4) you'll find out
that this guy built a new regexp engine for Google, and these papers are
basically introducing/defending its design.  It turns out they've
released it under a BSD-ish license, so for about half a minute I was
thinking there might be a new contender for something we could adopt.
But there turn out to be at least two killer reasons why we won't:
* it's in C++ not C
* it doesn't support backrefs, as well as a few other features that
  maybe aren't as interesting but still would represent compatibility
  gotchas if they went away.
Too bad.  But the papers are well worth reading.  One thing I took away
from them is that it's possible to do capturing parens, though not
backrefs, without back-tracking.  Spencer's code treats both of those
features as messy (ie, slow, because they force use of the NFA-style
backtracking search code).  So there might be reason to reimplement
the parens-but-no-backrefs case using some ideas from these papers.
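The central technique of those papers, advancing a set of NFA states in lockstep instead of backtracking, fits in a few lines. A toy sketch for the pattern ab*c (hand-built transition table; no epsilon transitions or capture tracking, so it is only the skeleton of the idea):

```python
# NFA for "ab*c": state 0 --a--> 1, 1 --b--> 1, 1 --c--> 2 (accept).
TRANSITIONS = {
    (0, 'a'): {1},
    (1, 'b'): {1},
    (1, 'c'): {2},
}
ACCEPT = {2}

def match(s):
    """Full match of s against ab*c by advancing a *set* of states,
    so no input position is ever revisited (no backtracking)."""
    states = {0}
    for ch in s:
        states = set().union(*(TRANSITIONS.get((st, ch), set())
                               for st in states))
        if not states:          # no live states: fail without rewinding
            return False
    return bool(states & ACCEPT)

assert match("abbbc") and match("ac")
assert not match("abx") and not match("ab")
```

Because each input character is examined exactly once, the running time is linear in the input, which is the property the papers defend.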

regards, tom lane



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Marko Kreen
On Sun, Feb 19, 2012 at 1:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 Yeah ... if you *don't* know the difference between a DFA and an NFA,
 you're likely to find yourself in over your head.  Having said that,

 So, here's a paper I found very nice to get started into this subject:
   http://swtch.com/~rsc/regexp/regexp1.html

 Yeah, I just found that this afternoon myself; it's a great intro.

 If you follow the whole sequence of papers (there are 4) you'll find out
 that this guy built a new regexp engine for Google, and these papers are
 basically introducing/defending its design.  It turns out they've
 released it under a BSD-ish license, so for about half a minute I was
 thinking there might be a new contender for something we could adopt.
 But there turn out to be at least two killer reasons why we won't:
 * it's in C++ not C
 * it doesn't support backrefs, as well as a few other features that
  maybe aren't as interesting but still would represent compatibility
  gotchas if they went away.

Another interesting library, technology-wise, is libtre:

  http://laurikari.net/tre/about/
  http://laurikari.net/tre/documentation/

NetBSD plans to replace the libc regex with it:

  http://netbsd-soc.sourceforge.net/projects/widechar-regex/
  
http://groups.google.com/group/muc.lists.netbsd.current-users/browse_thread/thread/db5628e2e8f810e5/a99c368a6d22b6f8?lnk=gst&q=libtre#a99c368a6d22b6f8

Another useful project, the AT&T regex tests:

  http://www2.research.att.com/~gsf/testregex/


About our Spencer code: if we don't have the resources (not called Tom)
to clean it up and make it available as a library (in the short term, at
least to the Tcl folks), we should drop it, because that means it's a
dead end, however good it is.

-- 
marko



Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread Tom Lane
I wrote:
 And here's a poorly-tested draft patch for that.

I've done some more testing now, and am satisfied that this works as
intended.  However, some crude performance testing suggests that people
might be annoyed with it.  As an example, in 9.1 with pl_PL.utf8 locale,
I see this:
select 'aa' ~ '\w\w\w\w\w\w\w\w\w\w\w';
taking perhaps 0.75 ms on first execution and 0.4 ms on subsequent
executions, the difference being the time needed to compile and cache
the DFA representation of the regexp.  With the patch, the numbers are
more like 5 ms and 0.4 ms, meaning the compilation time has gone up by
something near a factor of 10, though AFAICT execution time hasn't
moved.  It's hard to tell how significant that would be to real-world
queries, but in the worst case where our caching of regexps doesn't help
much, it could be disastrous.
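The caching Tom relies on here, paying the compile cost once per distinct pattern, can be sketched with Python's re module and a memoizing wrapper (illustrative only; PostgreSQL's actual regexp cache lives in C):

```python
import functools
import re

@functools.lru_cache(maxsize=32)   # cache compiled patterns, as PG caches regexps
def compiled(pattern):
    return re.compile(pattern)

def matches(text, pattern):
    return compiled(pattern).search(text) is not None

assert matches('aaaaaaaaaaa', r'\w{11}')
assert compiled.cache_info().hits == 0       # first use: compile (the 5 ms case)
matches('bbbbbbbbbbb', r'\w{11}')
assert compiled.cache_info().hits == 1       # second use: cache hit (the 0.4 ms case)
```

When the cache misses often, every query pays the full compile cost, which is exactly the "disastrous" worst case above.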

All of the extra time is in manipulation of the much larger number of
DFA arcs required to represent all the additional character codes that
are being considered to be letters.

Perhaps I'm being overly ASCII-centric, but I'm afraid to commit this
as-is; I think the number of people who are hurt by the performance
degradation will be greatly larger than the number who are glad because
characters in $random_alphabet are now seen to be letters.  I think an
actually workable solution will require something like what I speculated
about earlier:

 Yeah, it's conceivable that we could implement something whereby
 characters with codes above some cutoff point are handled via runtime
 calls to iswalpha() and friends, rather than being included in the
 statically-constructed DFA maps.  The cutoff point could likely be a lot
 less than U+FFFF, too, thereby saving storage and map build time all
 round.

In the meantime, I still think the caching logic is worth having, and
we could at least make some people happy if we selected a cutoff point
somewhere between U+FF and U+FFFF.  I don't have any strong ideas about
what a good compromise cutoff would be.  One possibility is U+7FF, which
corresponds to the limit of what fits in 2-byte UTF8; but I don't know
if that corresponds to any significant dropoff in frequency of usage.
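The U+07FF boundary Tom mentions falls directly out of the UTF-8 encoding rules, as a quick Python check shows:

```python
# UTF-8 encoded length at the candidate cutoff points:
assert len(chr(0x7F).encode('utf-8')) == 1    # top of ASCII
assert len(chr(0xFF).encode('utf-8')) == 2    # top of Latin-1
assert len(chr(0x7FF).encode('utf-8')) == 2   # last 2-byte code point
assert len(chr(0x800).encode('utf-8')) == 3   # 3-byte sequences begin here
assert len(chr(0xFFFF).encode('utf-8')) == 3  # top of the BMP
```

So a cutoff of U+07FF would mean every character baked into the static maps fits in two UTF-8 bytes, while everything needing three or four bytes would take the slow path.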

regards, tom lane



Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-02-18 Thread Christopher Browne
On Sat, Feb 18, 2012 at 5:34 PM, Don Baccus dhog...@pacifier.com wrote:

 On Feb 18, 2012, at 1:43 PM, Christopher Browne wrote:
 A hierarchy like the following is perfectly logical:
 - 0000 to 0999 :: Cash accounts [1]

 I asked earlier if anyone would expect 0001 like '0%' to match …

 Apparently so!

Yes, and I was intentionally treating this as an oddity.

 Your example is actually a good argument for storing account ids as text,
 because '0001' like '0%' *will* match.

Absolutely.

The trouble is that if you use the term account NUMBER enough times,
some portion of people will think that it's a number in the sense that
it should be meaningful to add and subtract against them.

 I'd think it nearly insane if someone was expecting '3%' to match not
 only the '3000 thru 3999' series, but also '300 to 399' and 30 to 39
 and 3.

 How is PG supposed to know that integers compared to strings are always to be 
 padded out to precisely 4 digits?

I think it's not quite right to treat it as "how is PG supposed to
know".  The problem is a bit more abstract; it occurs without having a
database involved.

The notion that the ranges (3), (30-39), (300-399), and (3000-3999)
ought to be considered connected together in the account number
classification is what seems crazy to me.  But that's what "account
number starts with a 3" could be expected to imply.

At any rate, yes, this is liable to point the Lone Ranger towards
solutions that involve him not riding off into the sunset!
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Christopher Browne
On Sat, Feb 18, 2012 at 7:24 PM, Marko Kreen mark...@gmail.com wrote:
 About our Spencer code - if we don't have resources (not called Tom)

Is there anything that would be worth talking about directly with
Henry?  He's in one of my circles of colleagues; had dinner with a
group that included him on Thursday.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Christopher Browne cbbro...@gmail.com writes:
 On Sat, Feb 18, 2012 at 7:24 PM, Marko Kreen mark...@gmail.com wrote:
 About our Spencer code - if we don't have resources (not called Tom)

 Is there anything that would be worth talking about directly with
 Henry?  He's in one of my circles of colleagues; had dinner with a
 group that included him on Thursday.

Really!?  I had about come to the conclusion he was dead, because he's
sure been damn invisible as far as I could find.  Is he still interested
in what happens with his regex code, or willing to answer questions
about it?

regards, tom lane



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Brendan Jurd
On 19 February 2012 06:52, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah ... if you *don't* know the difference between a DFA and an NFA,
 you're likely to find yourself in over your head.  Having said that,
 this is eminently learnable stuff and pretty self-contained, so somebody
 who had the time and interest could make themselves into an expert in
 a reasonable amount of time.

I find myself in possession of both time and interest.  I have to
admit up-front that I don't have experience with regex code, but I do
have some experience with parsers generally, and I'd like to think
some of that skillset would transfer to this problem.  I also find
regexes fascinating and extremely useful, so learning more about them
will be no hardship.

I'd happily cede to an expert, should one appear, but otherwise I'm
all for moving the regex code into a discrete library, and I'm
volunteering to take a swing at it.

Cheers,
BJ



Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread Robert Haas
On Sat, Feb 18, 2012 at 7:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah, it's conceivable that we could implement something whereby
 characters with codes above some cutoff point are handled via runtime
 calls to iswalpha() and friends, rather than being included in the
 statically-constructed DFA maps.  The cutoff point could likely be a lot
 less than U+FFFF, too, thereby saving storage and map build time all
 round.

 In the meantime, I still think the caching logic is worth having, and
 we could at least make some people happy if we selected a cutoff point
 somewhere between U+FF and U+FFFF.  I don't have any strong ideas about
 what a good compromise cutoff would be.  One possibility is U+7FF, which
 corresponds to the limit of what fits in 2-byte UTF8; but I don't know
 if that corresponds to any significant dropoff in frequency of usage.

The problem, of course, is that this probably depends quite a bit on
what language you happen to be using.  For some languages, it won't
matter whether you cut it off at U+FF or U+7FF; while for others even
U+FFFF might not be enough.  So I think this is one of those cases
where it's somewhat meaningless to talk about frequency of usage.

In theory you can imagine a regular expression engine where these
decisions can be postponed until we see the string we're matching
against.  IOW, your DFA ends up with state transitions for characters
specifically named, plus a state transition for "anything else that's
a letter", plus a state transition for "anything else not otherwise
specified".  Then you only need to test the letters that actually
appear in the target string, rather than all of the ones that might
appear there.

But implementing that could be quite a lot of work.
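Robert's deferred-classification idea can be sketched as a character-class test that consults a statically built table below a cutoff and falls back to a runtime classification call above it (Python sketch; the cutoff value and \w-style class are made up for illustration, with isalpha() standing in for iswalpha()):

```python
# Statically precomputed membership table for code points up to the
# cutoff, standing in for the classes baked into the DFA maps:
CUTOFF = 0xFF
STATIC_WORD = {c for c in map(chr, range(CUTOFF + 1))
               if c.isalnum() or c == '_'}

def is_word_char(ch):
    """Is ch a \\w character? Static lookup below the cutoff,
    deferred runtime classification above it."""
    if ord(ch) <= CUTOFF:
        return ch in STATIC_WORD          # decided at "compile" time
    return ch.isalpha() or ch.isdigit()   # decided at match time

assert is_word_char('a') and is_word_char('_')
assert is_word_char(chr(0x105))   # LATIN SMALL LETTER A WITH OGONEK: runtime path
assert not is_word_char('!')
```

Only the characters that actually appear in the target string ever reach the runtime path, which is exactly the property that makes the deferred scheme attractive and also what makes it hard to precompile into ordinary DFA arcs.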

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread Vik Reykja
On Sun, Feb 19, 2012 at 04:33, Robert Haas robertmh...@gmail.com wrote:

 On Sat, Feb 18, 2012 at 7:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Yeah, it's conceivable that we could implement something whereby
  characters with codes above some cutoff point are handled via runtime
  calls to iswalpha() and friends, rather than being included in the
  statically-constructed DFA maps.  The cutoff point could likely be a lot
  less than U+FFFF, too, thereby saving storage and map build time all
  round.
 
  In the meantime, I still think the caching logic is worth having, and
  we could at least make some people happy if we selected a cutoff point
  somewhere between U+FF and U+FFFF.  I don't have any strong ideas about
  what a good compromise cutoff would be.  One possibility is U+7FF, which
  corresponds to the limit of what fits in 2-byte UTF8; but I don't know
  if that corresponds to any significant dropoff in frequency of usage.

 The problem, of course, is that this probably depends quite a bit on
 what language you happen to be using.  For some languages, it won't
 matter whether you cut it off at U+FF or U+7FF; while for others even
 U+FFFF might not be enough.  So I think this is one of those cases
 where it's somewhat meaningless to talk about frequency of usage.


Does it make sense for regexps to have collations?


Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread Robert Haas
On Sat, Feb 18, 2012 at 10:38 PM, Vik Reykja vikrey...@gmail.com wrote:
 Does it make sense for regexps to have collations?

As I understand it, collations determine the sort-ordering of strings.
 Regular expressions don't care about that.  Why do you ask?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-02-18 Thread Robert Haas
On Sat, Feb 18, 2012 at 4:12 PM, Don Baccus dhog...@pacifier.com wrote:
 On Feb 18, 2012, at 12:57 PM, Rob Wultsch wrote:

 Where first_name is string the queries above have very different
 behaviour in MySQL. The first does a full table scan and coerces
 first_name to an integer (so '5adfs' -> 5)

 Oh my, I can't wait to see someone rise to the defense of *this* behavior!

Well, this gets to my point.  The behavior Rob is mentioning here is
the one that caused us to make the implicit casting changes in the
first place.  And, in this situation, I agree that throwing an error
is much better than silently doing something that may be quite
different from what the user expects.

However, the fact that the implicit casting changes are an improvement
in this case does not mean that they are an improvement in every case.
 All I am asking for here is that we examine the various cases on
their merits rather than assuming that our way must be better than
MySQL's way, or vice versa.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-02-18 Thread Robert Haas
On Fri, Feb 17, 2012 at 7:12 PM, Josh Berkus j...@agliodbs.com wrote:
 On 2/17/12 12:04 PM, Robert Haas wrote:
 The argument isn't about whether the user made the right design
 choices; it's about whether he should be forced to insert an explicit
 type cast to get the query to do what it is unambiguously intended to
 do.

 I don't find INTEGER LIKE '1%' to be unambiguous.

Please propose two reasonable interpretations.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread Vik Reykja
On Sun, Feb 19, 2012 at 05:03, Robert Haas robertmh...@gmail.com wrote:

 On Sat, Feb 18, 2012 at 10:38 PM, Vik Reykja vikrey...@gmail.com wrote:
  Does it make sense for regexps to have collations?

 As I understand it, collations determine the sort-ordering of strings.
  Regular expressions don't care about that.  Why do you ask?


Perhaps I used the wrong term, but I was thinking the locale could tell us
what alphabet we're dealing with. So a regexp using en_US would give
different word-boundary results from one using zh_CN.


Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 In theory you can imagine a regular expression engine where these
 decisions can be postponed until we see the string we're matching
 against.  IOW, your DFA ends up with state transitions for characters
 specifically named, plus a state transition for anything else that's
 a letter, plus a state transition for anything else not otherwise
 specified.  Then you only need to test the letters that actually
 appear in the target string, rather than all of the ones that might
 appear there.

 But implementing that could be quite a lot of work.

Yeah, not to mention slow.  The difficulty is overlapping sets of
characters.  As a simple example, if your regex refers to 3, 7,
[[:digit:]], X, and [[:alnum:]], then you end up needing five distinct
colors: 3, 7, X, all digits that aren't 3 or 7, all alphanumerics
that aren't any of the preceding.  And state transitions for the digit
and alnum cases had better mention all and only the correct colors.
I've been tracing through the logic this evening, and it works pretty
simply given that all named character classes are immediately expanded
out to their component characters.  If we are going to try to keep
the classes in some kind of symbolic form, it's a lot messier.  In
particular, I think your sketch above would lead to having to test
every character against iswdigit and iswalnum at runtime, which would
be disastrous performancewise.  I'd like to at least avoid that for the
shorter (and presumably more common) UTF8 codes.
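Tom's five-color example can be sketched concretely.  This is an editorial illustration (the color names are invented, and the real engine works over codepoint ranges, not Python sets): every character within a color behaves identically for this particular regex, and class membership is expanded ahead of time rather than tested with iswdigit/iswalnum at match time.

```python
import string

# Expand the named classes up front (here: ASCII only, for illustration).
digits = set(string.digits)
alnums = set(string.ascii_letters + string.digits)

# Partition into the five colors Tom describes for a regex mentioning
# 3, 7, [[:digit:]], X, and [[:alnum:]].
colors = {
    "3": {"3"},
    "7": {"7"},
    "X": {"X"},
    "digit_rest": digits - {"3", "7"},          # digits that aren't 3 or 7
    "alnum_rest": alnums - digits - {"X"},      # alnums not covered above
}

# The colors are pairwise disjoint and together cover [[:alnum:]] exactly.
all_chars = set().union(*colors.values())
assert all_chars == alnums
assert sum(len(s) for s in colors.values()) == len(alnums)
print(len(colors), "colors")
```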

regards, tom lane



Re: [HACKERS] Initial 9.2 pgbench write results

2012-02-18 Thread Robert Haas
On Sat, Feb 18, 2012 at 3:00 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sat, Feb 18, 2012 at 7:35 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Feb 14, 2012 at 3:25 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 02/14/2012 01:45 PM, Greg Smith wrote:

 scale=1000, db is 94% of RAM; clients=4
 Version TPS
 9.0  535
 9.1  491 (-8.4% relative to 9.0)
 9.2  338 (-31.2% relative to 9.1)

 A second pass through this data noted that the maximum number of buffers
 cleaned by the background writer is <=2785 in 9.0/9.1, while it goes as high
 as 17345 times in 9.2.  The background writer is so busy now it hits the
 max_clean limit around 147 times in the slower[1] of the 9.2 runs.  That's
 an average of once every 4 seconds, quite frequent.  Whereas max_clean
 rarely happens in the comparable 9.0/9.1 results.  This is starting to point
 my finger more toward this being an unintended consequence of the background
 writer/checkpointer split.

 I guess the question that occurs to me is: why is it busier?

 It may be that the changes we've made to reduce lock contention are
 allowing foreground processes to get work done faster.  When they get
 work done faster, they dirty more buffers, and therefore the
 background writer gets busier.  Also, if the background writer is more
 reliably cleaning pages even during checkpoints, that could have the
 same effect.  Backends write fewer of their own pages, therefore they
 get more real work done, which of course means dirtying more pages.

 The checkpointer/bgwriter split allows the bgwriter to do more work,
 which is the desired outcome, not an unintended consequence.

 The general increase in performance means there is more work to do. So
 both things mean there is more bgwriter activity.

I think you're saying pretty much the same thing I was saying, so I agree.

Here's what's bugging me.  Greg seemed to be assuming that the
busyness of the background writer might be the cause of the
performance drop-off he measured on certain test cases.  But you and I
both seem to feel that the busyness of the background writer is
intentional and desirable.  Supposing we're right, where's the
drop-off coming from?  *scratches head*

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread Tom Lane
Vik Reykja vikrey...@gmail.com writes:
 On Sun, Feb 19, 2012 at 05:03, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Feb 18, 2012 at 10:38 PM, Vik Reykja vikrey...@gmail.com wrote:
 Does it make sense for regexps to have collations?

 As I understand it, collations determine the sort-ordering of strings.
 Regular expressions don't care about that.  Why do you ask?

 Perhaps I used the wrong term, but I was thinking the locale could tell us
 what alphabet we're dealing with. So a regexp using en_US would give
 different word-boundary results from one using zh_CN.

Our interpretation of a collation is that it sets both LC_COLLATE and
LC_CTYPE.  Regexps may not care about the first but they definitely care
about the second.  This is why the stuff in regc_pg_locale.c pays
attention to collation.
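The point that regexps depend on LC_CTYPE-style classification rather than sort order can be illustrated outside PostgreSQL.  In this editorial sketch, Python's Unicode-aware \w treats CJK ideographs as word characters, so what counts as a "word" hinges on character classification, not on any collation:

```python
import re

# Under Unicode character classification, CJK ideographs count as word
# characters, so \w+ (and hence word-boundary behavior) depends on
# ctype-style classification rather than on collation (sort) order.
text = "postgres 数据库 regex"
print(re.findall(r"\w+", text))   # ['postgres', '数据库', 'regex']
```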

regards, tom lane



Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Brendan Jurd dire...@gmail.com writes:
 On 19 February 2012 06:52, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah ... if you *don't* know the difference between a DFA and an NFA,
 you're likely to find yourself in over your head.  Having said that,
 this is eminently learnable stuff and pretty self-contained, so somebody
 who had the time and interest could make themselves into an expert in
 a reasonable amount of time.

 I find myself in possession of both time and interest.  I have to
 admit up-front that I don't have experience with regex code, but I do
 have some experience with parsers generally, and I'd like to think
 some of that skillset would transfer to this problem.  I also find
 regexes fascinating and extremely useful, so learning more about them
 will be no hardship.

 I'd happily cede to an expert, should one appear, but otherwise I'm
 all for moving the regex code into a discrete library, and I'm
 volunteering to take a swing at it.

That sounds great.

BTW, if you don't have it already, I'd highly recommend getting a copy
of Friedl's Mastering Regular Expressions.  It's aimed at users not
implementers, but there is a wealth of valuable context information in
there, as well as a really good not-too-technical overview of typical
implementation techniques for RE engines.  You'd probably still want one
of the more academic presentations such as the dragon book for
reference, but I think Friedl's take on it is extremely useful.

regards, tom lane



Re: [HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-18 Thread Robert Haas
On Sat, Feb 18, 2012 at 11:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 In theory you can imagine a regular expression engine where these
 decisions can be postponed until we see the string we're matching
 against.  IOW, your DFA ends up with state transitions for characters
 specifically named, plus a state transition for anything else that's
 a letter, plus a state transition for anything else not otherwise
 specified.  Then you only need to test the letters that actually
 appear in the target string, rather than all of the ones that might
 appear there.

 But implementing that could be quite a lot of work.

 Yeah, not to mention slow.  The difficulty is overlapping sets of
 characters.  As a simple example, if your regex refers to 3, 7,
 [[:digit:]], X, and [[:alnum:]], then you end up needing five distinct
 colors: 3, 7, X, all digits that aren't 3 or 7, all alphanumerics
 that aren't any of the preceding.  And state transitions for the digit
 and alnum cases had better mention all and only the correct colors.

Yeah, that's unfortunate.  On the other hand, if you don't use colors
for this case, aren't you going to need, for each DFA state, a
gigantic lookup table that includes every character in the server
encoding?  Even if you've got plenty of memory, initializing such a
beast seems awfully expensive, and it might not do very good things
for cache locality, either.

 I've been tracing through the logic this evening, and it works pretty
 simply given that all named character classes are immediately expanded
 out to their component characters.  If we are going to try to keep
 the classes in some kind of symbolic form, it's a lot messier.  In
 particular, I think your sketch above would lead to having to test
 every character against iswdigit and iswalnum at runtime, which would
 be disastrous performancewise.  I'd like to at least avoid that for the
 shorter (and presumably more common) UTF8 codes.

Hmm, but you could cache that information.  Instead of building a
cache that covers every possible character that might appear in the
target string, you can just cache the results for the code points that
you actually see.

Yet another option would be to dictate that the cache can't have holes - it
will always include information for every code point from 0 up to some
value X.  If we see a code point in the target string which is greater
than X, then we extend the cache out as far as that code point.  That
way, people who are using only code points out to U+FF (or even U+7F)
don't pay the cost of building a large cache, but people who need it
can get correct behavior.
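The hole-free cache idea can be sketched as follows.  This is an editorial illustration: `is_word()` is an invented stand-in for the real ctype lookup, and the table here holds one boolean per codepoint rather than a regex color.

```python
import unicodedata

def is_word(cp):
    # Invented stand-in for the real (expensive) ctype classification.
    return unicodedata.category(chr(cp)).startswith(("L", "N"))

class CtypeCache:
    def __init__(self, initial_limit=0x7F):
        # Precompute every codepoint from 0 up to the initial limit.
        self.table = [is_word(cp) for cp in range(initial_limit + 1)]

    def lookup(self, cp):
        if cp >= len(self.table):
            # Extend contiguously, so the cache never has holes.
            self.table.extend(is_word(c)
                              for c in range(len(self.table), cp + 1))
        return self.table[cp]

cache = CtypeCache()
assert cache.lookup(ord("a")) is True
assert cache.lookup(ord("é")) is True      # forces one extension, to U+00E9
assert cache.lookup(ord("!")) is False
print("cache size:", len(cache.table))
```

Strings confined to U+00..U+7F never pay for a larger table, while a single higher codepoint extends the cache exactly as far as needed.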

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] wal_buffers

2012-02-18 Thread Robert Haas
Just for kicks, I ran two 30-minute pgbench tests at scale factor 300
tonight on Nate Boley's machine, with -n -l -c 32 -j 32.  The
configurations were identical, except that on one of them, I set
wal_buffers=64MB.  It seemed to make quite a lot of difference:

wal_buffers not set (thus, 16MB):
tps = 3162.594605 (including connections establishing)

wal_buffers=64MB:
tps = 6164.194625 (including connections establishing)

Rest of config: shared_buffers = 8GB, maintenance_work_mem = 1GB,
synchronous_commit = off, checkpoint_segments = 300,
checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
wal_writer_delay = 20ms

I have attached tps scatterplots.  The obvious conclusion appears to
be that, with only 16MB of wal_buffers, the buffer wraps around with
some regularity: we can't insert more WAL because the buffer we need
to use still contains WAL that hasn't yet been fsync'd, leading to
long stalls.  More buffer space ameliorates the problem.  This is not
very surprising, when you think about it: it's clear that the peak tps
rate approaches 18k/s on these tests; right after a checkpoint, every
update will force a full page write - that is, a WAL record > 8kB.  So
we'll fill up a 16MB WAL segment in about a tenth of a second.  That
doesn't leave much breathing room.  I think we might want to consider
adjusting our auto-tuning formula for wal_buffers to allow for a
higher cap, although this is obviously not enough data to draw any
firm conclusions.
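The fill-rate arithmetic above checks out; this editorial sketch just redoes it with the figures quoted in the message:

```python
# Back-of-envelope check of the fill-rate claim.  The inputs come from
# the message: ~18k transactions/s at peak, each forcing a full-page
# write (> 8 kB of WAL) right after a checkpoint.
tps = 18_000                    # peak rate read off the scatterplots
wal_per_txn = 8 * 1024          # bytes; lower bound for a full-page image
wal_buffers = 16 * 1024 * 1024  # the 16 MB default-capped buffer

seconds_to_fill = wal_buffers / (tps * wal_per_txn)
print(f"{seconds_to_fill:.3f} s")   # ≈ 0.114 s - "about a tenth of a second"
```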

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
attachment: tps-master-64mb.png
attachment: tps-master.png


Re: [HACKERS] pg_restore ignores PGDATABASE

2012-02-18 Thread Robert Haas
On Sat, Feb 18, 2012 at 11:58 AM, Erik Rijkers e...@xs4all.nl wrote:
 pg_restore ignores environment variable PGDATABASE.

What exactly do you mean by "ignores"?  pg_restore prints results to
standard output unless a database name is specified.  AFAIK, there's
no syntax to say "I want a direct-to-database restore to whatever you
think the default database is."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_restore ignores PGDATABASE

2012-02-18 Thread Erik Rijkers
On Sun, February 19, 2012 06:27, Robert Haas wrote:
 On Sat, Feb 18, 2012 at 11:58 AM, Erik Rijkers e...@xs4all.nl wrote:
 pg_restore ignores environment variable PGDATABASE.

 What exactly do you mean by ignores?  pg_restore prints results to
 standard output unless a database name is specified.  AFAIK, there's
 no syntax to say I want a direct-to-database restore to whatever you
 think the default database is.

That's right, and that seems contradictory with:

"This utility [pg_restore], like most other PostgreSQL utilities, also uses
the environment variables supported by libpq (see Section 31.13)."

as pg_restore does 'ignore' (for want of a better word) PGDATABASE.

But I think I can conclude from your reply that that behaviour is indeed 
intentional.


thanks,

Erik Rijkers


