Re: [HACKERS] tsearch2 patch status report

2007-08-22 Thread Tom Lane
Chad Wagner [EMAIL PROTECTED] writes:
 Just a heads up, not sure if you guys are aware of it.  But one of the
 Makefile's (src/backend/tsearch/Makefile) added by this patch breaks the
 build out of source tree feature of autoconf/automake.  The problem is
 pretty straightforward, and after adding $(srcdir) everything seems to be
 fine.

Applied, thanks.  (Hm, I thought we had some buildfarm machines testing
VPATH builds these days?  Guess not ...)

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] HOT patch, missing things

2007-08-22 Thread Pavan Deolasee
On 8/14/07, Gregory Stark [EMAIL PROTECTED] wrote:

 Tom Lane [EMAIL PROTECTED] writes:

 
  Doesn't this design completely fail to take index bloat into account?
  Repairing heap fragmentation does not reduce the need for VACUUM to work
  on the indexes.

 Index bloat is a bit of an open issue already. Because page splits already
 prune any LP_DELETEd pointers any busy index keys will be pruned already.

However any index keys which have not been the subject of an index lookup --
 and that includes keys which are only accessed by bitmap-index-scans --
 won't
 be pruned.

 So we don't really know how much bloat is currently in an index. Perhaps
 we
 need a new statistic which gets updated whenever a page split prunes
 LP_DELETEd pointers (or perhaps when LP_DELETE is set?).




I agree here. As a first step, may be can address the heap space usage
statistics and then take up index stats separately. Index bloat would carry
a different weight in triggering autovacuum.

I shall code up a patch which tracks the dead space in the heap and
trigger autovac based on that.


Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] A couple of tsearch loose ends

2007-08-22 Thread Dimitri Fontaine
Hi list,

Le mardi 21 août 2007, Tom Lane a écrit :
 CREATE TEXT SEARCH DICTIONARY swedish (
 TEMPLATE = snowball,
 LANGUAGE = swedish,
 STOPWORDS = swedish
 );

 ALTER TEXT SEARCH DICTIONARY swedish (
 STOPWORDS
 );

 this dictionary would have LANGUAGE = swedish and no stopwords option.

 Any objections to changing it like that?

I don't understand why this ALTER variation is so different from existing 
ones, but maybe the following syntax can't work:
  ALTER TEXT SEARCH DICTIONARY swedish ALTER STOPWORDS SET swedish;

For dropping an option, could one of those commands do?
  ALTER TEXT SEARCH DICTIONARY swedish DROP STOPWORDS;
  ALTER TEXT SEARCH DICTIONARY swedish ALTER STOPWORDS SET NULL;

Not sure if it's doable or if it really looks more like other ALTER commands, 
but I think I'd like it more this way :)

Hope this helps,
-- 
dim


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] tsearch2 patch status report

2007-08-22 Thread Cédric Villemain
Le mardi 21 août 2007, Oleg Bartunov a écrit :
 On Mon, 20 Aug 2007, Tom Lane wrote:
  I've applied version 0.58 of the patch with a lot of further
  editorializing.  I feel fairly confident now in the code that interfaces

 Great ! Just checked and most things after trivial changes are working !
 We need to summarize changes and provide upgraide guide.

Congratulations Teodor and Oleg !


  Also, we need to decide what to do with contrib/tsearch2, which is
  currently DOA because of conflicts with the new core code.  We could
  either rip it out entirely, or try to convert it into a compatibility
  package.  In view of the renamings of functions we agreed to do, I
  think there is some scope for a compatibility package, but I have no
  time to work on that.

 Probably, we could leave tsearch2 in contrib as a compat module and
 explicitly define 8.4 will be the last release with tsearch2 support.

Are you going to keep tools to compile stemmer from tsearch2 ? (I think about 
those files : 
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/gendict/config.sh
 )

Or just let people use this method : 
http://momjian.us/expire/textsearch/HTML/textsearch-parser-example.html ?


-- 
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] Is ALTER TEXT SEARCH CONFIGURATION PARSER = new_parser really sane?

2007-08-22 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 After starting to document this stuff I'm wondering whether it really
 makes sense to change the parser associated with a tsearch
 configuration.  The problem is that the new parser might have an
 unrelated set of token types, but we don't do anything about updating
 the configuration's mappings.

I'm not really up-to-date on all this tsearch stuff. What would happen if you
already had a parser but wanted to fix a bug or add one new feature or
something like that?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] tsearch2 patch status report

2007-08-22 Thread Andrew Dunstan



Tom Lane wrote:

Applied, thanks.  (Hm, I thought we had some buildfarm machines testing
VPATH builds these days?  Guess not ...)


  


I have switched dungbeetle to use vpath. It's a one line config file 
change, and it builds every hour (3 a day for stable banches, remainder 
for HEAD) if there are changes on the relevant branch.


cheers

andrew

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Crash with empty dictionary

2007-08-22 Thread Hamid Quddus Akhtar

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
  

I'm not sure if this is a sane way to set up a dictionary, but surely
seg faulting is not the right thing to do. Should we throw an error on
an empty dict file, or should we swallow it without crashing?



Offhand I'd say that an empty file is a legitimate corner case,
so we should just take it silently.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


  


Shouldn't we be warning about an empty file rather than just swallowing 
up the error? It might not be intentional and rather than the user 
trying to figure it out, we should at least be informing him/her...


--
Hamid


Re: [HACKERS] A couple of tsearch loose ends

2007-08-22 Thread Tom Lane
Dimitri Fontaine [EMAIL PROTECTED] writes:
 I don't understand why this ALTER variation is so different from existing=20
 ones, but maybe the following syntax can't work:
   ALTER TEXT SEARCH DICTIONARY swedish ALTER STOPWORDS SET swedish;

You'd have to repeat the whole command for each option to be changed,
which given the amount of typing involved seems a bit unpleasant.

There are also historical differences between what is allowed by
the SET var = value syntax and what is allowed in the
parenthesized-option-list syntax.  Introducing an inconsistency between
ALTER and CREATE doesn't seem appetizing.

(BTW, does anyone want to teach psql's tab-completion about the new
text search statements?)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Crash with empty dictionary

2007-08-22 Thread Tom Lane
Hamid Quddus Akhtar [EMAIL PROTECTED] writes:
 Offhand I'd say that an empty file is a legitimate corner case,
 so we should just take it silently.

 Shouldn't we be warning about an empty file rather than just swallowing 
 up the error?

You are jumping to a conclusion, namely that it is an error.  If it's
a legitimate corner case, throwing a warning every time the file is
read would be incredibly annoying.

If it's not a legitimate case, then we should throw a real error.
A warning just strikes me as the worst of both worlds.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Crash with empty dictionary

2007-08-22 Thread Bruce Momjian
Tom Lane wrote:
 Hamid Quddus Akhtar [EMAIL PROTECTED] writes:
  Offhand I'd say that an empty file is a legitimate corner case,
  so we should just take it silently.
 
  Shouldn't we be warning about an empty file rather than just swallowing 
  up the error?
 
 You are jumping to a conclusion, namely that it is an error.  If it's
 a legitimate corner case, throwing a warning every time the file is
 read would be incredibly annoying.
 
 If it's not a legitimate case, then we should throw a real error.
 A warning just strikes me as the worst of both worlds.

A zero-length file seems fine to me in this case.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] A couple of tsearch loose ends

2007-08-22 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Dimitri Fontaine [EMAIL PROTECTED] writes:

I don't understand why this ALTER variation is so different from existing=20
ones, but maybe the following syntax can't work:
  ALTER TEXT SEARCH DICTIONARY swedish ALTER STOPWORDS SET swedish;


You'd have to repeat the whole command for each option to be changed,
which given the amount of typing involved seems a bit unpleasant.

There are also historical differences between what is allowed by
the SET var = value syntax and what is allowed in the
parenthesized-option-list syntax.  Introducing an inconsistency between
ALTER and CREATE doesn't seem appetizing.

(BTW, does anyone want to teach psql's tab-completion about the new
text search statements?)


I will take a stab at doing that ...


Stefan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [COMMITTERS] pgsql: Simplify the syntax of CREATE/ALTER TEXT SEARCH DICTIONARY by

2007-08-22 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 I also made these statements verify that the tmplinit method will accept
 the new settings before they get stored; in the original coding you didn't
 find out about mistakes until the dictionary got invoked.

 That is source of initdb error with -E KOI8-R --locale ru_RU.KOI8-R options :
 ERROR:  character 0xc3a5 of encoding UTF8 has no equivalent in KOI8

 Snowball dictionary tries to convert swedish (some languages too, but
 that was first which I see) stopword file from UTF8 to KOI8 encoding.

Hmm.  That's a problem, but I don't think that not having any error
checking on CREATE TEXT SEARCH DICTIONARY's parameters is a good
solution.

The first kluge that comes to mind is to suppress the error check in a
standalone backend (ie, when not IsUnderPostmaster), which would cover
the initdb case.  But maybe there are better answers ... any ideas?

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Naming of the prefab snowball stemmer dictionaries

2007-08-22 Thread Tom Lane
I notice that the existing tsearch documentation that we've imported
fairly consistently refers to Snowball dictionaries with names like
en_stem, ru_stem, etc.  However, CVS HEAD is set up to create them
with names english, russian, etc.  As I've been absorbing more of
the docs I'm starting to wonder whether this is a good idea.  ISTM
that these names encourage a novice to think that the one dictionary
is all you could need for a given language; and there are enough
examples of more-complex setups in the docs to make it clear that
in fact Snowball is not the be-all and end-all of dictionaries.

I'm thinking that going back to the old naming convention (or something
like it --- maybe english_stem, russian_stem, etc) would be better.
It'd help to give the right impression, namely that these dictionaries
are a component of a solution but not necessarily all you need.

Thoughts?

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Crash with empty dictionary

2007-08-22 Thread Heikki Linnakangas
Bruce Momjian wrote:
 Tom Lane wrote:
 Hamid Quddus Akhtar [EMAIL PROTECTED] writes:
 Offhand I'd say that an empty file is a legitimate corner case,
 so we should just take it silently.
 Shouldn't we be warning about an empty file rather than just swallowing 
 up the error?
 You are jumping to a conclusion, namely that it is an error.  If it's
 a legitimate corner case, throwing a warning every time the file is
 read would be incredibly annoying.

 If it's not a legitimate case, then we should throw a real error.
 A warning just strikes me as the worst of both worlds.
 
 A zero-length file seems fine to me in this case.

It also seems to have problems with an affix-file with a single entry.

Looking closer at the tmpCtx hack, it looks like it can't just be
replaced by setting CurrentMemoryContext to a temporary context. Some
stuff needs to be allocated in the ts cache entry's dictCtx, while other
stuff is temporary. I'll try to at least comment it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Naming of the prefab snowball stemmer dictionaries

2007-08-22 Thread A.M.


On Aug 22, 2007, at 11:10 , Tom Lane wrote:


I notice that the existing tsearch documentation that we've imported
fairly consistently refers to Snowball dictionaries with names like
en_stem, ru_stem, etc.  However, CVS HEAD is set up to create them
with names english, russian, etc.  As I've been absorbing more of
the docs I'm starting to wonder whether this is a good idea.  ISTM
that these names encourage a novice to think that the one dictionary
is all you could need for a given language; and there are enough
examples of more-complex setups in the docs to make it clear that
in fact Snowball is not the be-all and end-all of dictionaries.

I'm thinking that going back to the old naming convention (or  
something
like it --- maybe english_stem, russian_stem, etc) would be  
better.

It'd help to give the right impression, namely that these dictionaries
are a component of a solution but not necessarily all you need.


Please use ISO 639 codes plus any qualifiers to reduce confusion.
http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

-M

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Naming of the prefab snowball stemmer dictionaries

2007-08-22 Thread Zeugswetter Andreas ADI SD

Sounds reasonable, but why exactly did we spell out english instead of en ?
Seems the abbrev is much easier to extract from LANG or browser prefs ...

Andreas  

-Ursprüngliche Nachricht-
Von: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Im Auftrag von Tom Lane
Gesendet: Mittwoch, 22. August 2007 17:11
An: Oleg Bartunov; Teodor Sigaev
Cc: pgsql-hackers@postgreSQL.org
Betreff: [HACKERS] Naming of the prefab snowball stemmer dictionaries 
[bayes][heur]
Wichtigkeit: Niedrig

I notice that the existing tsearch documentation that we've imported fairly 
consistently refers to Snowball dictionaries with names like en_stem, 
ru_stem, etc.  However, CVS HEAD is set up to create them with names 
english, russian, etc.  As I've been absorbing more of the docs I'm 
starting to wonder whether this is a good idea.  ISTM that these names 
encourage a novice to think that the one dictionary is all you could need for a 
given language; and there are enough examples of more-complex setups in the 
docs to make it clear that in fact Snowball is not the be-all and end-all of 
dictionaries.

I'm thinking that going back to the old naming convention (or something like it 
--- maybe english_stem, russian_stem, etc) would be better.
It'd help to give the right impression, namely that these dictionaries are a 
component of a solution but not necessarily all you need.

Thoughts?

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Re: [COMMITTERS] pgsql: Simplify the syntax of CREATE/ALTER TEXT SEARCH DICTIONARY by

2007-08-22 Thread Teodor Sigaev

Hmm.  That's a problem, but I don't think that not having any error
checking on CREATE TEXT SEARCH DICTIONARY's parameters is a good
solution.

Agreed



The first kluge that comes to mind is to suppress the error check in a
standalone backend (ie, when not IsUnderPostmaster), which would cover
the initdb case.  But maybe there are better answers ... any ideas?


Don't see, but that's connected only to snowball stemmers - other dictionaries 
are not created in initdb time.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [COMMITTERS] pgsql: Simplify the syntax of CREATE/ALTER TEXT SEARCH DICTIONARY by

2007-08-22 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 The first kluge that comes to mind is to suppress the error check in a
 standalone backend (ie, when not IsUnderPostmaster), which would cover
 the initdb case.  But maybe there are better answers ... any ideas?

 Don't see, but that's connected only to snowball stemmers - other 
 dictionaries 
 are not created in initdb time.

True today, but might not always be so.  ISTM the generic issue here is
that initdb wants to set up prefab dictionaries that might not actually
be usable in the specific database encoding that template1 is assigned.
But making them anyway is correct, since they could still be used later
in some other database created with a different encoding.  So I'm
inclined to put the disable check into the generic verify_dictoptions()
routine, not into snowball specifically.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] tsearch2 patch status report

2007-08-22 Thread Ron Mayer
Merlin Moncure wrote:
 On 8/21/07, Magnus Hagander [EMAIL PROTECTED] wrote:
 OTOH, if we do it as a compat package, we need to set a firm end-date on
 it, so we don't have to maintain it forever. 
 
 I would suggest making a pgfoundry project...that's what was done with
 userlocks.  I'm pretty certain no one besides me has ever used the
 wrappers I created...a lot more people use tsearch2 than userlocks
 though.
 

Hmm..  In that case I'd think people should ask if anyone would use
the tsearch2 compatibility layer before even doing pgfoundry.
Speaking for myself, I expect migrating to the core text search
APIs as something we'd do as part of our 8.3 migration even if
such a compatibility layer existed.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Naming of the prefab snowball stemmer dictionaries

2007-08-22 Thread Tom Lane
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:
 Sounds reasonable, but why exactly did we spell out english instead of en 
 ?
 Seems the abbrev is much easier to extract from LANG or browser prefs ...

Mainly because we're following the upstream snowball project on the
naming.

I don't think that LANG is relevant to this.  If you had an application
that wanted to make a selection based on that, what it'd be trying to
set is a configuration name, not a dictionary name.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] ispell file format

2007-08-22 Thread Heikki Linnakangas
Is the file format for the ispell dictionary documented somewhere?
There's apparently support for an old and a new format, but I can't
figure out what the formats are.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Naming of the prefab snowball stemmer dictionaries

2007-08-22 Thread Oleg Bartunov

On Wed, 22 Aug 2007, Tom Lane wrote:


I notice that the existing tsearch documentation that we've imported
fairly consistently refers to Snowball dictionaries with names like
en_stem, ru_stem, etc.  However, CVS HEAD is set up to create them
with names english, russian, etc.  As I've been absorbing more of
the docs I'm starting to wonder whether this is a good idea.  ISTM
that these names encourage a novice to think that the one dictionary
is all you could need for a given language; and there are enough
examples of more-complex setups in the docs to make it clear that
in fact Snowball is not the be-all and end-all of dictionaries.

I'm thinking that going back to the old naming convention (or something
like it --- maybe english_stem, russian_stem, etc) would be better.
It'd help to give the right impression, namely that these dictionaries
are a component of a solution but not necessarily all you need.

Thoughts?


I agree with you, old naming was more informative.



regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] ispell file format

2007-08-22 Thread Oleg Bartunov

On Wed, 22 Aug 2007, Heikki Linnakangas wrote:


Is the file format for the ispell dictionary documented somewhere?
There's apparently support for an old and a new format, but I can't
figure out what the formats are.


ispell, myspell and hunspell formats are supported automagically.
They are available from openoffice.org


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[HACKERS] Power outage on Sun buildfarm machines

2007-08-22 Thread Josh Berkus

Hackers,

We had a massive power outage in the Norway lab.  So please ignore any 
failures from the Sun/Solaris buildfarm servers for the next 48 hours.


--Josh

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [COMMITTERS] pgsql: Add configure option --with-system-tzdata to use operating system

2007-08-22 Thread Tom Lane
[EMAIL PROTECTED] (Peter Eisentraut) writes:
 Log Message:
 ---
 Add configure option --with-system-tzdata to use operating system time zone
 database.

While this looked like a reasonable idea in the abstract, it turns out
that it was probably a waste of time.  Red Hat distributions, at least,
will never be able to use it in the given form.  Allow me to do a brain
dump for the archives' benefit, before I forget the details.

In the first place, it is considered bad form for a package to install
an absolute symlink to /usr/share/zoneinfo:

symlinks _should_ be relative.  Even if all they have in common is /.
- Jeremy Katz
https://www.redhat.com/archives/fedora-maintainers/2007-August/msg00096.html

Followup arguments in that thread mentioned chroots and NFS mounts as
environments where absolute symlinks are likely to lead to the wrong
place.  While I'm not 100% convinced by those arguments, it's difficult
to go against the advice of people who have far more experience with RPM
packaging than I do.  Fortunately, the standard RPM installation
location for PG's datadir is /usr/share/pgsql, so in practice the
symlink will be ../zoneinfo, which is not so scary as it coulda been.

Unfortunately, configure --with-system-tzdata=../zoneinfo ain't
gonna work; the build will not get through make check, because that
symlink won't work in a temporary installation.  And trying to
relativize the symlink on-the-fly during make install is no answer
either, since in an RPM build that step is normally done with a nonempty
DESTDIR prefix.

The other truly nasty thing, which I just wasted most of this afternoon
learning the hard way, is that RPM does not like at all to replace a
directory with a symlink, and don't hold your breath for a fix:
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=61413
I experimented with various ways to get around that, and mostly
succeeded in wiping out my /usr/share/zoneinfo :-( It seems that the
fundamental problem is that RPM doesn't remove files that have vanished
from a package until quite late in an update cycle, and is therefore
capable of removing things indirectly through the already-installed
symlink, while thinking that it is cleaning out the contents of the
former subdirectory.  My ultimate conclusion was that the only
reasonably safe way to do things is to make the symlink be named
something different from the original subdirectory.
https://www.redhat.com/archives/fedora-devel-list/2007-August/msg01622.html

This might be a problem specific to RPM, but I wouldn't really care to
bet on other package managers being entirely free of it.  My advice
to anyone else thinking of using a symlink to a system directory is to
name it something different than the existing PG subdirectory.

In short, then, the patch actually being used (as of today) in Fedora
and ultimately RHEL is as below, and I don't see any prospect of
substituting the mechanism Peter has created.

regards, tom lane

--

PostgreSQL includes its own copy of the zic timezone database, which is great
for ensuring portable results but not so great from a package maintenance
perspective.  Since the data is in the same format as is provided by the
Linux-standard /usr/share/zoneinfo files, we can avoid having to update
postgresql for timezone updates by just symlinking to those files.

It is allegedly desirable for the link in question to be a relative symlink.
I have strong doubts about this, not least because it requires the horrid
install-time kluge seen below --- we can't use a simple relative symlink when
making the temporary installation used for make check, since that will be at
an indeterminate location compared to /usr/share.  The actual relative link
also depends fundamentally upon knowing where the PG datadir will get
installed, namely /usr/share/pgsql.

And if you thought that was bad, it turns out that RPM has some fundamental
bugs that make it difficult or impossible to replace a directory with a
symlink during RPM upgrade.  Rather than risk getting caught in that quagmire,
we choose to name the symlink something different than its predecessor
subdirectory.  (This part of the hack could perhaps get undone someday, when
there is no longer any danger of someone trying to rpm-upgrade from an
installation that isn't patched this way.)


diff -Naur postgresql-8.2.4.orig/src/timezone/Makefile 
postgresql-8.2.4/src/timezone/Makefile
--- postgresql-8.2.4.orig/src/timezone/Makefile 2007-03-14 13:38:15.0 
-0400
+++ postgresql-8.2.4/src/timezone/Makefile  2007-08-22 16:57:41.0 
-0400
@@ -38,14 +38,18 @@
$(CC) $(CFLAGS) $(ZICOBJS) $(LDFLAGS) $(LIBS) -o [EMAIL PROTECTED](X)
 
 install: all installdirs
-   ./zic -d '$(DESTDIR)$(datadir)/timezone' -p '$(POSIXRULES)' 
$(TZDATAFILES)
+   if [ x'$(DESTDIR)' = x`echo '$(DESTDIR)' | sed 's,tmp_check/install,,'` 
] ; then \
+   ln -s '../zoneinfo' '$(DESTDIR)$(datadir)/zoneinfo' ; \
+   else \

Re: [HACKERS] [COMMITTERS] pgsql: Add configure option --with-system-tzdata to use operating system

2007-08-22 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 In the first place, it is considered bad form for a package to install
 an absolute symlink to /usr/share/zoneinfo:

 symlinks _should_ be relative.  Even if all they have in common is /.
 - Jeremy Katz
 https://www.redhat.com/archives/fedora-maintainers/2007-August/msg00096.html

 Followup arguments in that thread mentioned chroots and NFS mounts as
 environments where absolute symlinks are likely to lead to the wrong
 place.  

Fwiw Debian also faced this issue and came to a different conclusion. IIRC the
policy is that it's explicitly supported for a sysadmin to replace any
directory in the top level directory with a symlink. So for example /home -
/export/home.

Therefore any package which includes a symlink which traverses between top
level directories *must* be absolute. And any symlink which does not span two
top level directories *must* be relative. So /usr/foo/bar which links to
/var/foo/bar must be absolute. But /usr/foo/bar which links to /usr/qux/bar
must be a relative link ../qux/bar.

 In short, then, the patch actually being used (as of today) in Fedora
 and ultimately RHEL is as below, and I don't see any prospect of
 substituting the mechanism Peter has created.

Why would --with-zoneinfo want to use a symlink though? Shouldn't it just
compile the binary to use the path specified directly? Symlinks are fine for a
sysadmin or a packager but if it's going to be supported by Postgres code
directly why not do it directly?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] SQL feature requests

2007-08-22 Thread Ben Tilly
Hopefully this is the right place for a few feature requests that
would address some of the things that I've noticed in postgres.

1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?  For instance suppose that I have an orders table, and
one of the fields is userid.  The following is unambiguous and is
legal in Oracle:

  SELECT order_count
, count(*) as people
  FROM (
  SELECT count(*) as order_count
  FROM orders
  GROUP BY userid
)
  GROUP BY order_count

It annoys me that it isn't legal in postgres.  (Yes, I know how to fix
the query.  But it still is an annoyance, and it comes up fairly often
in reporting purposes.)

2. Why is 'non-integer constant in GROUP BY' an error?

I find it inconvenient.  For reporting purposes I often have to
dynamically build queries in code.  An easy way to do that is just
interpolate in a set of possible statements which will either be empty
strings or have trailing commas.  But then I need this (possibly
empty) list to have a valid group by statement at the end.  In Oracle
I used to just write it like this:

  SELECT ...
  GROUP BY $possible_term_1
$possible_term_2
$possible_term_3
'end of possible groupings'

In postgres I either have to use a different strategy to build up
these strings, or else use a more complicated term to finish that off.
 This works for now:

  case when true then true end

but I don't know whether some future version of postgres might break
my code by banning that as well.

3. How hard would it be to have postgres ignore aliases in group by
clauses?  Per my comments above, I often build complex queries in
code.  I can't easily use the shortcut of referring to the select
column by number because the position is hard to determine.  So my
code has to copy the select terms.  But I can't copy them exactly
because the select terms include lots of ...as foo clauses that are
not allowed in a group by.  So I have to store very similar terms to
use twice.

It would be nice if I could just make the group by look like the
select, and have the (obviously irrelevant) aliases just be ignored.

4) Items 2 and 3 would both be made irrelevant if postgres did
something that I'd really, really would like.  Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.

For example

  SELECT foo, count(*)
  FROM bar

would be processed as:

  SELECT foo, count(*)
  FROM bar
  GROUP BY foo

If I write a query with an aggregate function in the select, better
than 95% of the time this is the group by clause that I want.  (This
email has one of the few exceptions.)  In the remaining cases I could
easily add the extra stuff in the group by to the select without
problems.  Therefore if postgres could just insert the obvious group
by clause in, I would never again write the words group by when
working with postgres.  And I predict that many other people would do
the same.

But it doesn't.  So when working with postgres, just like every other
database that I've used, I have to constantly type in group by clauses
with entirely redundant information.  (But they're not EXACTLY the
same as the select clauses that they are redundant with...)

Cheers,
Ben

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [COMMITTERS] pgsql: Add configure option --with-system-tzdata to use operating system

2007-08-22 Thread Peter Eisentraut
Tom Lane wrote:
 Red Hat distributions, at least, will never be able to use it in the
 given form.

This just encoded what a packager would presumably have done anyway.  If 
it's not going to work for Red Hat, then it's never going to work for 
Red Hat in any way without major reeingineering (such as compiling in 
the path rather than using a symlink).  If someone feels inclined to do 
that work, then we can evaluate that, but we can keep using the same 
configure option, so everything would keep working consistently.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [COMMITTERS] pgsql: Add configure option --with-system-tzdata to use operating system

2007-08-22 Thread Peter Eisentraut
Gregory Stark wrote:
 Why would --with-zoneinfo want to use a symlink though? Shouldn't it
 just compile the binary to use the path specified directly?

The way this question is posed appears to imply that doing the latter is 
easier, but it's not.  If someone wants to do the extra work, maybe 
that would help support more cases.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Josh Berkus
Ben,

pgsql-sql is probably the appropriate list for future queries of this 
nature.

Note that the below is my personal opinion; each PG developer has their 
own.

 1. Just a minor annoyance, but why must subqueries in FROM clauses
 have an alias?  For instance suppose that I have an orders table, and
 one of the fields is userid.  The following is unambiguous and is
 legal in Oracle:

I *think* the alias is a requirement of the SQL standard.  Yes/No?

 2. Why is 'non-integer constant in GROUP BY' an error?

Again, this needs to reference one of the SQL standards if you're 
interested in a change of behavior.  If we're out of standards compliance, 
that's a strong argument.  If we're in compliance, you have a pretty steep 
hurdle to justify new syntax.

 3. How hard would it be to have postgres ignore aliases in group by
 clauses? 

Unfortunately, I think this is also a SQL compliance issue.  However, I'd 
be more liable to support your arguments for it; it's much more obviously 
useful functionality.

 4) Items 2 and 3 would both be made irrelevant if postgres did
 something that I'd really, really would like.  Which is to assume that
 a query without a group by clause, but with an aggregate function in
 the select, should have an implicit group by clause where you group by
 all non-aggregate functions in the select.

In addition to SQL compliance issues, we're reluctant to do anything which 
makes implicit assumptions which could easily be wrong in PostgreSQL.  
Such shortcutting all to often leads to runaway queries or wrong data when 
the assumptions are incorrect.  MySQL gives us lots of examples of what 
can happen if you do too many things for convenience and compromise 
correctness.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Michael Glaesemann


On Aug 22, 2007, at 18:45 , Ben Tilly wrote:


1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?


It's required by the SQL standard, AIUI. I wonder what EnterpriseDB  
does?



2. Why is 'non-integer constant in GROUP BY' an error?



 This works for now:

  case when true then true end

but I don't know whether some future version of postgres might break
my code by banning that as well.


The PostgreSQL developers generally tries hard to preserve backwards  
compatibility, so I doubt the case expression as you have it would go  
away (though I'm kind of surprised it's allowed). Am I wrong in  
thinking that Oracle would accept the same format PostgreSQL does? In  
that case, couldn't you use whatever method works in PostgreSQL in  
Oracle? I haven't checked the SQL standard, but it seems unlikely  
it'd allow something like


GROUP BY , , , ;

AIUI, Integers are only allowed because the SQL standard explicitly  
allows you to refer to columns by the order they appear in the SELECT  
list. Otherwise the GROUP BY items need to be column names.


Both 1 and 2 seem to me to be places where Oracle is likely deviating  
from the standard. If you're targeting Oracle, then using Oracle- 
specific syntax might be warranted. If you're hoping to target more  
than one possible backend, I'd think it be better to use more  
portable syntax (e.g., SQL-standard syntax) than expecting other  
DBMSs to follow another's deviations. That's not to say PostgreSQL  
does not have non-standard syntax: in places, it does. But it does  
try to hew very closely to the standard.


Again, I wonder what EnterpriseDB does in this case?


3. How hard would it be to have postgres ignore aliases in group by
clauses?  Per my comments above, I often build complex queries in
code.  I can't easily use the shortcut of referring to the select
column by number because the position is hard to determine.  So my
code has to copy the select terms.  But I can't copy them exactly
because the select terms include lots of ...as foo clauses that are
not allowed in a group by.  So I have to store very similar terms to
use twice.


Perhaps someone else knows what you're referring to here, but I'm  
having a hard time without an example. Here's what I *think* you're  
trying to say:


test=# select * from observation;
observation_id | record_id | score_id
+---+--
   3240 | 1 |1
   3239 | 1 |1
   3238 | 1 |2
   3237 | 1 |1
   2872 | 2 |1
   2869 | 2 |2
   2870 | 2 |1
   2871 | 2 |1
   3218 | 3 |2
   3217 | 3 |1
(10 rows)

test=# select record_id as foo, count(observation_id) from  
observation group by record_id;

foo | count
-+---
   3 | 2
   2 | 4
   1 | 4
(3 rows)

test=# select record_id as foo, count(observation_id) from  
observation group by foo;

foo | count
-+---
   3 | 2
   2 | 4
   1 | 4
(3 rows)

test=# select record_id as foo, count(observation_id) as bar from  
observation group by foo;

foo | bar
-+-
   3 |   2
   2 |   4
   1 |   4
(3 rows)

test=# select record_id as foo, count(observation_id) as bar from  
observation group by record_id;

foo | bar
-+-
   3 |   2
   2 |   4
   1 |   4
(3 rows)

test=# select version();

version
 
--
PostgreSQL 8.2.4 on powerpc-apple-darwin8.9.0, compiled by GCC  
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.  
build 5367)

(1 row)

I'm not getting an error in any permutation that I can think of. What  
am I missing?



Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.

For example

  SELECT foo, count(*)
  FROM bar

would be processed as:

  SELECT foo, count(*)
  FROM bar
  GROUP BY foo


It's been discussed before. I don't believe it's been rejected out of  
hand (though you can check the archives), just that no one's gotten  
around to it. (Don't know what the SQL-spec says on this point.)


I'm not trying to dismiss your points, just trying to address them.  
I'm interested to hear what others have to say.


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Gregory Stark

Ben Tilly [EMAIL PROTECTED] writes:

 Hopefully this is the right place for a few feature requests that
 would address some of the things that I've noticed in postgres.

 1. Just a minor annoyance, but why must subqueries in FROM clauses
 have an alias?  For instance suppose that I have an orders table, and
 one of the fields is userid.  The following is unambiguous and is
 legal in Oracle:

Thank you, this is one of my top pet peeves but when I proposed changing it I
was told nobody's complained. Now we have at least one user complaint, any
others out there?

 2. Why is 'non-integer constant in GROUP BY' an error?

Hm... I was a bit surprised by this warning myself. IIRC there was an
implementation convenience issue.

 3. How hard would it be to have postgres ignore aliases in group by
 clauses?  

That sounds like a strange idea.

 4) Items 2 and 3 would both be made irrelevant if postgres did
 something that I'd really, really would like.  Which is to assume that
 a query without a group by clause, but with an aggregate function in
 the select, should have an implicit group by clause where you group by
 all non-aggregate functions in the select.

 For example

   SELECT foo, count(*)
   FROM bar

 would be processed as:

   SELECT foo, count(*)
   FROM bar
   GROUP BY foo

I agree this would be convenient but it seems too scary to actually go
anywhere. What would you group by in the case of:

SELECT a+b, count(*) FROM bar

Should it group by a,b or a+b ?

Also, this might be a bit shocking for MySQL users who are accustomed to
MySQL's non-standard extension for the same syntax. There it's treated as an
assertion that the columns are equal for all records in a group or at least
that it doesn't matter which such value is returned, effectively equivalent to
our DISTINCT ON feature.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [COMMITTERS] pgsql: Add configure option --with-system-tzdata to use operating system

2007-08-22 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Gregory Stark wrote:
 Why would --with-zoneinfo want to use a symlink though? Shouldn't it
 just compile the binary to use the path specified directly?

 The way this question is posed appears to imply that doing the latter is 
 easier, but it's not.  If someone wants to do the extra work, maybe 
 that would help support more cases.

AFAICS it would support fewer cases.  Hardwiring /usr/share/zoneinfo
into the binary is isomorphic to having it use a symlink defined that
way, as far as the objections to absolute symlinks go.  And once it's
hardwired you can't change it without recompiling, which means no go
for hacks like the one I did involving supplying a different symlink
for make check.

The larger point of my post was really that there are enough
packaging-specific considerations involved in doing this that
--with-system-tzdata is just not going to be a widely useful solution.
My advice is to take it out, not make it more baroque.  You won't
be doing packagers any favors by turning the code they have to patch
into a moving target.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Ben Tilly
On 8/22/07, Michael Glaesemann [EMAIL PROTECTED] wrote:

 On Aug 22, 2007, at 18:45 , Ben Tilly wrote:

  1. Just a minor annoyance, but why must subqueries in FROM clauses
  have an alias?

 It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
 does?

I can well believe that the standard says that you must accept
subqueries with aliases.  But does it say you must reject subqueries
without aliases?  I strongly doubt that.

I have no clue what EnterpriseDB does.

  2. Why is 'non-integer constant in GROUP BY' an error?

   This works for now:
 
case when true then true end
 
  but I don't know whether some future version of postgres might break
  my code by banning that as well.

 The PostgreSQL developers generally tries hard to preserve backwards
 compatibility, so I doubt the case expression as you have it would go
 away (though I'm kind of surprised it's allowed). Am I wrong in
 thinking that Oracle would accept the same format PostgreSQL does? In

The reason for my comparing to Oracle is that I used to work at an
Oracle shop.  I now work at a postgres shop.  Portability is not my
issue, just the annoyances that I experienced moving from one to the
other.

As for whether that case expression would go away, that it is allowed
is such an obscure feature that I doubt anyone changing that code
would notice if it was removed.

 that case, couldn't you use whatever method works in PostgreSQL in
 Oracle? I haven't checked the SQL standard, but it seems unlikely
 it'd allow something like

 GROUP BY , , , ;

That's not what Oracle accepts  that postgres does not.  What Oracle accepts is:

  ...
  GROUP BY 'foo';

 AIUI, Integers are only allowed because the SQL standard explicitly
 allows you to refer to columns by the order they appear in the SELECT
 list. Otherwise the GROUP BY items need to be column names.

Need to be?

The SQL-92 standard is clear that you must accept a list of column
names.  It is also clear that a column name must be be of the form
field or table.field.  In no way, shape or form does that allow having
terms like trim(foo.bar) in a group by.

But every reasonable database that I know - including postgres - allows that.

The standard very wisely does not forbid extensions.  Every database
has extensions.  In some cases, such as allowing trim(foo.bar) in a
group by clause, some extensions are so common as to be a standard.
(I don't have a copy of any later standards so I don't know whether
that has since been explicitly allowed.)  Therefore the real question
is how much farther than the standard you go.

Postgres explicitly disallows a constant character expression.  But it
allows the constant case expression that I gave.  It would be nice for
me to not have to remember that very obscure and convoluted case.

 Both 1 and 2 seem to me to be places where Oracle is likely deviating
 from the standard. If you're targeting Oracle, then using Oracle-
 specific syntax might be warranted. If you're hoping to target more
 than one possible backend, I'd think it be better to use more
 portable syntax (e.g., SQL-standard syntax) than expecting other
 DBMSs to follow another's deviations. That's not to say PostgreSQL
 does not have non-standard syntax: in places, it does. But it does
 try to hew very closely to the standard.

The queries that I'm writing are not hoping to target more than one
database at one company.

 Again, I wonder what EnterpriseDB does in this case?

No clue.

  3. How hard would it be to have postgres ignore aliases in group by
  clauses?  Per my comments above, I often build complex queries in
  code.  I can't easily use the shortcut of referring to the select
  column by number because the position is hard to determine.  So my
  code has to copy the select terms.  But I can't copy them exactly
  because the select terms include lots of ...as foo clauses that are
  not allowed in a group by.  So I have to store very similar terms to
  use twice.

 Perhaps someone else knows what you're referring to here, but I'm
 having a hard time without an example. Here's what I *think* you're
 trying to say:

What I'm trying to say is that it would be convenient for me to be
able to write:

  select bar as baz
, count(*) as some count
  from foo
  group by bar as baz

That's not allowed right now because as is not allowed in a group by statement.

[...]
  Which is to assume that
  a query without a group by clause, but with an aggregate function in
  the select, should have an implicit group by clause where you group by
  all non-aggregate functions in the select.
 
  For example
 
SELECT foo, count(*)
FROM bar
 
  would be processed as:
 
SELECT foo, count(*)
FROM bar
GROUP BY foo

 It's been discussed before. I don't believe it's been rejected out of
 hand (though you can check the archives), just that no one's gotten
 around to it. (Don't know what the SQL-spec says on this point.)

I don't know what the SQL spec says, but I know (having talked 

Re: [HACKERS] SQL feature requests

2007-08-22 Thread Ben Tilly
On 8/22/07, Gregory Stark [EMAIL PROTECTED] wrote:

 Ben Tilly [EMAIL PROTECTED] writes:

  Hopefully this is the right place for a few feature requests that
  would address some of the things that I've noticed in postgres.
 
  1. Just a minor annoyance, but why must subqueries in FROM clauses
  have an alias?  For instance suppose that I have an orders table, and
  one of the fields is userid.  The following is unambiguous and is
  legal in Oracle:

 Thank you, this is one of my top pet peeves but when I proposed changing it I
 was told nobody's complained. Now we have at least one user complaint, any
 others out there?

Always happy to complain. :-)

  2. Why is 'non-integer constant in GROUP BY' an error?

 Hm... I was a bit surprised by this warning myself. IIRC there was an
 implementation convenience issue.

If your implementation accepts:

  group by case when true then 'foo' end

how much harder can it be to accept:

  group by 'foo'

?

  3. How hard would it be to have postgres ignore aliases in group by
  clauses?

 That sounds like a strange idea.

It is a strange idea, but it makes dynamically building queries
easier.  Right now I'm following a strategy of storing what I'm going
to insert in the select clause in one variable, and the group by
clause in another.  So I need 2 variables for each dynamic field that
I might choose to group by and want to have a custom name for.  With
this change I would only need one variable.

  4) Items 2 and 3 would both be made irrelevant if postgres did
  something that I'd really, really would like.  Which is to assume that
  a query without a group by clause, but with an aggregate function in
  the select, should have an implicit group by clause where you group by
  all non-aggregate functions in the select.
 
  For example
 
SELECT foo, count(*)
FROM bar
 
  would be processed as:
 
SELECT foo, count(*)
FROM bar
GROUP BY foo

 I agree this would be convenient but it seems too scary to actually go
 anywhere. What would you group by in the case of:

 SELECT a+b, count(*) FROM bar

 Should it group by a,b or a+b ?

It should group by a+b.  Which is to say, every field in the select
clause that currently triggers an error because it isn't in the group
by clause.

 Also, this might be a bit shocking for MySQL users who are accustomed to
 MySQL's non-standard extension for the same syntax. There it's treated as an
 assertion that the columns are equal for all records in a group or at least
 that it doesn't matter which such value is returned, effectively equivalent to
 our DISTINCT ON feature.

I don't mind shocking MySQL users. ;-)

But seriously, if that objection is the barrier then I'd be happy to
see it be something that is explicitly turned on in the query.  For
instance:

  select autogroup bar, count(*) from foo

If that was available then I for one would type autogroup a lot more
often than group by.  After all autogroup is about as hard to type,
and I don't have to type the redundant list of fields in the group by.

Cheers,
Ben

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Florian G. Pflug

Ben Tilly wrote:

On 8/22/07, Gregory Stark [EMAIL PROTECTED] wrote:

Ben Tilly [EMAIL PROTECTED] writes:

2. Why is 'non-integer constant in GROUP BY' an error?

Hm... I was a bit surprised by this warning myself. IIRC there was an
implementation convenience issue.


If your implementation accepts:

  group by case when true then 'foo' end

how much harder can it be to accept:

  group by 'foo'


This is not about hardness of the implementation, but rather about
non-confusing behaviour I think.

AFAIK, group by 1 means group by the first selected column, not
group all rows together. But group by 'foo' would carry the second
meaning - group all rows together. This is so totally counter-intuitive,
that it's not even funny...

group by case when true then 'foo' end looks different enough compared to
group by 1 to make this less of a footgun.

Seems that the group by integer syntax predates the appearance of
aliases in the sql standard...

greetings, Florian flug

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Michael Glaesemann


On Aug 22, 2007, at 20:36 , Ben Tilly wrote:


On 8/22/07, Michael Glaesemann [EMAIL PROTECTED] wrote:


On Aug 22, 2007, at 18:45 , Ben Tilly wrote:


1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?


It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
does?


I can well believe that the standard says that you must accept
subqueries with aliases.  But does it say you must reject subqueries
without aliases?  I strongly doubt that.


If I'm reading my draft copy of the SQL:2003 spec right (and there's  
a good chance that I'm not, as it's not the easiest document for me  
to parse), aliases *are* required.


From 5WD-02-Foundation-2003-09 (the fifth working draft)


7.5 from clause

from clause ::= FROM table reference list
table reference list ::=
table reference [ { comma table reference }... ]




7.6 table reference

table reference ::=
table factor
  | joined table




table factor ::= table primary [ sample clause ]




table primary ::=
table or query name [ [ AS ] correlation name
[ left paren derived column listright paren ] ]
  | derived table [ AS ] correlation name
[ left paren derived column listright paren ]
  | lateral derived table [ AS ] correlation name
[ left paren derived column listright paren ]
  | collection derived table [ AS ] correlation name
[ left paren derived column listright paren ]
  | table function derived table [ AS ] correlation name
[ left paren derived column listright paren ]
  | only spec [ [ AS ] correlation name
[ left paren derived column listright paren ] ]
  | left paren joined table right paren



derived table ::= table subquery


correlation name I believe is what we commonly refer to as an  
alias. I think table or query name is a table or view name, and  
derived table is a subquery. correlation name is optional for  
table or query name but not for derived table. The fact that the  
standard clearly makes it optional in some cases and required in  
others is pretty clear that derived table without a correlation  
name is to be rejected, don't you think?



I have no clue what EnterpriseDB does.


In case it wasn't clear, the reason I bring it up is that  
EnterpriseDB, while working from a PostgreSQL base, strives for  
Oracle compatibility.



that case, couldn't you use whatever method works in PostgreSQL in
Oracle? I haven't checked the SQL standard, but it seems unlikely
it'd allow something like

GROUP BY , , , ;


That's not what Oracle accepts  that postgres does not.  What  
Oracle accepts is:


  ...
  GROUP BY 'foo';


Thanks for clarifying. If it wasn't obvious by now, you know I'm not  
familiar with Oracle :)



AIUI, Integers are only allowed because the SQL standard explicitly
allows you to refer to columns by the order they appear in the SELECT
list. Otherwise the GROUP BY items need to be column names.


Need to be?

The SQL-92 standard is clear that you must accept a list of column
names.  It is also clear that a column name must be be of the form
field or table.field.


The 2003 draft (same as above) seems to agree with the SQL92 standard:


7.9 group by clause



Format
group by clause ::=
GROUPBY [ set quantifier ] grouping element list
grouping element list ::=
grouping element [ { comma grouping element }... ]
grouping element ::=
ordinary grouping set
  | rollup list
  | cube list
  | grouping sets specification
  | empty grouping set
ordinary grouping set ::=
grouping column reference
  | left paren grouping column reference listright paren
grouping column reference ::=
column reference [ collate clause ]
grouping column reference list ::=
grouping column reference [ { commagrouping column  
reference }... ]

rollup list ::=
ROLLUPleft paren ordinary grouping set listright paren
ordinary grouping set list ::=
ordinary grouping set [ { comma ordinary grouping set }... ]
cube list ::=
CUBEleft paren ordinary grouping set listright paren
grouping sets specification ::=
GROUPINGSETS left paren grouping set listright paren
grouping set list ::=
grouping set [ { comma grouping set }... ]
grouping set ::=
ordinary grouping set
  | rollup list
  | cube list
  | grouping sets specification
  | empty grouping set




empty grouping set ::= left parenright paren



6.7 column reference



Format
column reference ::=
basic identifier chain
  | MODULEperiod qualified identifierperiod column name


There'd have to be a pretty strong reason to extend this, more than  
just a convenience, I should think.



  In no way, shape or form does that allow having
terms like trim(foo.bar) in a group by.

But every reasonable database that I know - including postgres -  
allows that.


Can you give an example of something like this working in PostgreSQL?  
I get an error when I try to use a text value in a GROUP BY clause.  
(Or are you referring specifically to the CASE expression corner case?)


test=# select 

Re: [HACKERS] SQL feature requests

2007-08-22 Thread Michael Glaesemann


On Aug 22, 2007, at 20:49 , Ben Tilly wrote:


If your implementation accepts:

  group by case when true then 'foo' end


What would that mean? Regardless of whether or not it's accepted, it  
should have *some* meaning.


It's not equivalent to GROUP BY foo

test=# select record_id as foo
, count(observation_id) as bar
from observation
group by case when true
  then 'foo'
 end;
ERROR:  column observation.record_id must appear in the GROUP BY  
clause or be used in an aggregate function

test=# select record_id
   , count(observation_id) as bar
   from observation
   group by case when true
 then 'record_id'
end;
ERROR:  column observation.record_id must appear in the GROUP BY  
clause or be used in an aggregate function


*This* seems like a bug:
test=# select record_id
   , count(observation_id) as bar
   from observation
   group by record_id
   , case when true
  then 'foo'
 end;
record_id | bar
---+-
 1 |   4
 2 |   4
 3 |   2
(3 rows)

And for good measure:

test=# select record_id
   , count(observation_id) as bar
   from observation
   group by case when true
 then record_id
end;
ERROR:  column observation.record_id must appear in the GROUP BY  
clause or be used in an aggregate function


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [COMMITTERS] pgsql: Add configure option --with-system-tzdata to use operating system

2007-08-22 Thread Tom Lane
[ catching up on today's email ]

Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 In the first place, it is considered bad form for a package to install
 an absolute symlink to /usr/share/zoneinfo:

 Fwiw Debian also faced this issue and came to a different conclusion.

That's fair enough --- as I mentioned, I wasn't 100% convinced either.
But those are the rules for RHEL/Fedora packages and I'm gonna play
by them.

The bottom line here is that accessing a system copy of the zic database
is going to be so system-dependent that I'm not sure Peter's patch will
be useful to anybody.

 Why would --with-zoneinfo want to use a symlink though? Shouldn't it just
 compile the binary to use the path specified directly?

AFAICS that just moves the problem to a different place, one where an
admin *can't* fix it without recompiling ...

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Tom Lane
Michael Glaesemann [EMAIL PROTECTED] writes:
 On Aug 22, 2007, at 20:36 , Ben Tilly wrote:
 I can well believe that the standard says that you must accept
 subqueries with aliases.  But does it say you must reject subqueries
 without aliases?  I strongly doubt that.

 If I'm reading my draft copy of the SQL:2003 spec right (and there's
 a good chance that I'm not, as it's not the easiest document for me
 to parse), aliases *are* required.

Yes, they are.  It's been like that since SQL92 if not before (I've
never seen a copy of SQL89).  I was meaning to go look up whether
SQL2003 had gotten any laxer, but thanks for saving me the trouble.

The question for us is whether we should exceed the spec by allowing
something it does not.  The fundamental reason why not is that we would
have to invent an alias for the unnamed sub-select.  That has several
drawbacks: more code to try to pick an unused alias, more bug reports
from people whose queries conflicted with whatever we picked.  In return
for which, we're providing absolutely zip real increase in
functionality, and we're encouraging people to write unportable
SQL-spec-incompatible code.  There is a future-proofing argument too:
if the committee ever decides this is a good idea (which may be unlikely
after 15 years, but you never know), they could well define the default
alias in a different way than we had done, and then we are stuck behind
the eight ball.

One could also assume that if the SQL committee has not made this
oh-so-obvious improvement, there is some darn good reason why not.
I'm not privy to their minutes so I don't know what arguments were
made there, but I think we can eliminate they never thought of it
as an explanation.  Three generations of the spec have been written
specifically to exclude this.

In short, lots of downsides here, and not a whole lot of upside.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match