Re: [HACKERS] text search vs schemas

2007-08-17 Thread Martijn van Oosterhout
On Fri, Aug 17, 2007 at 01:16:22AM -0400, Tom Lane wrote:
 That seems like it'd fix the problem for expression indexes on
 to_tsvector calls, but I don't see how it fixes the problem for
 triggers.  We don't have any clear path for making trigger arguments
 be anything but a list of strings.

I'm unsure how it works now, but it seems reasonable that when a
regclass/regtype/regetc is passed to a trigger, pass it in OID form.
This can be cast back safely inside the trigger itself. Seems a little
hacky though...

Having it as a type would also help with tracking dependencies.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


Re: [HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)

2007-08-17 Thread Magnus Hagander
Marc G. Fournier wrote:
 
 
 --On Thursday, August 16, 2007 23:16:09 +0200 Magnus Hagander 
 [EMAIL PROTECTED] wrote:
 
 But my question still stands - how much work to stop-gap fix it on the
 old one?
 
 rsync should be upgraded now ...

Thanks!
Hopefully that should fix the short-term problem.

I'll try to take a look at the other one as soon as I can, hopefully
this weekend - if you have the docs by then.

//Magnus

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] text search vs schemas

2007-08-17 Thread Peter Eisentraut
Am Freitag, 17. August 2007 05:15 schrieb Tom Lane:
 Actually ... I'm suddenly not happy about the choice to put text search
 configurations etc. into schemas at all.  We've been sitting here and
 assuming that to_tsvector('english', my_text_col) has a well defined
 meaning --- but as the patch stands, *it does not*.  The interpretation
 of the config name could easily change depending on search_path.

But that isn't different from any other part of the system.  A proper fix 
would be a mechanism to alleviate the confusion in all places, not simply to 
remove features that cause such confusion in some places (but not all, 
thereby causing inconsistencies).

 It does not seem likely that a typical installation will have so many
 text search configs that subdividing them into schemas will really be
 useful.

But schemas are not only used to organize objects because there are so many.  
Altering the search path to get at a different implementation without having 
to alter the names in every single place is also a useful application.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: [HACKERS] GIT patch

2007-08-17 Thread Heikki Linnakangas
Bruce Momjian wrote:
 These patches will be held for 8.4:
 
   o  Grouped Index Tuples (GIT)
   o  Bitmap scan changes
   o  Stream bitmaps (API change for Group Index Tuples)
   o  Maintaining cluster order on insert
 
 I believe Heikki is in agreement on this.

Yes.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] tsearch patch and namespace pollution

2007-08-17 Thread Michael Paesold

Bruce Momjian wrote:
 I would be happy if all text search functions began with 'ts', 'ts_', or
 'to_ts', and if we don't clean this up now, it is going to be harder in
 the future.

+1 from me. \df is also much more useful then.

 I think users can expect some migration for text search in 8.3 as a
 benefit of getting into core and be dump-able.


I guess so. Especially if you change some functions, they will have to 
change source code anyway. So you might as well clean up all functions that 
don't fit into a sound naming scheme.


Best Regards
Michael Paesold



[HACKERS] pg_ctl configurable timeout

2007-08-17 Thread Peter Eisentraut
I'm having trouble with the hardcoded 60 second timeout in pg_ctl.  pg_ctl 
sometimes just times out and there is no way to make it wait a little longer.  
I would like to add an option to be able to change that, say 
pg_ctl -w --timeout=120.  Comments?
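
A minimal sketch of the idea, written in Python rather than pg_ctl's C: a wait loop whose deadline comes from a parameter instead of a fixed 60 seconds. The name `wait_for_server`, the `is_ready` callback, and the one-second poll interval are illustrative assumptions, not pg_ctl internals.

```python
import time


def wait_for_server(is_ready, timeout=60.0, poll_interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    Returns True if the server became ready in time, False on timeout.
    `clock` and `sleep` are injectable so the loop can be exercised
    without actually waiting.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

With the default of 60 the behaviour matches the hardcoded case; a caller that needs longer would pass, say, `timeout=120`.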

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: [HACKERS] text search vs schemas

2007-08-17 Thread Tom Lane
Martijn van Oosterhout [EMAIL PROTECTED] writes:
 On Fri, Aug 17, 2007 at 01:16:22AM -0400, Tom Lane wrote:
 That seems like it'd fix the problem for expression indexes on
 to_tsvector calls, but I don't see how it fixes the problem for
 triggers.  We don't have any clear path for making trigger arguments
 be anything but a list of strings.

 I'm unsure how it works now, but it seems reasonable that when a
 regclass/regtype/regetc is passed to a trigger, pass it in OID form.

If you insist on a solution that involves attaching type information
to trigger arguments, then we can forget about getting tsearch into 8.3.
That's a nontrivial amount of new design and code that hasn't even been
on the radar screen before.

At the moment I feel our thoughts have to revolve not around adding
complexity to tsearch, but taking stuff out.  If we ship it with no
schema support for TS objects in 8.3, we can always add that later,
if there proves to be real demand for that (and I note that the contrib
version has gotten along fine without it).  But we cannot go in the
other direction.

regards, tom lane


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Robert Treat
On Thursday 16 August 2007 15:58, Bruce Momjian wrote:
 Josh Berkus wrote:
  All,
 
  First off, I'll assert that backup/restore is a serious issue and while
  the folks who want Tsearch in core now are dismissing it, we'll be
  fielding the complaints later.  Any solution which involves setting a GUC
  at restore time *which could vary per table or even column* isn't
  acceptable.  We used to do the \SET thing for table ownership with
  backup/restore, and you *know* how many restore failures that caused.

 Agreed.  Let me summarize where we are now.  I talked to Tom on the
 phone yesterday so we have come up with the following plan:

   o  default_text_search_config stays, not super-user-only, not set
      in pg_dump output
   o  tsearch functions that don't have a configuration name will be
      marked so they can't be specified in expression indexes
   o  auto-casts and :: to tsearch data types will also not work in
      expression indexes (we already do this for timestamp without
      timezone)
   o  GIN on a text column will not promote to tsvector
   o  No rewrite magic for function calls without configuration names in
      WHERE clauses to use indexes that do specify configurations (risky)

 The current documentation explains all this:

   http://momjian.us/expire/textsearch/HTML/textsearch-tables.html

 So, we have disabled the ability to create expression indexes that are
 affected by default_text_search_config, and we have documented other
 possible problems.   tsvector_update_trigger() has to be modified to
 take a configuration name (and frankly I am not excited about the
 filter_name capability either, but that is a separate issue).

 The only remaining problem I see is that the rest of the documentation
 relies heavily on default_text_search_config when in fact the most
 common usage with tables and indexes can't use it.  tsquery can use the
 default easily, but I am betting that tsvector usually cannot.

What exactly does default_text_search_config buy us?  I think it is supposed 
to simplify things, but it sounds like it adds a bunch of corner cases, 
special situations, if's and but's (and candies and nuts), that I fear will 
lead to more confusion for end users, and make development more difficult in 
the future as we are forced to try and live with backwards compatibility.

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Bruce Momjian
Robert Treat wrote:
  The only remaining problem I see is that the rest of the documentation
  relies heavily on default_text_search_config when in fact the most
  common usage with tables and indexes can't use it.  tsquery can use the
  default easily, but I am betting that tsvector usually cannot.
 
 What exactly does default_text_search_config buy us?  I think it is supposed 
 to simplify things, but it sounds like it adds a bunch of corner cases, 
 special situations, if's and but's (and candies and nuts), that I fear will 
 lead to more confusion for end users, and make development more difficult in 
 the future as we are forced to try and live with backwards compatibility.

Agreed.  That was my conclusion long ago but few agreed so I gave up.

In fairness the goal was for default_text_search_config to make text
search easier for clusters that use a single configuration.  If you are
using triggers on a separate tsvector column, only the trigger author
needs to deal with the configuration name (not queries), but expression
indexes require the configuration name to always be used for the
tsvector queries, while the tsquery can use the
default_text_search_config value.  Anyway, again, it is all
special-casing this and that, as you said.  And, if you are specifying
the configuration name for the tsvector but not the tsquery you are more
likely to have a configuration mismatch.  (Of course you might want
different configurations for tsvector and tsquery, but that is for
experts.)

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] text search vs schemas

2007-08-17 Thread Martijn van Oosterhout
On Fri, Aug 17, 2007 at 10:42:09AM -0400, Tom Lane wrote:
 Martijn van Oosterhout [EMAIL PROTECTED] writes:
  I'm unsure how it works now, but it seems reasonable that when a
  regclass/regtype/regetc is passed to a trigger, pass it in OID form.
 
 If you insist on a solution that involves attaching type information
 to trigger arguments, then we can forget about getting tsearch into 8.3.
 That's a nontrivial amount of new design and code that hasn't even been
 on the radar screen before.

Hmm, maybe I didn't explain clearly enough. I meant that if the
argument is a regclass for example, to pass it in the TG_ARGV list as
the OID in *string form*.

That way trigger arguments stay a list of strings, yet the whole thing
is schema safe because when trigger body casts the string back to a
regclass, it gets exactly what was passed.
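
A toy model of the difference, in Python: looking a configuration up by name depends on the search path, while looking it up by an OID carried as a string does not. The catalog contents, OIDs, schema names, and function names below are all made up for illustration; nothing here is PostgreSQL's actual lookup code.

```python
# Toy catalog: OID -> (schema, name). All values are hypothetical.
CATALOG = {
    16384: ("public", "english"),
    16385: ("custom", "english"),
}


def resolve_by_name(name, search_path):
    """Name lookup: the first schema in search_path that has `name` wins."""
    for schema in search_path:
        for oid, (obj_schema, obj_name) in CATALOG.items():
            if obj_schema == schema and obj_name == name:
                return oid
    raise LookupError(name)


def resolve_by_oid_string(arg):
    """OID-as-string lookup: the result does not depend on search_path."""
    oid = int(arg)
    if oid not in CATALOG:
        raise LookupError(arg)
    return oid


# Resolve the name once, when the trigger is defined, and keep the OID
# as a plain string so the argument list stays a list of strings.
stored_arg = str(resolve_by_name("english", ["public", "custom"]))
```

Resolving `"english"` again under a different search path can yield a different object, whereas `resolve_by_oid_string(stored_arg)` always yields the one chosen at definition time. The sketch says nothing about how such an argument would survive dump and reload.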

Hope this makes more sense,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Tom Lane
Oleg Bartunov [EMAIL PROTECTED] writes:
 On Thu, 16 Aug 2007, Josh Berkus wrote:
 First off, I'll assert that backup/restore is a serious issue and while the
 folks who want Tsearch in core now are dismissing it, we'll be fielding the
 complaints later.  Any solution which involves setting a GUC at restore time
 *which could vary per table or even column* isn't acceptable.

 Josh, all my respects to you, but text searching is not about index at all.
 Text searching is about tsvector and tsquery data type

What's your point?  The problem is just as bad for an auto-update
trigger that computes a stored tsvector column.  If the trigger's
behavior depends on the GUC settings of the person doing an insert,
things will soon be a mess --- do you really want the tsvector contents
to change after an update of an unrelated field?  After a while you
won't have any idea what's really in the column, because you won't
have any good way to know which rows' tsvectors were generated with
which configurations.

Even if that state of affairs is really what you want, reproducing
it after a dump/reload will be tricky.  I think that a regular
schema-and-data dump would work, because pg_dump doesn't install
triggers until after it's loaded the data ... but a data-only dump
would *not* work, because the trigger would fire while loading rows.

Basically I see no use for a setup in which the configuration used
for a particular tsvector value is not fully determined by the table
definition.  Whether the value is in an index or in the table proper
is irrelevant to this argument.

regards, tom lane


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Luke Lonergan
All - we have customers who very much want tsearch2 and will benefit from its 
inclusion in core.

We are also struggling with the update trigger approach for various reasons.

Is there a good alternative?  Can we embed tsvector updates into the core code 
efficiently?

- Luke

Msg is shrt cuz m on ma treo


Re: [HACKERS] text search vs schemas

2007-08-17 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Am Freitag, 17. August 2007 05:15 schrieb Tom Lane:
 Actually ... I'm suddenly not happy about the choice to put text search
 configurations etc. into schemas at all.

 But that isn't different from any other part of the system.  A proper fix 
 would be a mechanism to alleviate the confusion in all places, not simply to 
 remove features that cause such confusion in some places (but not all, 
 thereby causing inconsistencies).

Well, we are already inconsistent about this.  PL languages and index
access methods, for example, don't have schema-ified names.

 It does not seem likely that a typical installation will have so many
 text search configs that subdividing them into schemas will really be
 useful.

 But schemas are not only used to organize objects because there are so many.
 Altering the search path to get at a different implementation without having 
 to alter the names in every single place is also a useful application.

This is isomorphic to the argument about whether default_text_search_config
is a good idea; indeed, I think that default_text_search_config pretty
much solves this problem already, for the places where it's sane to have
the configuration-in-use depend upon context.  The problem with using
schemas for TS configs is that we can't prevent the search result from
changing in contexts where it mustn't change.  At least, not short of
requiring fully-qualified config names in those places, which doesn't
sound like an advance in usability.

regards, tom lane


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Tom Lane
Robert Treat [EMAIL PROTECTED] writes:
 What exactly does default_text_search_config buy us?  I think it is supposed 
 to simplify things, but it sounds like it adds a bunch of corner cases, 

Well, the main thing we'd lose if we remove it is all trace of upward
compatibility from the contrib version of tsearch.  People are
accustomed to using query functions that rely on a default configuration
setting.  Even though I want to prohibit use of a default in the
definition of an index or auto-update trigger, I don't see a good reason
to forbid it in queries.

regards, tom lane


Re: [HACKERS] text search vs schemas

2007-08-17 Thread Tom Lane
Martijn van Oosterhout [EMAIL PROTECTED] writes:
 On Fri, Aug 17, 2007 at 10:42:09AM -0400, Tom Lane wrote:
 If you insist on a solution that involves attaching type information
 to trigger arguments, then we can forget about getting tsearch into 8.3.

 Hmm, maybe I didn't explain clearly enough. I meant that if the
 argument is a regclass for example, to pass it in the TG_ARGV list as
 the OID in *string form*.

Are you expecting the *user* to deal with that?  If not, how is the
system supposed to know which trigger arguments to do it to?  What
about dump and reload?

regards, tom lane


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Bruce Momjian
Luke Lonergan wrote:
 All - we have customers who very much want tsearch2 and will benefit from its 
 inclusion in core.
 
 We are also struggling with the update trigger approach for various reasons.
 
 Is there a good alternative?  Can we embed tsvector updates into the core 
 code efficiently?

No, doing it automatically adds too much complexity for little benefit. 
If you want more concrete suggestions, you will have to provide more
information about the problems you are having.


-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Well, the main thing we'd lose if we remove it is all trace of upward
 compatibility from the contrib version of tsearch.

 I don't think this is all that big of a deal.  In fact I would expect it
 when going from contrib to core, and I never had any illusion that I
 would be able to just upgrade from 8.2 (8.1) Tsearch2 to 8.3.

I would hope that what we do with contrib/tsearch2 is rewrite it as a
compatibility wrapper.  This at least will provide an answer to anyone
who complains that we renamed the functions.  But if there are
fundamental things missing in the core implementation, and we try to
make the wrapper supply them, then we haven't really eliminated the
problem ... just moved it over a little.

regards, tom lane


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Bruce Momjian
Tom Lane wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
  Here's something not to forget in this whole business: the present TSearch2
  implementation permits you to have a different tsvector configuration for 
  each *row*, not just each column.  That is, applications can be built with
  per-cell configs.
 
 Certainly.  That's actually the easiest case to deal with, because you're
 going to put the tsvector config identity into another column of the
 table, and the trigger or index just references it there.  It hasn't
 been part of the discussion because it's not a problem.

I added an example of that in the documentation (second query):


http://momjian.us/expire/textsearch/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


[HACKERS] tsearch still has external configuration files

2007-08-17 Thread Tom Lane
A couple months ago I wrote:
 Lastly, I'm unhappy that the patch still keeps a lot of configuration
 information, such as stop word lists, in the filesystem rather than the
 database.  It seems to me that the single easiest and most useful part
 of a configuration to change is the stop word list; but this setup
 guarantees that no one but a DBA can do that, and what's more that
 pg_dump won't record your changes.  What's the point of having any
 non-superuser configuration capability at all, if stop words aren't part
 of what you can change?

It appears that nothing has been done about this objection in the
current patch.  It is too late to redesign stop word handling for 8.3,
but right now I have a more limited complaint: the patch allows
unprivileged users to specify stopword files with absolute paths.
This is a serious security breach, since it allows unprivileged users
to read arbitrary files with the permissions of the postgres user.
Now they maybe would have some difficulty determining the exact contents
of such a file, but it would certainly be easy to test for the existence
of particular words in it.

What I think we should do about this is the same as we do for timezone
abbreviation sets: the user-given stopword specification is just a name,
which we insist can't contain dots or directory separators, and then
we look up $SHAREDIR/dict_data/NAME.stop (or other suffixes for the
other kinds of configuration files).  This closes the security hole
and also gives us a chance at an upward-compatible redesign later ---
for instance, in a future release the name might refer to an entry in
some other system catalog, rather than a file.
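
The lookup rule described above can be sketched as follows, in Python for brevity. The function name `stopword_file_path` and its parameters are hypothetical; only the rule itself (a bare name with no dots or directory separators, resolved to `$SHAREDIR/dict_data/NAME.stop`) comes from the proposal.

```python
import os


def stopword_file_path(sharedir, name, suffix="stop"):
    """Map a user-given configuration name to a file under sharedir.

    Names containing dots or directory separators are rejected, so the
    result can never point outside <sharedir>/dict_data/.
    """
    if not name:
        raise ValueError("empty configuration file name")
    forbidden = {".", "/", "\\"}
    if any(ch in forbidden for ch in name):
        raise ValueError(f"invalid configuration file name: {name!r}")
    return os.path.join(sharedir, "dict_data", f"{name}.{suffix}")
```

Because the user supplies only `NAME`, the same call could later be backed by a catalog lookup instead of a file without changing what users type.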

BTW, I'm inclined to rename the installation directory to
$SHAREDIR/tsearch_data/ ... any objections to that?

regards, tom lane


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Josh Berkus
Folks,

Here's something not to forget in this whole business: the present TSearch2 
implementation permits you to have a different tsvector configuration for 
each *row*, not just each column.  That is, applications can be built with 
per-cell configs.

I know of at least one out there: Ubuntu's Rosetta.  I'm sure there are 
others.

Therefore there are two cases we're trying to solve:

(1) The simple case: someone wants to build a database with text search 
entirely in one UTF8 language.  All vectors are in that language, and so are 
all queries.  The user wants the simplest syntax possible.

(2) The Rosetta case: different configs are used for each cell and all 
searches have to be language-qualified.

In both cases, the databases need to back up and restore cleanly.

From this, I'd first of all say that I don't see the point of a Superuser 
default_text_search_config.  There are too many failure conditions with 
the default once you get away from the simplest case, so I don't see how 
setting it to Superuser-only protects anything.  Might as well make it a 
userset and then it will be more useful.

Unfortunately, the way I see it the only permanent solution for this is to 
alter the TSvector structure to include a config OID at the beginning of it.  
That doesn't sound like it's doable in time for 8.3, though; is there a way 
we could work around that until 8.4?

And why does this sound exactly like the issues we've had with per-column 
encodings and the currency type?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: [HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)

2007-08-17 Thread Marc G. Fournier
--On Friday, August 17, 2007 08:40:11 +0200 Magnus Hagander 
[EMAIL PROTECTED] wrote:

 Marc G. Fournier wrote:


 --On Thursday, August 16, 2007 23:16:09 +0200 Magnus Hagander
 [EMAIL PROTECTED] wrote:

 But my question still stands - how much work to stop-gap fix it on the
 old one?

 rsync should be upgraded now ...

 Thanks!
 Hopefully that should fix the short-term problem.

 I'll try to take a look at the other one as soon as I can, hopefully
 this weekend - if you have the docs by then.

If you need information, just ask for it ...


-- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Joshua D. Drake
Tom Lane wrote:
 Robert Treat [EMAIL PROTECTED] writes:
 What exactly does default_text_search_config buy us?  I think it is supposed 
 to simplify things, but it sounds like it adds a bunch of corner cases, 
 
 Well, the main thing we'd lose if we remove it is all trace of upward
 compatibility from the contrib version of tsearch.

I don't think this is all that big of a deal.  In fact I would expect it
when going from contrib to core, and I never had any illusion that I
would be able to just upgrade from 8.2 (8.1) Tsearch2 to 8.3.

Sincerely,

Joshua D. Drake



--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Here's something not to forget in this whole business: the present TSearch2
 implementation permits you to have a different tsvector configuration for 
 each *row*, not just each column.  That is, applications can be built with
 per-cell configs.

Certainly.  That's actually the easiest case to deal with, because you're
going to put the tsvector config identity into another column of the
table, and the trigger or index just references it there.  It hasn't
been part of the discussion because it's not a problem.

regards, tom lane


[HACKERS] pgparam extension to the libpq api

2007-08-17 Thread Merlin Moncure
Attached are some new functions that extend the libpq api to make
calling the parameterized interfaces easier, especially when making
binary calls.  IMO, this fills one of the two big missing parts of the
libpq api when making binary database calls, the other being client
side handling of complex structures (arrays, etc).

The code covers two major areas of functionality and is split for
separate inclusion:
* PGparam (param.c)
* get/set functions for the pgresult (result_ext.c)

We are happy with both pieces but they can be adopted separately or not at all.

The attached code is basically a cleaned up version of wrappers put in
place in our own applications, plus a functional test.  The major
ideas were:

* introduce a new opaque structure, PGparam, that handles some of the
more difficult aspects of memory management associated with binary
calls.
* remove the requirement of client side code having to do byte swapping
* make binary calls as efficient as possible, with a minimal amount of
memory allocations
* introduce, as much as possible, no additional portability issues or
additional dependencies to the libpq api.

Here are the interesting and/or possibly controversial pieces:
* For portability purposes, we had the 64 bit integer put function
take a pointer where the other putters take value types.  We couldn't
think of any other way to do it because there is no portable 64 bit
integer type in libpq.
* The synchronous execution functions (for example PQparamExec) take
a pointer to a result and return an error status, which is _not_ how the
other flavors of Exec operate, but is very convenient.  If you
pass in NULL the result is discarded for you.  We are stuck on this
approach, but we like it.
* The getters check the returned type oid to make sure it is sane.
For this reason, we have to include catalog/pg_type.h and postgres.h
to get to the OID defines (these are not exposed to the interface
however).  I don't see a reason why this is not ok.
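
To make the calling convention in the second item concrete, here is a minimal stand-alone sketch of the "return a status, hand back the result through a pointer, discard it when the pointer is NULL" pattern. Everything in it (DemoResult, demo_exec, demo_clear) is hypothetical and only mimics the shape described above; it is not the signature of PQparamExec.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a query result; not the real PGresult. */
typedef struct DemoResult
{
	int ntuples;
} DemoResult;

/* Counts how many results have been released, for demonstration only. */
static int demo_results_freed = 0;

static void demo_clear(DemoResult *res)
{
	if (res)
	{
		free(res);
		demo_results_freed++;
	}
}

/*
 * Pretend to execute something that yields ntuples rows.
 * Returns 1 on success and 0 on failure.  When resultp is not NULL the
 * caller receives the result and must release it with demo_clear();
 * when resultp is NULL the result is discarded on the caller's behalf.
 */
static int demo_exec(int ntuples, DemoResult **resultp)
{
	DemoResult *res;

	assert(ntuples >= 0);

	res = malloc(sizeof(*res));
	if (!res)
		return 0;
	res->ntuples = ntuples;

	if (resultp)
		*resultp = res;
	else
		demo_clear(res);
	return 1;
}
```

With this shape, a caller that only cares about success can write `if (!demo_exec(1, NULL))` and never touch a result object.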

The 64 bit integer is handled as a pointer in the get/set functions
because as far as we can tell there is no 64 bit integer type we can
count on without introducing compatibility issues.
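
For readers wondering how a pointer-based 64 bit putter can work without naming a 64 bit type, the following is one possible sketch. It assumes the caller passes a pointer to 8 bytes in host byte order and that the wire format wants network (big-endian) order; demo_put_int8 and demo_host_is_little_endian are invented names, not functions from the attached patch.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the host stores the least significant byte first. */
static int demo_host_is_little_endian(void)
{
	unsigned int one = 1;

	return *(const unsigned char *) &one == 1;
}

/*
 * Copy an 8-byte integer, passed by pointer in host byte order, into
 * out[0..7] in network (big-endian) byte order.  Only bytes are
 * touched, so no 64 bit integer type is needed in the interface.
 */
static void demo_put_int8(const void *host8, unsigned char *out)
{
	const unsigned char *src = (const unsigned char *) host8;
	int little = demo_host_is_little_endian();
	int i;

	assert(host8 != NULL && out != NULL);

	for (i = 0; i < 8; i++)
		out[i] = little ? src[7 - i] : src[i];
}
```

The caller remains free to use whatever 64 bit type its own compiler provides and simply passes its address.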

We considered putting the PGparam struct into the PGconn structure.
In this case, a PGconn pointer would be passed to the PQparamXXX
functions instead of a PGparam, and would lazy allocate the structure
and free it on PQfinish.  We are curious for opinions on this.

Writing credits to myself and Andrew Chernow.  If this proposal is
accepted, we will write all the documentation and make suitable
changes necessary for inclusion, presumably for the 8.4 release.  To
compile the changes see the attached makefile.

What we would really like is to use the backend input and output
functions for data types, rather than reimplementing this within the
client ... ie pqformat.c and similar files.  For this reason, we did
not re-implement get/put functions for the geometric types (we thought
about it), etc.  Merging the client and the server marshaling may
require some abstraction of the server so formatting functions can be
called from the client api.

Hopefully this will open up the binary interfaces to more developers.
For certain types of queries, binary calls can be a huge win in terms
of efficiency.

merlin


makefile
Description: Binary data

#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include "pg.h"
#include "libpq-int.h"

/* Supports 250 columns worth of params.  If more are needed,
 * memory is allocated ... very rare case.
 */
#define COLSTACKSIZE 4096

/* Note: errmsg cannot be written here, because p itself is NULL. */
#define CHKPARAMPTR(p) do{ \
	if(!(p)) \
	{ \
		errno = EINVAL; \
		return 0; \
	} \
}while(0)

#define PARAM_ARRAY_DECL \
	char _stackbuffer[COLSTACKSIZE]; \
	char *buf   = _stackbuffer; \
	char **vals = NULL; \
	int *lens   = NULL; \
	int *fmts   = NULL

#define PARAM_ARRAY_ASSIGN do{ \
	if(param) \
	{ \
		int n = (int)((sizeof(void *) * param->vcnt) + \
			((sizeof(int) * 2) * param->vcnt)); \
		if(n > COLSTACKSIZE) \
		{ \
			buf = (char *)malloc(n); \
			if(!buf) \
			{ \
				printfPQExpBuffer(&conn->errorMessage, \
					libpq_gettext("cannot allocate parameter column arrays\n")); \
				return 0; \
			} \
		} \
		vals = (char **)buf; \
		lens = (int *)(buf + (sizeof(void *) * param->vcnt)); \
		fmts = lens + param->vcnt; \
		for(n=0; n < param->vcnt; n++) \
		{ \
			vals[n] = param->vals[n].data; \
			lens[n] = param->vals[n].datal; \
			fmts[n] = param->vals[n].format; \
		} \
	} \
}while(0)

#define PARAM_ARRAY_FREE do{ \
	if(buf != _stackbuffer) \
		free(buf); \
}while(0)

typedef struct
{
	int ptrl;
  void *ptr;
	int datal;
  char *data;
  int format;
} PGvalue;

struct pg_param
{
  int vcnt;
  int vmax;
  PGvalue *vals;
	int slabsize;
	char *slab;
	char errmsg[128];
};


PGparam *PQparamCreate(void)
{
	return (PGparam *)calloc(1, sizeof(PGparam));
}

void PQparamReset(PGparam *param)
{
	if(param)
		param->vcnt = 0;
}

char *PQparamErrorMessage(PGparam *param)
{
	

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Bruce Momjian
Josh Berkus wrote:
 Folks,
 
 Here's something not to forget in this whole business: the present TSearch2 
 implementation permits you to have a different tsvector configuration for 
 each *row*, not just each column.  That is, applications can be built with 
 per-cell configs.
 
 I know of at least one out there: Ubuntu's Rosetta.  I'm sure there are 
 others.
 
 Therefore there are two cases we're trying to solve:
 
 (1) The simple case: someone wants to build a database with text search 
 entirely in one UTF8 language.  All vectors are in that language, and so are 
 all queries.  The user wants the simplest syntax possible.
 
 (2) The Rosetta case: different configs are used for each cell and all 
 searches have to be language-qualified.
 
 In both cases, the databases need to backup and restore cleanly.
 
 From this, I'd first of all say that I don't see the point of a Superuser 
 default_tsvector_search_config.  There are too many failure conditions with 
 the default once you get away from the simplest case, so I don't see how 
 setting it to Superuser-only protects anything.  Might as well make it a 
 userset and then it will be more useful.

Per my email yesterday, default_tsvector_search_config is _not_
super-user-only:

  o  default_text_search_config stays, not super-user-only, not set
 in pg_dump output

 Unfortunately, the way I see it the only permanent solution for this is to 
 alter the TSvector structure to include a config OID at the beginning of it.  
 That doesn't sound like it's doable in time for 8.3, though; is there a way 
 we could work around that until 8.4?

Oh, so you want the config inside each tsvector value.  Interesting
idea.

 And why does this sound exactly like the issues we've had with per-column 
 encodings and the currency type?

Yes, this is a very similar issue except we are trying to allow multiple
encodings.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] tsearch still has external configuration files

2007-08-17 Thread Bruce Momjian
Tom Lane wrote:
 A couple months ago I wrote:
  Lastly, I'm unhappy that the patch still keeps a lot of configuration
  information, such as stop word lists, in the filesystem rather than the
  database.  It seems to me that the single easiest and most useful part
  of a configuration to change is the stop word list; but this setup
  guarantees that no one but a DBA can do that, and what's more that
  pg_dump won't record your changes.  What's the point of having any
  non-superuser configuration capability at all, if stop words aren't part
  of what you can change?
 
 It appears that nothing has been done about this objection in the
 current patch.  It is too late to redesign stop word handling for 8.3,

Yes, I thought we agreed that for 8.3 we would use external files with
UTF8 encoding.

 BTW, I'm inclined to rename the installation directory to
 $SHAREDIR/tsearch_data/ ... any objections to that?

Seems clear to me.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] pgparam extension to the libpq api

2007-08-17 Thread Merlin Moncure
On 8/17/07, Merlin Moncure [EMAIL PROTECTED] wrote:
 Attached are some new functions that extend the libpq api to make

after sending the mail, we noticed some dead code that might be
confusing...in PQparamClear there was some legacy code referring to
'slab' which has no effect...ignore.  Also slab and slabsize members
of PGparam are not supposed to be there.

 * The synchronous execution functions (for example PQparamExec), takes
a pointer to a result and return error status, which is _not_ how the
other flavors of Exec operate, but is very convenient however.  If you
pass in NULL the result is discarded for you.  We are stuck on this
approach, but we like it.

Also, we are _not_ stuck in the **PGresult concept :-). (typo)

merlin



Re: [HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)

2007-08-17 Thread Dave Page

Marc G. Fournier wrote:

I'll try to take a look at the other one as soon as I can, hopefully
this weekend - if you have the docs by then.


If you need information, just ask for it ...


Magnus has repeatedly asked you to document it on PMT as we have been 
doing as a matter of course for everything for quite some time now.


We're not doing that to be a pita, but to help us run the kind of 
professional infrastructure that the community has come to expect. That 
means everything is documented, systems are built in standard ways 
wherever possible, everything is monitored constantly, and backed up 
left right and center.


So please help maintain that level of professionalism and document the 
new VM you've built so it can be properly maintained in the future.


Regards, Dave.



[HACKERS] A small rant about coding style for backend functions

2007-08-17 Thread Tom Lane
I don't want to pick on Teodor in particular, because I've seen other
people do this too, but which of these functions do you find more
readable?

Datum
to_tsquery_byname(PG_FUNCTION_ARGS)
{
	PG_RETURN_DATUM(DirectFunctionCall2(
		to_tsquery_byid,
		ObjectIdGetDatum(name2cfgId((text *) PG_GETARG_POINTER(0), false)),
		PG_GETARG_DATUM(1)
	));
}

Datum
to_tsquery_byname(PG_FUNCTION_ARGS)
{
	text	   *cfgname = PG_GETARG_TEXT_P(0);
	text	   *txt = PG_GETARG_TEXT_P(1);
	Oid			cfgId;

	cfgId = name2cfgId(cfgname, false);
	PG_RETURN_DATUM(DirectFunctionCall2(to_tsquery_byid,
										ObjectIdGetDatum(cfgId),
										PointerGetDatum(txt)));
}

The main drawback to the V1-call-convention function call mechanism,
compared to ordinary C functions, is that you can't instantly see what
the function arguments are supposed to be.  I think that good coding
style demands ameliorating this by declaring and extracting all the
arguments at the top of the function.  The above example is bad enough,
but when you have to dig through a function of many lines looking for
GETARG calls in order to know what arguments it expects, it's seriously
annoying and unreadable.

And another thing: use the correct extraction macro for the argument's
type, rather than making something up on the fly.  Quite aside from
helping the reader see what the function expects, the first example
above is actually *wrong*, as it will crash on toasted input.

OK, I'm done venting ... back to patch-fixing.

regards, tom lane



Re: [HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)

2007-08-17 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Dave Page wrote:
 Marc G. Fournier wrote:
 I'll try to take a look at the other one as soon as I can, hopefully
 this weekend - if you have the docs by then.

 If you need information, just ask for it ...
 
 Magnus has repeatedly asked you to document it on PMT as we have been
 doing as a matter of course for everything for quite some time now.

+1, even I am following up with the team standard of trying to document
on PMT.

 
 We're not doing that to be a pita, but to help us run the kind of
 professional infrastructure that the community has come to expect. That
 means everything is documented, systems are built in standard ways
 wherever possible, everything is monitored constantly, and backed up
 left right and center.

+1

 
 So please help maintain that level of professionalism and document the
 new VM you've built so it can be properly maintained in the future.

As our infrastructure continues to grow this is going to be vital. We
keep getting bigger, and the only way to track this stuff appropriately
is through documentation.

Sincerely,

Joshua D. Drake


 
 Regards, Dave.
 
 


- --

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGxf4mATb/zqfZUUQRAsgQAJ4i/4j9a94WFHK2i1Xe/mA1yWi4gQCffQAI
AsOsgbdFuqvYjLpFZRpby34=
=FErs
-END PGP SIGNATURE-



Re: [HACKERS] Problem with locks

2007-08-17 Thread Alvaro Herrera
Gregory Stark wrote:

 I switched the code over to the sysv_sema style api. It's gotten a bit grotty
 and I would clean it up if it weren't a temporary test program. If we find a
 real problem perhaps I should add a test case like this to the smoke test in
 ipc_test.c so people can check their OS. 

So did you discover anything?  I ran your test program and it worked
successfully for several different configurations.  Not enough times
maybe, though.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] More logging for autovacuum

2007-08-17 Thread Alvaro Herrera
Gregory Stark wrote:
 
 I'm having trouble following what's going on with autovacuum and I'm finding
 the existing logging insufficient. In particular that it's only logging vacuum
 runs *after* the vacuum finishes makes it hard to see what vacuums are running
 at any given time. Also, I want to see what is making autovacuum decide to
 forgo vacuuming a table and the log with that information is at DEBUG2.

So did this idea go anywhere?

-- 
Alvaro Herrera  Developer, http://www.PostgreSQL.org/
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, Gee, Officer Krupke)



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Ron Mayer
Joshua D. Drake wrote:
 Tom Lane wrote:
 Robert Treat [EMAIL PROTECTED] writes:
 What exactly does default_text_search_config buy us?  I think it is 
 supposed 
 to simplify things, but it sounds like it adds a bunch of corner cases, 
 Well, the main thing we'd lose if we remove it is all trace of upward
 compatibility from the contrib version of tsearch.
 
 I don't think this is all that big of a deal. In fact I would expect it
 going from contrib to core and never had any illusion to the effect that
 I would be able to just upgrade from 8.2 (8.1) Tsearch2 to 8.3.

FWIW, I also would _not_ have expected compatibility between contrib
and core.   In fact, I would have expected contrib tsearch to be a
place where experimental APIs existed and that the single
biggest difference between contrib vs core was that the
core APIs removed any cruft that might have been in contrib.

If default_text_search_config makes things more confusing or more
fragile, I'd rather see it gone than kept around for
backward-compatibility-to-pre-core reasons.



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Josh Berkus
Bruce,

 Oh, so you want the config inside each tsvector value.  Interesting
 idea.

Yeah, hasn't anyone suggested this before?  It seems like the obvious 
solution.  A TSvector constructed with en_US is NOT the same as a vector 
constructed with fr_FR and it's silly to pretend that they are comparable.  
Sticking the config name at the beginning of the field would allow for the 
use of single-parameter functions, and default_config would only be used 
for SELECT queries.  Backup/restore issues should go away completely ...

EXCEPT this would introduce issues if the config is changed or deleted 
after being used.  However, I'd imagine that we have those anyway -- 
certainly we would at restore time.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Josh Berkus
Tom,

 It might be an obvious solution, but to some other problem than the one
 we have.  The problem we are trying to address is how to know which
 config to use to construct a *new* tsvector.

Oh, right.  Back to the circular arguments then ...

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Oh, so you want the config inside each tsvector value. Interesting
 idea.

 Yeah, hasn't anyone suggested this before?  It seems like the obvious
 solution.

It might be an obvious solution, but to some other problem than the one
we have.  The problem we are trying to address is how to know which
config to use to construct a *new* tsvector.

 A TSvector constructed with en_US is NOT the same as a vector 
 constructed with fr_FR and it's silly to pretend that they are comparable.

Um, actually I think Oleg and Teodor believe that they *are* comparable.
If we try to force them not to be then we'll break multi-language
situations.

regards, tom lane



Re: [HACKERS] A small rant about coding style for backend functions

2007-08-17 Thread Brendan Jurd
On 8/18/07, Tom Lane [EMAIL PROTECTED] wrote:
 The main drawback to the V1-call-convention function call mechanism,
 compared to ordinary C functions, is that you can't instantly see what
 the function arguments are supposed to be.  I think that good coding
 style demands ameliorating this by declaring and extracting all the
 arguments at the top of the function.  The above example is bad enough,
 but when you have to dig through a function of many lines looking for
 GETARG calls in order to know what arguments it expects, it's seriously
 annoying and unreadable.

 And another thing: use the correct extraction macro for the argument's
 type, rather than making something up on the fly.  Quite aside from
 helping the reader see what the function expects, the first example
 above is actually *wrong*, as it will crash on toasted input.

This is all useful guidance.  My question is why it's not part of the
developer documentation.  Which brings me around to a minor rant of my
own.

All the developer FAQ has to say about coding style is that we use
4-space tabs for indentation, and that you should merge seamlessly
into the surrounding code.  That isn't much solace when the
surrounding code is itself nigh unreadable or doesn't contain examples
of what you are trying to do.

For postgres hacking newbies (such as myself), the lack of any obvious
published coding standards for the project is daunting, and is bound
to lead to those developers filling in the blanks with their own
coding style biases.  Which means the patch reviewers need to spend
time pointing out the flaws, and the submitter needs to spend time
adjusting, testing and resubmitting ... it's all quite avoidable.

I humbly suggest that if the sort of valuable information posted by
Tom here was documented instead of ranted to the mailing list, maybe
you guys wouldn't have to do so much ranting =)

Cheers
BJ



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread tomas
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, Aug 17, 2007 at 04:06:15PM -0700, Josh Berkus wrote:
 Bruce,
 
  Oh, so you want the config inside each tsvector value.  Interesting
  idea.
 
 Yeah, hasn't anyone suggested this before?  It seems like the obvious 
 solution.  A TSvector constructed with en_US is NOT the same as a vector 
 constructed with fr_FR and it's silly to pretend that they are comparable.  

Except that (as I understand Oleg) it even seems to make sense sometimes
to compare tsvectors constructed with different configs -- so it might
be important not to prevent this use case either. Oleg?

Otherwise your proposal makes the most sense...

Regards
- -- tomás
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFGxm+DBcgs9XrR2kYRAn7RAJ4u508XQB/W6fMTmTchizlsvKEkEwCfTtTK
R0DMLqNil2VQolFBWE69ZU0=
=Tvh/
-END PGP SIGNATURE-




Re: [HACKERS] Problem with locks

2007-08-17 Thread Gregory Stark
Alvaro Herrera [EMAIL PROTECTED] writes:

 Gregory Stark wrote:

 I switched the code over to the sysv_sema style api. It's gotten a bit grotty
 and I would clean it up if it weren't a temporary test program. If we find a
 real problem perhaps I should add a test case like this to the smoke test 
 in
 ipc_test.c so people can check their OS. 

 So did you discover anything?  I ran your test program and it worked
 successfully for several different configurations.  Not enough times
 maybe, though.

I haven't been able to find any kernel problem which would explain the
timeouts. The test program seems to work fine on all the machines I've tested
it on except one where it turned up seemingly unrelated (and far worse)
problems.

But looking over the old test results from other machines I can see occasional
transaction response times which exactly match the deadlock_timeout even
though there should be no deadlocks. Apparently this happens with older
releases of Postgres too.

So I am fairly stumped here. There's really no way I can see where we would
have the deadlock signal handler firing, not doing anything, but causing a
semaphore wait to return.

I've updated the kernel and will be running more benchmarks with the updated
kernel next week. But I don't expect the results to change.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Mike Rylander
On 8/18/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 On Fri, Aug 17, 2007 at 04:06:15PM -0700, Josh Berkus wrote:
  Bruce,
 
   Oh, so you want the config inside each tsvector value. Interesting
   idea.
 
  Yeah, hasn't anyone suggested this before?  It seems like the obvious
  solution.  A TSvector constructed with en_US is NOT the same as a vector
  constructed with fr_FR and it's silly to pretend that they are comparable.

 Except that (as I understand Oleg) it even seems to make sense sometimes
 to compare tsvectors constructed with different configs -- so it might
 be important not to prevent this use case either. Oleg?

Configs are not simply about languages, they are also about stopword
lists and stemmers and parsers, and there's no reason to think that
one would be using only one configuration to create a single tsvector.

Different fields from within one document may require different
treatment.  Take for instance title, with stopwords included, and
body, with them removed.  Those two initial tsvectors can then be
concatenated together with different weights to provide a very rich,
and simple (relatively speaking) search infrastructure.

--miker



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Oleg Bartunov

On Sat, 18 Aug 2007, [EMAIL PROTECTED] wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, Aug 17, 2007 at 04:06:15PM -0700, Josh Berkus wrote:
 Bruce,
 
  Oh, so you want the config inside each tsvector value.  Interesting
  idea.
 
 Yeah, hasn't anyone suggested this before?  It seems like the obvious 
 solution.  A TSvector constructed with en_US is NOT the same as a vector 
 constructed with fr_FR and it's silly to pretend that they are comparable. 


Except that (as I understand Oleg) it even seems to make sense sometimes
to compare tsvectors constructed with different configs -- so it might
be important not to prevent this use case either. Oleg?


Yes; for example, you may have tsvectors obtained from different sources,
which require different processing.



Otherwise your proposal makes the most sense...

Regards
- -- tomás
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFGxm+DBcgs9XrR2kYRAn7RAJ4u508XQB/W6fMTmTchizlsvKEkEwCfTtTK
R0DMLqNil2VQolFBWE69ZU0=
=Tvh/
-END PGP SIGNATURE-



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Oleg Bartunov

Tom and Bruce, which version of the patch are you using?
Bruce complained about using OIDs as function arguments, but
AFAIR that was removed in version 0.58 of the patch.

Teodor and I are very busy and just can't follow all the discussions, so
we have to rely on people's wisdom. If we have so many problems with
integration, then perhaps we could just integrate support for the data types
(tsquery, tsvector), index support for them, and a set of support functions
like to_tsquery and to_tsvector, and leave everything else in
contrib/tsearch2 as an example of text search engine design.
Then, after fixing the design problems as well as some backend issues, we
could come to much better conclusions.

Oleg
On Fri, 17 Aug 2007, Tom Lane wrote:


Josh Berkus [EMAIL PROTECTED] writes:

Oh, so you want the config inside each tsvector value. Interesting
idea.



Yeah, hasn't anyone suggested this before?  It seems like the obvious
solution.


It might be an obvious solution, but to some other problem than the one
we have.  The problem we are trying to address is how to know which
config to use to construct a *new* tsvector.


A TSvector constructed with en_US is NOT the same as a vector
constructed with fr_FR and it's silly to pretend that they are comparable.


Um, actually I think Oleg and Teodor believe that they *are* comparable.
If we try to force them not to be then we'll break multi-language
situations.

regards, tom lane




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-17 Thread Oleg Bartunov

On Sat, 18 Aug 2007, Mike Rylander wrote:


On 8/18/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, Aug 17, 2007 at 04:06:15PM -0700, Josh Berkus wrote:

Bruce,


Oh, so you want the config inside each tsvector value. Interesting
idea.


Yeah, hasn't anyone suggested this before?  It seems like the obvious
solution.  A TSvector constructed with en_US is NOT the same as a vector
constructed with fr_FR and it's silly to pretend that they are comparable.


Except that (as I understand Oleg) it even seems to make sense sometimes
to compare tsvectors constructed with different configs -- so it might
be important not to prevent this use case either. Oleg?


Configs are not simply about languages, they are also about stopword
lists and stemmers and parsers, and there's no reason to think that
one would be using only one configuration to create a single tsvector.

Different fields from within one document may require different
treatment.  Take for instance title, with stopwords included, and
body, with them removed.  Those two initial tsvectors can then be
concatenated together with different weights to provide a very rich,
and simple (relatively speaking) search infrastructure.


I couldn't have said it better, Mike!

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
