date:20070814

Re: [HACKERS] HOT pgbench results

2007-08-14 Thread ITAGAKI Takahiro

Heikki Linnakangas [EMAIL PROTECTED] wrote:

 I ran some CPU intensive pgbench tests on HOT. Results are not
 surprising, HOT makes practically no difference on the total transaction
 rate, but reduces the need to vacuum:
 
 unpatched HOT
 tps 3680  3790
 WAL written(MB) 5386  4804
 checkpoints 109
 autovacuums 116   43
 autoanalyzes139   60

I also ran pgbench with/without HOT using a bit different configurations
(pgbench -s10 -c10 -t50). There were 10% performance win on HOT,
although the test was CPU intensive and with FILLFACTOR=100%.

unpatched HOT
tps 3366  3634
WAL written(MB) 4969  4374
checkpoints 9 8
autovacuums 126   42
autoanalyzes146   59


I gathered oprofile logs. There were 4 HOT-related functions, that didn't
appear in the unpatched test. But it is probably not so serious.
 - heap_page_prune   1.84%
 - PageRepairFragmentation   0.94%
 - pg_qsort  0.44% (called from PageRepairFragmentation)

On the other hand, the number of _bt_compare and _bt_checkkeys were
reduced by HOT, because we avoid the most part of index insertions.
It looks like LWLockAcquire/Release were also reduced, but I cannot
assure it is a benefits of HOT or a measurement deviation.

unpatched HOT % symbol name
4.08674.2314AllocSetAlloc
2.78392.8022base_yyparse
  1.8392heap_page_prune
1.84591.6659SearchCatCache
1.74051.6087MemoryContextAllocZeroAligned
1.69361.5743hash_search_with_hash_value
1.06721.1822base_yylex
1.24301.1570XLogInsert
  0.9356PageRepairFragmentation
1.35490.8911LWLockAcquire
1.09770.8663LWLockRelease
0.80180.7284nocachegetattr
0.75680.7124FunctionCall2
0.52640.6795ScanKeywordLookup
0.71150.6462hash_any
0.73990.5963AllocSetFree
0.66500.5925GetSnapshotData
0.55360.5789MemoryContextAlloc
0.56430.5547hash_seq_search
0.46600.5005expression_tree_walker
0.52930.4777ExecInitExpr
  0.4381pg_qsort
0.43760.4321hash_uint32
0.41600.4268expression_tree_mutator
0.43220.4183LockAcquire
0.69330.3911_bt_compare
0.52700.3828PinBuffer
0.40250.3798fmgr_info_cxt_security
0.44580.3758MemoryContextAllocZero
0.5101  _bt_checkkeys

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] HOT pgbench results

2007-08-14 Thread Heikki Linnakangas

Thanks for the testing,

ITAGAKI Takahiro wrote:
 I gathered oprofile logs. There were 4 HOT-related functions, that didn't
 appear in the unpatched test. But it is probably not so serious.
  - heap_page_prune   1.84%
  - PageRepairFragmentation   0.94%
  - pg_qsort  0.44% (called from PageRepairFragmentation)

That's expected. Those functions are involved in removing the dead HOT
tuples, replacing VACUUMs. Maybe we could make them cheaper, but it's
not too bad as it is.

 On the other hand, the number of _bt_compare and _bt_checkkeys were
 reduced by HOT, because we avoid the most part of index insertions.
 It looks like LWLockAcquire/Release were also reduced, but I cannot
 assure it is a benefits of HOT or a measurement deviation.

It could very well be real. Because of the reduction of index
insertions, there's less locking of the index pages.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] HOT patch, missing things

2007-08-14 Thread Pavan Deolasee

On 8/9/07, Tom Lane [EMAIL PROTECTED] wrote:



 Yeah, we could simply insist on no change to any column that's used by
 any of the expressions.  That would be cheap to test.



I am trying to figure out the best way to extract this information. Is there
any
existing code to get all attributes used in the expressions ? Or do I need
to walk the tree and extract that information ?

Thanks,
Pavan

Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] HOT patch, missing things

2007-08-14 Thread Pavan Deolasee

On 8/9/07, Simon Riggs [EMAIL PROTECTED] wrote:

 On Thu, 2007-08-09 at 15:46 +0530, Pavan Deolasee wrote:
 

  What if we just track the amount of potentially dead space in the
  relation
  (somebody had suggested that earlier in the thread) ? Every committed
  UPDATE/DELETE and aborted UPDATE/INSERT would increment
  the dead space. Whenever page fragmentation is repaired, either during
  normal operation or during vacuum, the dead space is reduced by the
  amount of reclaimed space. Autovacuum triggers whenever the percentage
  of dead space increases beyond a threshold.
 
  We can some fine tuning to track the space consumed by redirect-dead
  line pointers.

 Sounds great.



So do we have consensus here ? Fortunately, I think there won't be any
changes to user interface. Users can still use the vacuum_scale_factor to
tune autovacuum, but instead of percentage of dead tuples, it would
signify percentage of dead space in the relation.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] HOT patch, missing things

2007-08-14 Thread Simon Riggs

On Tue, 2007-08-14 at 13:24 +0530, Pavan Deolasee wrote:
 
 
 On 8/9/07, Simon Riggs [EMAIL PROTECTED] wrote:
 On Thu, 2007-08-09 at 15:46 +0530, Pavan Deolasee wrote:
 
 
  What if we just track the amount of potentially dead space
 in the
  relation
  (somebody had suggested that earlier in the thread) ? Every
 committed 
  UPDATE/DELETE and aborted UPDATE/INSERT would increment
  the dead space. Whenever page fragmentation is repaired,
 either during
  normal operation or during vacuum, the dead space is reduced
 by the
  amount of reclaimed space. Autovacuum triggers whenever the
 percentage
  of dead space increases beyond a threshold.
 
  We can some fine tuning to track the space consumed by
 redirect-dead
  line pointers. 
 
 Sounds great.
 
 
 
 So do we have consensus here ? Fortunately, I think there won't be any
 changes to user interface. Users can still use the vacuum_scale_factor
 to
 tune autovacuum, but instead of percentage of dead tuples, it would 
 signify percentage of dead space in the relation. 

We have some consensus, but no complete design.

My understanding is that we would see the following things tracked in
pg_stats_xxx

n_tup_ins   count of rows inserted
n_tup_upd   count of rows updated (incl HOT and cold)
n_tup_del   count of rows deleted

- the above are required because they are already there and useful too

n_tup_hot_upd   count of rows updated by HOT method only

- the above is required to help tune HOT/cold updates

dead_space  total number of dead bytes in table

- the above is required for autovacuum

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Testing the async-commit patch

2007-08-14 Thread Simon Riggs

On Mon, 2007-08-13 at 16:27 -0400, Andrew Dunstan wrote:

 Let's say that this file looks just like a postgresql.conf file, except 
 that any line beginning with '[identifier]' is a config set name for 
 the lines that follow. So we might have:
 
 [asynch_commit]
 synchronous_commit = off
 
 [no_fsync]
 fsync = off
 
 [csvlogs]
 start_log_collector = true
 log_destination = 'stderr, csvlog'
 
 Then there would be an extra installcheck-parallel run for each set. If 
 the file isn't there we do nothing.

Sounds fine, though I'd prefer this in XML to allow further
extensibility in the future.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] 2D partitioning of VLDB - sane or not?

2007-08-14 Thread Zeugswetter Andreas ADI SD


 Which brings us back to the original issue. If I decide to stick with 
 the current implementation and not improve our existing partitioning 
 mechanisms to scale to 100,000 partitions, I could do something like:

There is a point where you can leave the selection of the correct rows
to normal btree indexes.
I'd say that that point currently is well below 2000 partitions for all
common db systems.
I opt, that more partitions will only be useful for very limited use
cases,
and would be very interested in hearing of a practical partitioning
scheme where more partitions still show a performance advantage (any db
system).

Looks like in your case a partitioning scheme with 256 partitions on one
of the 2 dimensions sounds reasonable.
Or 32 instead of 256 bins for each dim, if your searches are
bidirectional.

Andreas

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] 2D partitioning of VLDB - sane or not?

2007-08-14 Thread Gregory Stark

Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:

 I'd say that that point currently is well below 2000 partitions for all
 common db systems.

I think it will depend heavily on the type of queries you're talking about.
Postgres's constraint_exclusion is a linear search and does quite a bit of
work for each constraint. So it isn't terribly efficient for more than 1,000
partitions or so.

*But* that only affects planning time. If your queries are always effectively
pruned to few partitions and you execute them thousands of times then you not
care about slow planning time.

And if the overall table is large enough and you're dropping and loading
partitions then you may still be benefiting from partitioning by keeping all
the loaded records together and allowing dropping a partition to be constant
time.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Testing the async-commit patch

2007-08-14 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 But to get to the point: the urgency of testing the patch more
 extensively has just moved up a full order of magnitude, 

The problem testing this patch is that the window for a committed transaction
to not be synced is quite narrow, especially for the regression tests. For
testing purposes I wonder if there are ways we can widen this window. Some
ideas, some wackier than others, are:

. Raise the default wal_writer_delay to 5s or so -- also temporary until
  release

. Add an ifdef USE_ASSERT_CHECKING which randomly omits setting hint bits even
  when it could.

. add an ifdef USE_ASSERT_CHECKING which randomly fails to update the LSN when
  syncing WAL so that even after a buffer flush we still can't set hint bits.
  
Only the first one isn't really wacky, but perhaps there's something there.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] 8.3 freeze/release

2007-08-14 Thread Michael Meskes

On Mon, Aug 13, 2007 at 10:04:32AM -0400, Tom Lane wrote:
 Well, we are in feature freeze, but as far as I'm concerned ecpg is its
 own little fiefdom.  If you have enough confidence in these changes to
 apply them now, no one is going to question you.

Okay, done. Please test it!

There were some with platforms I didn't test on, but hopefully they are
fixed now. There is one more problem with the threading tests that are
run even if threading is not activated. Will look into this later.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

[HACKERS] Question about change in page/tuple header (v4)

2007-08-14 Thread Zdenek Kotala


I'm comparing now different version of page layer, and I have two questions:

1) We now store only low 16bits TLI, but name in structure stays same. 
Maybe pg_tli_lo could be better.


2) HASOID has been moved in infomask and original place is unused. Is 
there some reason for that? This change little bit complicate tuple 
header upgrade. If there is not real reason to have it in current place, 
I recommend to move it back.



thanks Zdenek

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Testing the async-commit patch

2007-08-14 Thread Alvaro Herrera

Simon Riggs wrote:
 On Mon, 2007-08-13 at 16:27 -0400, Andrew Dunstan wrote:
 
  Let's say that this file looks just like a postgresql.conf file, except 
  that any line beginning with '[identifier]' is a config set name for 
  the lines that follow. So we might have:
  
  [asynch_commit]
  synchronous_commit = off
  
  [no_fsync]
  fsync = off
  
  [csvlogs]
  start_log_collector = true
  log_destination = 'stderr, csvlog'
  
  Then there would be an extra installcheck-parallel run for each set. If 
  the file isn't there we do nothing.
 
 Sounds fine, though I'd prefer this in XML to allow further
 extensibility in the future.

YAML?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Alvaro Herrera

Oleg Bartunov wrote:
 On Thu, 9 Aug 2007, [EMAIL PROTECTED] wrote:

 Maybe I'm missing something, but it seems to me that the configuration
 is more attached to a column/index thatn to the whole database. If
 there's a default in an expression, I'd rather expect this default to be
 drawn from the index involved than from a global value (like a functional
 index does now).

 I'm tired to repeat - index itself doesn't know about configuration !

Is there a way to change that?  For example store the configuration in a
metapage or something?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Testing the async-commit patch

2007-08-14 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 Simon Riggs wrote:
 Sounds fine, though I'd prefer this in XML to allow further
 extensibility in the future.

 YAML?

That would seem to require making pg_regress depend on some XML library
or other, which is a dependency I'd rather not add.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] HOT patch, missing things

2007-08-14 Thread Tom Lane

Pavan Deolasee [EMAIL PROTECTED] writes:
 I am trying to figure out the best way to extract this information. Is there
 any
 existing code to get all attributes used in the expressions ? Or do I need
 to walk the tree and extract that information ?

There are a number of near matches in backend/optimizer/util/var.c,
but nothing that has exactly the API you probably want, which I'd think
would be to extract a bitmapset of the varattnos of level-zero Vars.
contain_var_reference() could be used repeatedly but that seems
tremendously inefficient.  I'd suggest coding up some new function
using what's there for reference.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] HOT patch, missing things

2007-08-14 Thread Tom Lane

Pavan Deolasee [EMAIL PROTECTED] writes:
 What if we just track the amount of potentially dead space in the
 relation
 (somebody had suggested that earlier in the thread) ? Every committed
 UPDATE/DELETE and aborted UPDATE/INSERT would increment
 the dead space. Whenever page fragmentation is repaired, either during
 normal operation or during vacuum, the dead space is reduced by the
 amount of reclaimed space. Autovacuum triggers whenever the percentage
 of dead space increases beyond a threshold.

Doesn't this design completely fail to take index bloat into account?
Repairing heap fragmentation does not reduce the need for VACUUM to work
on the indexes.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Testing the async-commit patch

2007-08-14 Thread Andrew Dunstan




Tom Lane wrote:

Alvaro Herrera [EMAIL PROTECTED] writes:
  

Simon Riggs wrote:


Sounds fine, though I'd prefer this in XML to allow further
extensibility in the future.
  


  

YAML?



That would seem to require making pg_regress depend on some XML library
or other, which is a dependency I'd rather not add.


  


Yeah, I think the way I set it out would work just fine for the intended 
purpose. XML, YAML, JSON et al are all well suited to tree structured 
data. But what we're describing here isn't tree structured. It is simply 
some named sets of postgresql.conf directives. As such, it would be best 
if it were as close as possible to actual postgresql.conf syntax.


And I am also reluctant to add an additional dependency onto the 
buildfarm script. (I wasn't actually thinking of doing this throught 
pg_regress, but I'm open to persuasion).


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Gregory Stark


Alvaro Herrera [EMAIL PROTECTED] writes:

 Oleg Bartunov wrote:

 I'm tired to repeat - index itself doesn't know about configuration !

 Is there a way to change that?  For example store the configuration in a
 metapage or something?

I think Heikki's suggestion of having each configuration create a new type
would effectively do the same thing.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Question about change in page/tuple header (v4)

2007-08-14 Thread Tom Lane

Zdenek Kotala [EMAIL PROTECTED] writes:
 2) HASOID has been moved in infomask and original place is unused. Is 
 there some reason for that?

To keep it next to the other bits that are about tuple content rather
than transactional behavior.

 This change little bit complicate tuple 
 header upgrade. If there is not real reason to have it in current place, 
 I recommend to move it back.

There are enough other changes there that you're going to have to
manipulate the infomask anyway.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] HOT patch, missing things

2007-08-14 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Pavan Deolasee [EMAIL PROTECTED] writes:
 What if we just track the amount of potentially dead space in the
 relation
 (somebody had suggested that earlier in the thread) ? Every committed
 UPDATE/DELETE and aborted UPDATE/INSERT would increment
 the dead space. Whenever page fragmentation is repaired, either during
 normal operation or during vacuum, the dead space is reduced by the
 amount of reclaimed space. Autovacuum triggers whenever the percentage
 of dead space increases beyond a threshold.

 Doesn't this design completely fail to take index bloat into account?
 Repairing heap fragmentation does not reduce the need for VACUUM to work
 on the indexes.

Index bloat is a bit of an open issue already. Because page splits already
prune any LP_DELETEd pointers any busy index keys will be pruned already.

However any index keys which have not been the subject of an index lookup --
and that includes keys which are only accessed by bitmap-index-scans -- won't
be pruned.

So we don't really know how much bloat is currently in an index. Perhaps we
need a new statistic which gets updated whenever a page split prunes
LP_DELETEd pointers (or perhaps when LP_DELETE is set?).

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] HOT patch, missing things

2007-08-14 Thread Simon Riggs

On Tue, 2007-08-14 at 10:10 -0400, Tom Lane wrote:
 Pavan Deolasee [EMAIL PROTECTED] writes:
  What if we just track the amount of potentially dead space in the
  relation
  (somebody had suggested that earlier in the thread) ? Every committed
  UPDATE/DELETE and aborted UPDATE/INSERT would increment
  the dead space. Whenever page fragmentation is repaired, either during
  normal operation or during vacuum, the dead space is reduced by the
  amount of reclaimed space. Autovacuum triggers whenever the percentage
  of dead space increases beyond a threshold.
 
 Doesn't this design completely fail to take index bloat into account?
 Repairing heap fragmentation does not reduce the need for VACUUM to work
 on the indexes.

I thought of that, but we will only clean up space that is allowable, so
the indexes don't degrade.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Testing the async-commit patch

2007-08-14 Thread Josh Berkus

Tom, 

We're getting some additional test infrastructre at Sun; I'll throw this on 
the pile of stuff to test.

However, the current tests we're doing are regression tests and benchmark 
runs.  If there's some other kind of testing we need to do, I'll need 
specifics.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] [EMAIL PROTECTED]: Re: [GENERAL] array_to_set functions]

2007-08-14 Thread Bruce Momjian


TODO item?

---

Michael Glaesemann wrote:
-- Start of PGP signed section.
 
 On Aug 8, 2007, at 12:18 , Decibel! wrote:
 
  On Wed, Aug 08, 2007 at 12:03:34PM -0500, Michael Glaesemann wrote:
  Personally, I think expandarray is more appropriate and its
  functionality probably more generally useful, as it identifies the
  array indices as well. Note you can also rename the columns.
 
  Sure. My point is that we should have a way to convert arrays to sets
  and back in the backend.
 
 Can't really argue with you there, as I find array_accum myself.  
 (Though I'd still nit-pick that this isn't an array to set  
 conversion, but rather array to--possibly single-column--table.)
 
 Michael Glaesemann
 grzm seespotcode net
 
 
-- End of PGP section, PGP failed!

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] GUC for default heap fillfactor

2007-08-14 Thread Bruce Momjian

Decibel! wrote:
-- Start of PGP signed section.
 On Thu, Aug 09, 2007 at 09:57:48AM +0900, ITAGAKI Takahiro wrote:
   If HOT gets into 8.3, we might need a GUC to set database wide heap
   fillfactor to an appropriate value.
  
  I have no objection to do that, but we will need other default values soon,
  something like default_[heap|btree|hash|gist|gin]_fillfactor. Some of us
  might feel it is mess to add random guc variables.
 
 I think we'd want to handle indexes with a different mechanism, probably
 one that makes changes to pg_am.
 
 In any case, how important is it to do this before 8.3? We were supposed
 to release this month, after all.

Agreed.  This seems like 8.4 material once we have HOT usage in the
field.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Bruce Momjian

Andrew Dunstan wrote:
 
 
 Tom Lane wrote:
  Andrew Dunstan [EMAIL PROTECTED] writes:

  Before I wrap up the CSVlog stuff, we need to decide whether or not to 
  change the name of the redirect_stderr setting, and if so to what. The 
  reason is that with CSVlogs it will no longer apply just to stderr (we 
  will require it to be on for CSVlogs, in fact).
  
 

  I suggest redirect_logs, although it's arguably too general as it 
  doesn't apply to syslog/eventlog.
  
 
  Perhaps it should be named analogously to stats_start_collector,
  ie think of the syslogger process as a log collector.  I don't
  much like log_start_collector though --- start_log_collector
  seems far less confusing as to where the verb is.
 
 

 
 Nobody else seems to care much. I'll go with start_log_collector.

Are we trying to use log_* prefixes, e.g. log_start_collector?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] [EMAIL PROTECTED]: Re: [GENERAL] array_to_set functions]

2007-08-14 Thread Pavel Stehule

2007/8/14, Bruce Momjian [EMAIL PROTECTED]:

 TODO item?

   + If your life is a hard drive, Christ can be your backup. +

 ---(end of broadcast)---
 TIP 6: explain analyze is your friend

I am against. It's too simple do it in SQL language.

Regards
Pavel Stehule

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Alvaro Herrera

Bruce Momjian wrote:
 Andrew Dunstan wrote:
  
   I suggest redirect_logs, although it's arguably too general as it 
   doesn't apply to syslog/eventlog.
  
   Perhaps it should be named analogously to stats_start_collector,
   ie think of the syslogger process as a log collector.  I don't
   much like log_start_collector though --- start_log_collector
   seems far less confusing as to where the verb is.
  
  Nobody else seems to care much. I'll go with start_log_collector.
 
 Are we trying to use log_* prefixes, e.g. log_start_collector?

That sounds like you want to log when the collector starts, which is not
the case and is confusing -- what collector is it talking about?  This
is about starting the log collector.

-- 
Alvaro Herrera   http://www.PlanetPostgreSQL.org/
How amazing is that? I call it a night and come back to find that a bug has
been identified and patched while I sleep.(Robert Davidson)
   http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Bruce Momjian

Alvaro Herrera wrote:
 Bruce Momjian wrote:
  Andrew Dunstan wrote:
   
I suggest redirect_logs, although it's arguably too general as it 
doesn't apply to syslog/eventlog.
   
Perhaps it should be named analogously to stats_start_collector,
ie think of the syslogger process as a log collector.  I don't
much like log_start_collector though --- start_log_collector
seems far less confusing as to where the verb is.
   
   Nobody else seems to care much. I'll go with start_log_collector.
  
  Are we trying to use log_* prefixes, e.g. log_start_collector?
 
 That sounds like you want to log when the collector starts, which is not
 the case and is confusing -- what collector is it talking about?  This
 is about starting the log collector.

Yea, good point.  I was just wondering because I don't see 'start' used
in anywhere at the beginning of a GUC variable.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Testing the async-commit patch

2007-08-14 Thread Tom Lane

Gregory Stark [EMAIL PROTECTED] writes:
 The problem testing this patch is that the window for a committed transaction
 to not be synced is quite narrow, especially for the regression tests. For
 testing purposes I wonder if there are ways we can widen this window. Some
 ideas, some wackier than others, are:

 . Raise the default wal_writer_delay to 5s or so -- also temporary until
   release

I ran 100+ cycles of the parallel regression tests with this setting,
and didn't see any failures.

 . Add an ifdef USE_ASSERT_CHECKING which randomly omits setting hint
   bits even when it could.

I think this is better done by code inspection, ie, look for places that
assume HEAP_XMIN/XMAX_COMMITTED is or can be set.

I made a pass over CVS HEAD and found some apparent trouble spots:
heapam.c lines 1843-1852 presume previous xmax can be hinted
immediately, ditto lines 2167-2176, ditto lines 2716-2725.
I think probably we should just remove those lines --- they are only
trying to save work for future tqual.c calls.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Testing the async-commit patch

2007-08-14 Thread Simon Riggs

On Tue, 2007-08-14 at 12:09 -0400, Tom Lane wrote:

 I think this is better done by code inspection, ie, look for places that
 assume HEAP_XMIN/XMAX_COMMITTED is or can be set.
 
 I made a pass over CVS HEAD and found some apparent trouble spots:
 heapam.c lines 1843-1852 presume previous xmax can be hinted
 immediately, ditto lines 2167-2176, ditto lines 2716-2725.
 I think probably we should just remove those lines --- they are only
 trying to save work for future tqual.c calls.

I'll check those out later tonight.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Alvaro Herrera

Bruce Momjian wrote:
 Alvaro Herrera wrote:
  Bruce Momjian wrote:
   Andrew Dunstan wrote:

 I suggest redirect_logs, although it's arguably too general as it 
 doesn't apply to syslog/eventlog.

 Perhaps it should be named analogously to stats_start_collector,
 ie think of the syslogger process as a log collector.  I don't
 much like log_start_collector though --- start_log_collector
 seems far less confusing as to where the verb is.

Nobody else seems to care much. I'll go with start_log_collector.
   
   Are we trying to use log_* prefixes, e.g. log_start_collector?
  
  That sounds like you want to log when the collector starts, which is not
  the case and is confusing -- what collector is it talking about?  This
  is about starting the log collector.
 
 Yea, good point.  I was just wondering because I don't see 'start' used
 in anywhere at the beginning of a GUC variable.

Good point too.  In other places we just name the feature that's to be
started, for example we don't use start_autovacuum.

-- 
Alvaro Herrera http://www.flickr.com/photos/alvherre/
Granting software the freedom to evolve guarantees only different results,
not better ones. (Zygo Blaxell)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Testing the async-commit patch

2007-08-14 Thread Tom Lane

Simon Riggs [EMAIL PROTECTED] writes:
 On Tue, 2007-08-14 at 12:09 -0400, Tom Lane wrote:
 heapam.c lines 1843-1852 presume previous xmax can be hinted
 immediately, ditto lines 2167-2176, ditto lines 2716-2725.
 I think probably we should just remove those lines --- they are only
 trying to save work for future tqual.c calls.

 I'll check those out later tonight.

[ looks closer ] Actually, we can't just dike out those code sections,
because the immediately following code assumes that XMAX_INVALID is
correct.  So I guess we need to export HeapTupleSetHintBits from tqual.c
and do the full pushup in these places.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Andrew Dunstan




Alvaro Herrera wrote:

That sounds like you want to log when the collector starts, which is not
the case and is confusing -- what collector is it talking about?  This
is about starting the log collector.
  

Yea, good point.  I was just wondering because I don't see 'start' used
in anywhere at the beginning of a GUC variable.



Good point too.  In other places we just name the feature that's to be
started, for example we don't use start_autovacuum.

  


How about just log_collector then?

Lets decide ASAP please - I want to get this committed.

cheers

andrew

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] Problem with locks

2007-08-14 Thread Gregory Stark


Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Seems to me this proves nothing much, since it doesn't use the same SysV
 semaphore API PG does.

 I was trying to copy the semaphore API exactly assuming
 USE_NAMED_POSIX_SEMAPHORES was *not* defined. According to the comments we
 prefer not to use named semaphores if possible.

 What you seem to have copied is the posix_sema.c code, which AFAIK is
 only used on Darwin.  sysv_sema.c is what to look at ... unless your
 benchmark machine is a Mac.

I switched the code over to the sysv_sema style api. It's gotten a bit grotty
and I would clean it up if it weren't a temporary test program. If we find a
real problem perhaps I should add a test case like this to the smoke test in
ipc_test.c so people can check their OS. 

I did add something like the setitimer deadlock timeout to detect a process
stuck waiting. There is a race condition there if a process is woken up just
as the timer fires but if the timeout is large enough the chances of that are
pretty remote. Judging by the first thread the whole loop excluding the usleep
takes about 3ms. I've been using a timeout of 10 seconds. As such:

$ ./a.out 40 900 10
running with 40 processes for 900s with timeout of 1ms
telling threads to exit
run done
cleaning up semaphores and shared memory

#include stdlib.h
#include stdio.h
#include string.h
#include errno.h

#include unistd.h
#include signal.h
#include sys/types.h
#include sys/ipc.h
#include sys/shm.h
#include sys/sem.h
#include sys/time.h

union semun {
  int  val;/* Value for SETVAL */
  struct semid_ds *buf;/* Buffer for IPC_STAT, IPC_SET */
  unsigned short  *array;  /* Array for GETALL, SETALL */
  struct seminfo  *__buf;  /* Buffer for IPC_INFO (Linux specific) */
};

#define SEMAS_PER_SET	16
#define IPCProtection	(0600)	/* access/modify by user only */
#define PGSemaMagic		537		/* must be less than SEMVMX */
#define SEMAS_KEY_START	(5431*1000-1)

int nthreads, timeout, shmid;
volatile unsigned char *wakers;
typedef struct PGSemaphoreData
{
	int			semId;			/* semaphore set identifier */
	int			semNum;			/* semaphore number within set */
} PGSemaphoreData;
PGSemaphoreData *sems;

static void atexit_handler();
static void worker(int n);

static void down(int n);
static void up(int n);

#define MAX_THREADS 250
#define WAKER_NOOP 253
#define WAKER_EXIT 254
#define WAKER_RUNNABLE 255

/* this just forces the atexit handler to be called */
static void handle_sig(int arg) {exit(127+arg);}

int main(int argc, char *argv[])
{
  int i, semKey, runtime;
  pid_t *pids;
  struct sigaction act, oact;
  int semId=-1;

  if (argc = 1)
nthreads = 10;
  else 
nthreads = atoi(argv[1]);

  if (nthreads = 0 || nthreads  MAX_THREADS) {
fprintf(stderr, usage: nthreads not between 1 and %d\n, MAX_THREADS);
exit(1);
  }

  if (argc = 2)
runtime = 10;
  else
runtime = atoi(argv[2]);

  if (runtime  1) {
fprintf(stderr, usage: runtime shorter than 1s\n);
exit(1);
  }

  if (argc = 3)
timeout = 1000*60;
  else
timeout = 1000.0*atof(argv[3]);

  if (timeout  1) {
fprintf(stderr, usage: timeout shorter than 1s\n);
exit(1);
  }
  
  printf(running with %d processes for %ds with timeout of %dms\n, nthreads, runtime, timeout);
  sems = malloc(sizeof(*sems)*nthreads);

  semKey = SEMAS_KEY_START;
  for (i=0;inthreads;i++) {
union semun semun;
int semNum = i % SEMAS_PER_SET;

if (semNum == 0) {
  semKey += 1;
  semId = semget(semKey, SEMAS_PER_SET, IPC_CREAT | IPC_EXCL | IPCProtection);
  if (semId  0) {
	perror(semget);
	exit(1);
  }
}

semun.val = 0;
if (semctl(semId, semNum, SETVAL, semun)  0) {
  fprintf(stderr, semctl(%d, %d, SETVAL, 0): %s\n, semId, semNum, strerror(errno));
  exit(1);
}

if (semId0 || semNum  SEMAS_PER_SET)
  exit(1);

sems[i].semId = semId;
sems[i].semNum= semNum;
  }
  
  shmid = shmget(IPC_PRIVATE, nthreads*sizeof(unsigned char), IPC_CREAT | IPC_EXCL | IPCProtection);

  if (shmid == -1) {
perror(shmget);
exit(1);
  }


  wakers = shmat(shmid, NULL, 0);
  wakers[0] = WAKER_NOOP;
  for (i=1;inthreads;i++)
wakers[i] = WAKER_RUNNABLE;

  pids = malloc(sizeof(pid_t)*nthreads);

  for (i=0;inthreads;i++) {
/*printf(forking thread %d\n, i);*/
switch(pids[i] = fork()) {
case 0:
  worker(i);
  exit(0);
case -1:
  perror(fork);
  exit(1);
default:
  /*printf(successfully forked thread %d as pid %d\n, i, pids[i]);*/
  break;
}
  }

  act.sa_handler = handle_sig;
  sigemptyset(act.sa_mask);
  act.sa_flags = 0;
  if (sigaction(SIGINT, act, oact)  0)
perror(sigaction);
  atexit(atexit_handler);

  sleep(runtime);
  printf(telling threads to exit\n);

  for (i=0;inthreads;i++) {
while (wakers[i] == WAKER_RUNNABLE) {
  printf(still waiting for thread %d to block\n, i);
  sleep(1);
}

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Bruce Momjian

Andrew Dunstan wrote:
 
 
 Alvaro Herrera wrote:
  That sounds like you want to log when the collector starts, which is not
  the case and is confusing -- what collector is it talking about?  This
  is about starting the log collector.

  Yea, good point.  I was just wondering because I don't see 'start' used
  in anywhere at the beginning of a GUC variable.
  
 
  Good point too.  In other places we just name the feature that's to be
  started, for example we don't use start_autovacuum.
 

 
 How about just log_collector then?
 
 Lets decide ASAP please - I want to get this committed.

Works for me, or enable_log_collector?  Based on Alvaro's comments,
log_collector does sound like we are logging collector activity.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Kevin Grittner

 On Tue, Aug 14, 2007 at 11:42 AM, in message [EMAIL PROTECTED],
Andrew Dunstan [EMAIL PROTECTED] wrote: 
 Alvaro Herrera wrote:
 In other places we just name the feature that's to be
 started, for example we don't use start_autovacuum.
 
 How about just log_collector then?

+1
 
Unambiguous and consistent with other settings.
 
log_collector = on
 



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Alvaro Herrera

Bruce Momjian wrote:
 Andrew Dunstan wrote:
  
  
  Alvaro Herrera wrote:
   That sounds like you want to log when the collector starts, which is not
   the case and is confusing -- what collector is it talking about?  This
   is about starting the log collector.
 
   Yea, good point.  I was just wondering because I don't see 'start' used
   in anywhere at the beginning of a GUC variable.
  
   Good point too.  In other places we just name the feature that's to be
   started, for example we don't use start_autovacuum.
  
  How about just log_collector then?
  
  Lets decide ASAP please - I want to get this committed.
 
 Works for me, or enable_log_collector?  Based on Alvaro's comments,
 log_collector does sound like we are logging collector activity.

The problem here is that log seems to be a verb in log_collector
which is what makes it confusing.  So we need another verb to make it
clear that log is not one.  This is not a problem with autovacuum
because that one cannot be confused with a verb.

start_log_collector still gets my vote.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] [EMAIL PROTECTED]: Re: [GENERAL] array_to_set functions]

2007-08-14 Thread Joe Conway


Bruce Momjian wrote:

TODO item?



Probably. See SQL2003 UNNEST:

collection derived table ::=
  UNNEST left paren collection value expression right paren
  [ WITH ORDINALITY ]

collection value expression ::=
array value expression
  | multiset value expression

Joe

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Josh Berkus


 The problem here is that log seems to be a verb in log_collector
 which is what makes it confusing.  So we need another verb to make it
 clear that log is not one.  This is not a problem with autovacuum
 because that one cannot be confused with a verb.

 start_log_collector still gets my vote.

I vote against.  Remember that some people look at the GUCs in alpha order 
from pg_settings.  As such, if we're changing GUC names it ought to be in a 
way that groups logically if we can.

log_collector_enable or log_collector_start or even log_redirect.  But 
something with log_*

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Mike Rylander

On 8/13/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 Heikki Linnakangas wrote:
  Bruce Momjian wrote:
   Heikki Linnakangas wrote:
   Removing the default configuration setting altogether removes the 2nd
   problem, but that's not good from a usability point of view. And it
   doesn't solve the general issue, you can still do things like:
   SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@
   to_tsquery('confB', 'query');
  
   True, but in that case you are specifically naming different
   configurations, so it is hopefully obvious you have a mismatch.
 
  There's many more subtle ways to do that. For example, filling a
  tsvector column using a DEFAULT clause. But then you sometimes fill it
  in the application instead, with a different configuration. Or if one of
  the function calls is buried in another user defined function.
 
  I don't think explicitly naming the configuration gives enough protection.

 Oh, wow, OK, well in that case the text search API isn't ready and we
 will have to hold this for 8.4.


I've been watching this thread with a mixture of dread and hope,
waiting to see where the developers' inclination will end up; whether
leaving a useful foot gun available will be allowed.

This is just my $0.02 as a fairly heavy user of the current tsearch2
code, but I sincerely hope you do not cripple the system by removing
the ability to store tsvectors built using arbitrary configurations in
a single column.  Yes, it can lead to unexpected results if you do not
know what you are doing, but if you have gone beyond building a single
tsearch2 configuration then you are required to know what you are
doing.  What's more, IMO the default configuration mechanism feels
very much like a CONSTRAINT, as Oleg suggests.  That point is one of
cognizance, where if one has gone to the trouble of setting up
multiple configurations and has learned enough to do so correctly,
then one necessarily understands the importance of the setting and can
use it (or not, and use explicit configurations) correctly.  The
default config lowers the bar to an acceptable level for beginners
that have no need of multiple configurations, and while I don't feel
too strongly, personally, about having a default, I think it is both
useful and helpful for new users -- it was for me.

Now, so this email isn't entirely complaining, and as a data point for
the discussion, I'll explain why I do not want to see tsearch2
crippled in the way suggested by Heikki and Bruce.

My application (http://open-ils.org, which run 80% of the public
libraries in Georgia, USA, http://gapines.org and
http://georgialibraries.org/lib/pines.html) requires that I be able to
search a corpus of bibliographic records in a mix of languages, and
potentially with mixed stop-word rules, with one query.  I cannot know
ahead of time what languages will be used in the corpus and I cannot
restrict any one query to one language.  To accomplish this, the
record itself will be inspected inside an INSERT/UPDATE trigger to
determine the language and type, and use the correct configuration for
creating the tsvector.  This will obviously result in a mixed
tsvector column, but that's exactly what I need.  I can filter on
record language if the user happens to specify a query language (and
thus configuration), or simply rank the assumed (IP based, perhaps, or
browser preference based) preferred language higher, or one of a
hundred other things.  But I won't be able to do any of that if
tsvectors are required to have one and only one configuration per
column.

Anyway, I felt I needed to provide some outside perspective to this,
as a user, since it seems that the external viewpoint (my particular
viewpoint, at least) was missing from the discussion.

Thanks, folks, for all the work on this so far!

--miker

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Tom Lane

Josh Berkus [EMAIL PROTECTED] writes:
 The problem here is that log seems to be a verb in log_collector
 which is what makes it confusing.  So we need another verb to make it
 clear that log is not one.  This is not a problem with autovacuum
 because that one cannot be confused with a verb.
 
 start_log_collector still gets my vote.

 log_collector_enable or log_collector_start or even log_redirect.  But 
 something with log_*

I'm voting with Alvaro on this.  All of your suggestions are confusing
because log looks like the verb, which it is not.  Specifically, they
sound like what the switch does is to cause a log message to be emitted
about some action that would occur anyway.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Heikki Linnakangas

Mike Rylander wrote:
 This is just my $0.02 as a fairly heavy user of the current tsearch2
 code, but I sincerely hope you do not cripple the system by removing
 the ability to store tsvectors built using arbitrary configurations in
 a single column.  Yes, it can lead to unexpected results if you do not
 know what you are doing, but if you have gone beyond building a single
 tsearch2 configuration then you are required to know what you are
 doing.  What's more, IMO the default configuration mechanism feels
 very much like a CONSTRAINT, as Oleg suggests.  That point is one of
 cognizance, where if one has gone to the trouble of setting up
 multiple configurations and has learned enough to do so correctly,
 then one necessarily understands the importance of the setting and can
 use it (or not, and use explicit configurations) correctly.  The
 default config lowers the bar to an acceptable level for beginners
 that have no need of multiple configurations, and while I don't feel
 too strongly, personally, about having a default, I think it is both
 useful and helpful for new users -- it was for me.

Thanks for chiming in. As a disclaimer: I've never used tsearch2 in a
real application myself.

 My application (http://open-ils.org, which run 80% of the public
 libraries in Georgia, USA, http://gapines.org and
 http://georgialibraries.org/lib/pines.html) requires that I be able to
 search a corpus of bibliographic records in a mix of languages, and
 potentially with mixed stop-word rules, with one query.  I cannot know
 ahead of time what languages will be used in the corpus and I cannot
 restrict any one query to one language.  To accomplish this, the
 record itself will be inspected inside an INSERT/UPDATE trigger to
 determine the language and type, and use the correct configuration for
 creating the tsvector.  This will obviously result in a mixed
 tsvector column, but that's exactly what I need.  I can filter on
 record language if the user happens to specify a query language (and
 thus configuration), or simply rank the assumed (IP based, perhaps, or
 browser preference based) preferred language higher, or one of a
 hundred other things.  But I won't be able to do any of that if
 tsvectors are required to have one and only one configuration per
 column.

Don't you need to use the right configuration to parse the query into a
tsquery as well?

What you have is basically the classic problem problem of representing
inheritance in a relational model. You have a superclass, bibliographic
record, and multiple subclasses, one per language with one extra field,
the corpus in the right language. You've solved it by storing them all
in one table, with an extra column (language) to denote which subclass
the record is. Alternatively, you could solve it by using inherited
tables, or having one table per language with a foreign key referencing
the master table containing the other fields, or having a single table
with one nullable field per configuration, and a check constraint to
check that exactly one of them is not null.

As a thought experiment, let me present another, not text search related
example that's isomorphic to your problem:

Imagine an international online auction system, where you can put items
for sale and specify a minimum price. Part of the database schema is:

CREATE TABLE item (
  id INTEGER,
  description TEXT,
  currency CHAR(3),
  price NUMERIC
);

It clearly doesn't make sense to directly compare prices in different
currencies against each other. A query like WHERE price  1000 doesn't
make sense, unless you also restrict the currency, or use an exchange
rate to convert between currencies. Also, different currencies might
have slightly different rules on how many decimal places are
significant. In this example, as well as your bibliographic scenario, we
can conveniently store prices in all currencies in the same field
because they're all numerics. If we wanted to enforce per-currency
rules, like NUMERIC(10, 2) for USD and NUMERIC(10,0) for Italian lires
(which doesn't really exists anymore, I know), we'd need to store them
in separate columns. And before the decimalisation in 1971, the British
pound was divided into 20 shillings, each of which was divided to 12
pence, so it wouldn't have fit to normal numeric field, and we would
again have to store that in a separate column.

What I'm trying to point out is that the problem isn't unique to text
search. It's an old problem, with many alternative solutions, even with
strong typing. Arguably if you store data in multiple languages in same
field, you have a denormalized schema. Granted, loose typing can be more
convenient, but you give up the benefits of strong typing as well.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Heikki Linnakangas

Bruce Momjian wrote:
 Heikki Linnakangas wrote:
 Bruce Momjian wrote:
 Heikki Linnakangas wrote:
 Removing the default configuration setting altogether removes the 2nd
 problem, but that's not good from a usability point of view. And it
 doesn't solve the general issue, you can still do things like:
 SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@
 to_tsquery('confB', 'query');
 True, but in that case you are specifically naming different
 configurations, so it is hopefully obvious you have a mismatch.
 There's many more subtle ways to do that. For example, filling a
 tsvector column using a DEFAULT clause. But then you sometimes fill it
 in the application instead, with a different configuration. Or if one of
 the function calls is buried in another user defined function.

 I don't think explicitly naming the configuration gives enough protection.
 
 Oh, wow, OK, well in that case the text search API isn't ready and we
 will have to hold this for 8.4.

That would be unfortunate :(. Sorry I haven't looked at this earlier. Do
you think that implementing a strongly typed system is too much work for
8.3?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Testing the async-commit patch

2007-08-14 Thread Greg Smith



On Mon, 2007-08-13 at 16:27 -0400, Andrew Dunstan wrote:

[asynch_commit]
synchronous_commit = off
[no_fsync]
 fsync = off


This is the Windows INI file format.  As such, it's easy to find code 
samples in almost any language that parse this format for you.  For 
example, Python has a core library called ConfigParser that will read 
these in, and somewhere around here I even have some UNIX shell code that 
parses it.  There's already a PostgreSQL-related project using this 
format--even on UNIX systems the odbc.ini config files look like this.


I already have a program that generates multiple postgresql.conf files 
using this format around here for exactly this sort of test (compare 
benchmark results with two different configurations), just never had a 
reason to package it up for anybody else to use.  It's trivial code if you 
use the Python parser.


On Tue, 14 Aug 2007, Simon Riggs wrote:


Sounds fine, though I'd prefer this in XML to allow further
extensibility in the future.


Putting this in XML significantly raises the bar for how complicated tools 
that work on these files must be, with the implicit dependencies that go 
with that.  And as Andrew already pointed out, there is very little 
tree-structure to this data that justifies the extra complexity.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Heikki Linnakangas

Josh Berkus wrote:
 The problem here is that log seems to be a verb in log_collector
 which is what makes it confusing.  So we need another verb to make it
 clear that log is not one.  This is not a problem with autovacuum
 because that one cannot be confused with a verb.

 start_log_collector still gets my vote.
 
 I vote against.  Remember that some people look at the GUCs in alpha order 
 from pg_settings.  As such, if we're changing GUC names it ought to be in a 
 way that groups logically if we can.

Should we change the ordering of pg_settings?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Oleg Bartunov


On Tue, 14 Aug 2007, Alvaro Herrera wrote:


Oleg Bartunov wrote:

On Thu, 9 Aug 2007, [EMAIL PROTECTED] wrote:


Maybe I'm missing something, but it seems to me that the configuration
is more attached to a column/index thatn to the whole database. If
there's a default in an expression, I'd rather expect this default to be
drawn from the index involved than from a global value (like a functional
index does now).


I'm tired to repeat - index itself doesn't know about configuration !


Is there a way to change that?  For example store the configuration in a
metapage or something?


it's useless, in general, since you could use different configuration to 
build tsvector.



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 6: explain analyze is your friend

[HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

A lot of work has been done to try to get /contrib/tsearch2 into the
core backend for 8.3, but we have hit a roadblock in how to handle
multiple text search configurations.  (FYI, the documentation is at
http://momjian.us/expire/textsearch/HTML/textsearch.html.)

There are three options for controlling text search configurations:

1) have a GUC variable which specifies the default configuration
2) require the configuration to be always specified
3) use the type system to automatically use the right configuration

The problem with #1 is that is it error-prone (easy to mismatch
configurations).  One idea was to have the GUC be super-user-only but
then restoring a dump as non-super-user is a problem.

The problem with #2 is that it makes implicit and explicit casting
impossible (there is no place to specify the configuration).

#3 requires more code and is probably not something we want to do at
this stage in 8.3 development.  It requires passing typmod values
between functions and operators (not something we have done easily in
the past).

Given this, should we decide to not include full text search in 8.3?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Michael Glaesemann



On Aug 14, 2007, at 12:40 , Tom Lane wrote:


Josh Berkus [EMAIL PROTECTED] writes:

The problem here is that log seems to be a verb in log_collector
which is what makes it confusing.  So we need another verb to  
make it
clear that log is not one.  This is not a problem with  
autovacuum

because that one cannot be confused with a verb.

start_log_collector still gets my vote.


log_collector_enable or log_collector_start or even log_redirect.   
But

something with log_*


I'm voting with Alvaro on this.  All of your suggestions are confusing
because log looks like the verb, which it is not.  Specifically,  
they
sound like what the switch does is to cause a log message to be  
emitted

about some action that would occur anyway.


AIUI, if the-GUC-yet-to-be-named is not enabled, no logging is done  
at all: messages are just sent to stderr. Why something simple like  
enable_logging or start_logger?


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Alvaro Herrera

Bruce Momjian wrote:
 A lot of work has been done to try to get /contrib/tsearch2 into the
 core backend for 8.3, but we have hit a roadblock in how to handle
 multiple text search configurations.  (FYI, the documentation is at
 http://momjian.us/expire/textsearch/HTML/textsearch.html.)
 
 There are three options for controlling text search configurations:
 
   1) have a GUC variable which specifies the default configuration
   2) require the configuration to be always specified
   3) use the type system to automatically use the right configuration
 
 The problem with #1 is that is it error-prone (easy to mismatch
 configurations).  One idea was to have the GUC be super-user-only but
 then restoring a dump as non-super-user is a problem.

What is the worst consequence of mismatching configuration?  Does it
cause a system crash?  A backend hang?  A corrupted index?  Lost data?
Or does it, as I assume, just fail to return the exact result set that
would be returned if the correct configuration was supplied?

If the answer is the latter, I think this is not so huge a problem that
FULL TEXT should be rejected for 8.3 on these grounds.  We just tell
people to use the correct query and be done with it.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
La vida es para el que se aventura

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Heikki Linnakangas

Bruce Momjian wrote:
 #3 requires more code and is probably not something we want to do at
 this stage in 8.3 development.  It requires passing typmod values
 between functions and operators (not something we have done easily in
 the past).

It does? I was thinking of implicitly creating a new type, with no
typmod, when you create a new configuration. Similar to enums, I think.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Tom Lane

Michael Glaesemann [EMAIL PROTECTED] writes:
 AIUI, if the-GUC-yet-to-be-named is not enabled, no logging is done  
 at all: messages are just sent to stderr. Why something simple like  
 enable_logging or start_logger?

Um, that's still logging by my definition.  I could live with
start_logger, since that is not the same as logging.

It could be that if we want real consistency we're going to have to make
more changes than this one.  Consider something like this:

* All variables that cause a specific kind of log message to be printed
or not shall be named log_something.  (So log_ is a verb.)

* Variables that affect the logging mechanism as a whole shall be named
logging_something.

For example, log_line_prefix is misnamed under this rule, and ought to
be logging_line_prefix.  Similarly, redirect_stderr would become
logging_something --- I'd prefer logging_start_collector but could
live with logging_collector (or maybe logging_use_collector?)

Looking at the docs, there are a whole bunch of GUCs that would have
to be renamed to meet this rule, but I think it would become clearer
what does what.

Is that too radical?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Heikki Linnakangas

Alvaro Herrera wrote:
 What is the worst consequence of mismatching configuration?  Does it
 cause a system crash?  A backend hang?  A corrupted index?  Lost data?
 Or does it, as I assume, just fail to return the exact result set that
 would be returned if the correct configuration was supplied?

Your assumption is correct.

You can mismatch configurations not just by querying in a wrong
configuration, but also by accidentally storing tsvectors generated with
different configurations in the same column (with no additional column
like Mike Rylander had to tell them apart), but it's still going to look
 OK from PostgreSQL's point of view.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Kenneth Marshall

On Tue, Aug 14, 2007 at 03:15:44PM -0400, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  A lot of work has been done to try to get /contrib/tsearch2 into the
  core backend for 8.3, but we have hit a roadblock in how to handle
  multiple text search configurations.  (FYI, the documentation is at
  http://momjian.us/expire/textsearch/HTML/textsearch.html.)
  
  There are three options for controlling text search configurations:
  
  1) have a GUC variable which specifies the default configuration
  2) require the configuration to be always specified
  3) use the type system to automatically use the right configuration
  
  The problem with #1 is that is it error-prone (easy to mismatch
  configurations).  One idea was to have the GUC be super-user-only but
  then restoring a dump as non-super-user is a problem.
 
 What is the worst consequence of mismatching configuration?  Does it
 cause a system crash?  A backend hang?  A corrupted index?  Lost data?
 Or does it, as I assume, just fail to return the exact result set that
 would be returned if the correct configuration was supplied?
 
 If the answer is the latter, I think this is not so huge a problem that
 FULL TEXT should be rejected for 8.3 on these grounds.  We just tell
 people to use the correct query and be done with it.
 
I think that users of the full-text option would need to read the
documentation and we could include any needed caveats. This would
certainly do for the 8.3 release and would give us time to simplify
the management and use in the 8.4 release. I, for one, have been
waiting a long time for it to be integrated into the database. 

Ken

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Andrew Dunstan




Heikki Linnakangas wrote:

Josh Berkus wrote:
  

The problem here is that log seems to be a verb in log_collector
which is what makes it confusing.  So we need another verb to make it
clear that log is not one.  This is not a problem with autovacuum
because that one cannot be confused with a verb.

start_log_collector still gets my vote.
  
I vote against.  Remember that some people look at the GUCs in alpha order 
from pg_settings.  As such, if we're changing GUC names it ought to be in a 
way that groups logically if we can.



Should we change the ordering of pg_settings?

  


Yeah, this is not a good reason for deciding a naming issue.

If we really want thematic collection of GUC vars then we should arrange 
for some sort of hierarchical naming, and appropriate sectioning of the 
config file.


cheers

andrew

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Brendan Jurd

On 8/15/07, Tom Lane [EMAIL PROTECTED] wrote:
 For example, log_line_prefix is misnamed under this rule, and ought to
 be logging_line_prefix.  Similarly, redirect_stderr would become
 logging_something --- I'd prefer logging_start_collector but could
 live with logging_collector (or maybe logging_use_collector?)


The consistent prefix idea sounds good; does logging_enable jive
with your proposal?

Introduction of the term collector might be an overthink, and could
confuse people.  Your average postgres user tweaking his config file
is going to see that and wonder what on earth a log collector is.

Whereas it's generally understood that to log is to capture output
and make it persistent, and logging_enable is clearly a setting that
allows this to take place.

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Heikki Linnakangas wrote:
 Bruce Momjian wrote:
  #3 requires more code and is probably not something we want to do at
  this stage in 8.3 development.  It requires passing typmod values
  between functions and operators (not something we have done easily in
  the past).
 
 It does? I was thinking of implicitly creating a new type, with no
 typmod, when you create a new configuration. Similar to enums, I think.

So each new configuration is a new data type?  How do the tsearch
functions handle these new data types?  I also question if this can be
completed soon.  I have seen no specification yet, let alone someone
coding it.

Basically, the default GUC doesn't work because of:

error prone
if super-user only, non-super-user doesn't work on restore
if non-super-user, can cause mismatch (perhaps this is the best
  option), and restore still a problem (no storage of config in
  indexes or tables)

No one seems to like the always-specify the configuration (loses cast
ability).  And I don't see the code for new type appearing anytime soon.

Bottom line --- we better figure out something quick or it isn't going
to be in 8.3, and at this point, I am starting to doubt a solution will
magically appear.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Alvaro Herrera



With so many people trying to paint this particular bikeshed, my
suggestion to Andrew is to commit the patch as is and leave the rename
for a later patch.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
Jude: I wish humans laid eggs
Ringlord: Why would you want humans to lay eggs?
Jude: So I can eat them

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Mike Rylander

On 8/14/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
 Mike Rylander wrote:
[snip]

 Don't you need to use the right configuration to parse the query into a
 tsquery as well?


Only if the user (or user agent) can supply enough information to move
away from the configured default of, say, en-US.  And even then, other
tsvector configurations need to be searched.  Configurations are not,
of course, just about language/stemming, but also stop-words and
thesauri.  If it were just languages it wouldn't be too big of a deal
(other than in-app DDL management...).

 What you have is basically the classic problem problem of representing
 inheritance in a relational model. You have a superclass, bibliographic
 record, and multiple subclasses, one per language with one extra field,
 the corpus in the right language. You've solved it by storing them all
 in one table, with an extra column (language) to denote which subclass
 the record is. Alternatively, you could solve it by using inherited
 tables, or having one table per language with a foreign key referencing
 the master table containing the other fields, or having a single table
 with one nullable field per configuration, and a check constraint to
 check that exactly one of them is not null.


Sorry, communication problem here ... I provided an oversimplified
example meant more to show the issues than to find alternate
solutions, though I appreciate you taking the time to consider the
problem.  If I only had to consider one delimiting facet per record
then it would be much simpler. But with the complication that some
fields have stop-word lists (and some not), some use thesauri (and
different ones, at that), and these fields (as extracted from the
records) and their configurations (stem? use a stop-word list? use
thesaurus x, y or z?) are user-defined ...

 As a thought experiment, let me present another, not text search related
 example that's isomorphic to your problem:


Unfortunately, the problem you describe is not quite the same as the
problem I need to solve.

Fortunately, the current incarnation of tsearch2 does a pretty good
job of solving the problem if you store everything in one place and
use the query-time environment to apply some adjustments to the
ranking of items.  I could still work around this problem by creating
inherited tables, one for each configuration on each index-providing
table but I /really/ hope to avoid that.  Creating new configurations
for local requirements doesn't require creating new tables (and the
associated management overhead in the app) today, something I'd really
like to avoid.  In fact, I'm starting to sweat just thinking about
what the planner would go through with the number tables needed for
the potential configurations in an installation that makes use of
multiple thesauri and a mix of stop-word lists across, say, 30
languages.  Such a dataset is not uncommon.

In any case, thanks again for taking the time to think about the
problem.  I still think having the ability to store any old tsvector I
happen to have hanging around in any column of the correct type is a
GoodThing(tm).  I see from Oleg's message down-thread that that's the
way things will be (the tsvector type doesn't know about columns, just
lexem tuples).

--miker

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Bruce Momjian

Heikki Linnakangas wrote:
 Bruce Momjian wrote:
  Heikki Linnakangas wrote:
  Bruce Momjian wrote:
  Heikki Linnakangas wrote:
  Removing the default configuration setting altogether removes the 2nd
  problem, but that's not good from a usability point of view. And it
  doesn't solve the general issue, you can still do things like:
  SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@
  to_tsquery('confB', 'query');
  True, but in that case you are specifically naming different
  configurations, so it is hopefully obvious you have a mismatch.
  There's many more subtle ways to do that. For example, filling a
  tsvector column using a DEFAULT clause. But then you sometimes fill it
  in the application instead, with a different configuration. Or if one of
  the function calls is buried in another user defined function.
 
  I don't think explicitly naming the configuration gives enough protection.
  
  Oh, wow, OK, well in that case the text search API isn't ready and we
  will have to hold this for 8.4.
 
 That would be unfortunate :(. Sorry I haven't looked at this earlier. Do
 you think that implementing a strongly typed system is too much work for
 8.3?

Yea, probably.  See my other posting with an updated subject line.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Tom Lane

Brendan Jurd [EMAIL PROTECTED] writes:
 On 8/15/07, Tom Lane [EMAIL PROTECTED] wrote:
 For example, log_line_prefix is misnamed under this rule, and ought to
 be logging_line_prefix.  Similarly, redirect_stderr would become
 logging_something --- I'd prefer logging_start_collector but could
 live with logging_collector (or maybe logging_use_collector?)

 The consistent prefix idea sounds good; does logging_enable jive
 with your proposal?

I dislike it.  I claim that logging to plain stderr (without the
syslogger process) is still logging.  Logging to syslog (which also
doen't need the syslogger process) is *definitely* logging.  Something
named logging_enable would suggest to the normal person that without
it turned on, you'll get *nothing*.

I'm not wedded to collector per se, but you really cannot escape the
fact that there is one more concept in here than you wish to admit.
I think that reflecting the existence of a collector process in the GUC
names makes things clearer, not less clear.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Alvaro Herrera

Bruce Momjian wrote:

 Basically, the default GUC doesn't work because of:
 
   error prone
   if super-user only, non-super-user doesn't work on restore
   if non-super-user, can cause mismatch (perhaps this is the best
 option), and restore still a problem (no storage of config in
 indexes or tables)

I haven't really seen anyone else arguing about this.  I wonder whether
you are being overly zealous about it.

-- 
Alvaro Herrera http://www.flickr.com/photos/alvherre/
Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem.  (Tom Lane)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Bruce Momjian

Mike Rylander wrote:
 This is just my $0.02 as a fairly heavy user of the current tsearch2
 code, but I sincerely hope you do not cripple the system by removing
 the ability to store tsvectors built using arbitrary configurations in
 a single column.  Yes, it can lead to unexpected results if you do not
 know what you are doing, but if you have gone beyond building a single
 tsearch2 configuration then you are required to know what you are
 doing.  What's more, IMO the default configuration mechanism feels
 very much like a CONSTRAINT, as Oleg suggests.  That point is one of
 cognizance, where if one has gone to the trouble of setting up
 multiple configurations and has learned enough to do so correctly,
 then one necessarily understands the importance of the setting and can
 use it (or not, and use explicit configurations) correctly.  The
 default config lowers the bar to an acceptable level for beginners
 that have no need of multiple configurations, and while I don't feel
 too strongly, personally, about having a default, I think it is both
 useful and helpful for new users -- it was for me.

What has really hurt the default GUC idea is how to do restores from a
pg_dump.  How do you make sure the right default is used on a restore,
particularly if multiple objects are being restored, and each has a
different default GUC.  I suppose your trigger handles that but that
isn't going to help with an expression index, nor in cases where the
default of the old database is different from the new one.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Alvaro Herrera wrote:
 Bruce Momjian wrote:
 
  Basically, the default GUC doesn't work because of:
  
  error prone
  if super-user only, non-super-user doesn't work on restore
  if non-super-user, can cause mismatch (perhaps this is the best
option), and restore still a problem (no storage of config in
indexes or tables)
 
 I haven't really seen anyone else arguing about this.  I wonder whether
 you are being overly zealous about it.

Uh, OK, but no one has told me how a database restore without a
configuration name would work, so I am all ears.

CREATE INDEX ii on x(to_tsvector(col))

There is nothing that says what configuration that index should use
except the default setting, and if that is different in the restore
database, you have problems.  Same for a trigger that calls to_tsvector
and is restored into a different database.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Mike Rylander

On 8/14/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
 Bruce Momjian wrote:

  Basically, the default GUC doesn't work because of:
 
error prone
if super-user only, non-super-user doesn't work on restore
if non-super-user, can cause mismatch (perhaps this is the best
  option), and restore still a problem (no storage of config in
  indexes or tables)

 I haven't really seen anyone else arguing about this.  I wonder whether
 you are being overly zealous about it.

I hate to just pile on Bruce, but as a production user I tend to
agree... let me shoot myself in the foot. :)

--miker

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] change name of redirect_stderr?

2007-08-14 Thread Brendan Jurd

On 8/15/07, Tom Lane [EMAIL PROTECTED] wrote:
 Brendan Jurd [EMAIL PROTECTED] writes:
  The consistent prefix idea sounds good; does logging_enable jive
  with your proposal?

 I dislike it.  I claim that logging to plain stderr (without the
 syslogger process) is still logging.  Logging to syslog (which also
 doen't need the syslogger process) is *definitely* logging.  Something
 named logging_enable would suggest to the normal person that without
 it turned on, you'll get *nothing*.

 I'm not wedded to collector per se, but you really cannot escape the
 fact that there is one more concept in here than you wish to admit.
 I think that reflecting the existence of a collector process in the GUC
 names makes things clearer, not less clear.

Fair enough.  I just took a fresh look at postmaster.conf, and indeed
the logging variables are more complex than I gave them credit for
with logging_enable.  Retracted.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] [EMAIL PROTECTED]: Re: [GENERAL] array_to_set functions]

2007-08-14 Thread Decibel!

On Tue, Aug 14, 2007 at 05:38:33PM +0200, Pavel Stehule wrote:
 2007/8/14, Bruce Momjian [EMAIL PROTECTED]:
 
  TODO item?
 
+ If your life is a hard drive, Christ can be your backup. +
 
  ---(end of broadcast)---
  TIP 6: explain analyze is your friend
 
 I am against. It's too simple do it in SQL language.

Why make everyone who works with arrays create a function just to do
this? Something that's of use to common users should be included, simple
or not.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)


pgphs3V40sOor.pgp
Description: PGP signature

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Alvaro Herrera

Bruce Momjian escribió:
 Mike Rylander wrote:
  This is just my $0.02 as a fairly heavy user of the current tsearch2
  code, but I sincerely hope you do not cripple the system by removing
  the ability to store tsvectors built using arbitrary configurations in
  a single column.  Yes, it can lead to unexpected results if you do not
  know what you are doing, but if you have gone beyond building a single
  tsearch2 configuration then you are required to know what you are
  doing.  What's more, IMO the default configuration mechanism feels
  very much like a CONSTRAINT, as Oleg suggests.  That point is one of
  cognizance, where if one has gone to the trouble of setting up
  multiple configurations and has learned enough to do so correctly,
  then one necessarily understands the importance of the setting and can
  use it (or not, and use explicit configurations) correctly.  The
  default config lowers the bar to an acceptable level for beginners
  that have no need of multiple configurations, and while I don't feel
  too strongly, personally, about having a default, I think it is both
  useful and helpful for new users -- it was for me.
 
 What has really hurt the default GUC idea is how to do restores from a
 pg_dump.  How do you make sure the right default is used on a restore,
 particularly if multiple objects are being restored, and each has a
 different default GUC.  I suppose your trigger handles that but that
 isn't going to help with an expression index, nor in cases where the
 default of the old database is different from the new one.

I guess what should happen is that pg_dump should include a
SET default_text_search_config = 'foo'
just before the CREATE INDEX, like we do for other variables.  Of
course, in order for this to work, the index itself must know what value
was used on creation.  Oleg already dismissed my suggestion of putting
it into the index itself (a metapage or something).

Maybe store it in reloptions?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Alvaro Herrera

Oleg Bartunov wrote:
 On Tue, 14 Aug 2007, Alvaro Herrera wrote:

 Oleg Bartunov wrote:
 On Thu, 9 Aug 2007, [EMAIL PROTECTED] wrote:

 Maybe I'm missing something, but it seems to me that the configuration
 is more attached to a column/index thatn to the whole database. If
 there's a default in an expression, I'd rather expect this default to be
 drawn from the index involved than from a global value (like a 
 functional
 index does now).

 I'm tired to repeat - index itself doesn't know about configuration !

 Is there a way to change that?  For example store the configuration in a
 metapage or something?

 it's useless, in general, since you could use different configuration to 
 build tsvector.

Hmm, sorry, I think I just understood what this was about: so you mean
that the configuration is really *per row* and not per index?  So I can
store rows into an index using more than one configuration, and it will
work?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

[HACKERS] Index Tuple Compression Approach?

2007-08-14 Thread Chris Browne

I recently had a chat with someone who was pretty intimate with Adabas
for a number of years who's in the process of figuring things out
about PostgreSQL.  We poked at bits of the respective implementations,
seeing some similarities and differences.  He pointed out one aspect
of index handling that could (in principle) be an interesting
optimization.

Evidently, in Adabas, index leaf nodes were not simply tuples, but
lists where the index value would not be repeated.

In PostgreSQL, if you have the index value 'abc', and there are 10
tuples with that value, then you'll have a page full of tuples of the
following form:

|abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...

Now, the Adabas approach was rather different.  It would only have the
index value once, and then have the list of tuple pointers:

|abc|ptr[rec1],ptr[rec2],ptr[rec3],...[ptr[rec10]|

This could allow a fair bit of compression, for cases where the index
value is not unique.

There is a concommitant downside, that concurrent updates may fight
over a page, and, since there would be a higher density, there would
be more need to fight over pages.

Does this seem pretty much like madness?  Or is it a plausible some
day ToDo?
-- 
cbbrowne,@,acm.org
http://linuxfinances.info/info/postgresql.html
I don't do drugs anymore 'cause I  find I get the same effect just by
standing up really fast. -- Jonathan Katz

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Index Tuple Compression Approach?

2007-08-14 Thread Decibel!

Isn't this what Grouped Index Tuples is?

On Tue, Aug 14, 2007 at 05:21:16PM -0400, Chris Browne wrote:
 I recently had a chat with someone who was pretty intimate with Adabas
 for a number of years who's in the process of figuring things out
 about PostgreSQL.  We poked at bits of the respective implementations,
 seeing some similarities and differences.  He pointed out one aspect
 of index handling that could (in principle) be an interesting
 optimization.
 
 Evidently, in Adabas, index leaf nodes were not simply tuples, but
 lists where the index value would not be repeated.
 
 In PostgreSQL, if you have the index value 'abc', and there are 10
 tuples with that value, then you'll have a page full of tuples of the
 following form:
 
 |abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...
 
 Now, the Adabas approach was rather different.  It would only have the
 index value once, and then have the list of tuple pointers:
 
 |abc|ptr[rec1],ptr[rec2],ptr[rec3],...[ptr[rec10]|
 
 This could allow a fair bit of compression, for cases where the index
 value is not unique.
 
 There is a concommitant downside, that concurrent updates may fight
 over a page, and, since there would be a higher density, there would
 be more need to fight over pages.
 
 Does this seem pretty much like madness?  Or is it a plausible some
 day ToDo?
 -- 
 cbbrowne,@,acm.org
 http://linuxfinances.info/info/postgresql.html
 I don't do drugs anymore 'cause I  find I get the same effect just by
 standing up really fast. -- Jonathan Katz
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org
 

-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)


pgppr2SjO8mDI.pgp
Description: PGP signature

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Mike Rylander

On 8/14/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
 Oleg Bartunov wrote:
  On Tue, 14 Aug 2007, Alvaro Herrera wrote:
 
  Oleg Bartunov wrote:
  On Thu, 9 Aug 2007, [EMAIL PROTECTED] wrote:
 
  Maybe I'm missing something, but it seems to me that the configuration
  is more attached to a column/index thatn to the whole database. If
  there's a default in an expression, I'd rather expect this default to be
  drawn from the index involved than from a global value (like a
  functional
  index does now).
 
  I'm tired to repeat - index itself doesn't know about configuration !
 
  Is there a way to change that?  For example store the configuration in a
  metapage or something?
 
  it's useless, in general, since you could use different configuration to
  build tsvector.

 Hmm, sorry, I think I just understood what this was about: so you mean
 that the configuration is really *per row* and not per index?  So I can
 store rows into an index using more than one configuration, and it will
 work?

Can and does, to great success.  :)

--miker

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Alvaro Herrera wrote:
 I haven't really seen anyone else arguing about this.  I wonder whether
 you are being overly zealous about it.

 Uh, OK, but no one has told me how a database restore without a
 configuration name would work, so I am all ears.

It's the dump/restore problem that really is the kicker here.  I don't
mind so much a GUC that only controls the interpretation of queries, but
if it determines how an index or a tsvector column gets rebuilt, we've
got problems.

I've just finished re-reading the prior thread, and here are what seem
to me to be the salient points:

* Oleg, Teodor, and all of the old-line users of tsearch2 are
comfortable with setting up a trigger to maintain a materialized
tsvector column for a table.  They argue that recomputing the tsvector
(possibly more than once) during a query is more expensive than fetching
it from disk.  My suspicion is that the latter argument gets weaker
every year --- CPUs are getting faster lots faster than disks are.

* Bruce (and I ... not sure about anyone else) want to support usage of
text search via a functional index.  This is argued to be easier to set
up (no fooling with triggers) and possibly faster depending on CPU vs
I/O speeds.  I don't think there is any desire here to eliminate the
trigger approach, just to provide an alternative.

* For *either* the trigger or functional-index approach, I think it is
unacceptable to rely on a GUC variable to determine how the tsvector is
derived from the raw-document fields for storage or indexing.  It's just
too error-prone, particularly when you consider dump-and-reload cases.
What I think we should say is that the ts parsing configuration name
can be either hardwired or taken from another field of the table.
In the trigger case this would mean providing a couple of standard
triggers, one taking the config name as a trigger parameter, and the
other accepting a trigger parameter that's the name of the config name
column.  In the index case this would mean that the index expression
has to be either to_tsvector('constant', ...) or to_tsvector(field, ...).
Note that all four cases boil down to saying that stored or indexed
tsvectors have to be derived from the two-parameter form of to_tsvector.

* For queries, there is not anything very wrong with having a default
configuration, but the trick is how to get the planner to match that up
with an index that's written with the two-parameter form of to_tsvector.
One hackish possibility is to define the single-parameter form of
to_tsvector like this:

  create function to_tsvector(text) returns tsvector as $$
select to_tsvector(get_default_text_search_config(), $1)
  $$ language sql strict stable;

where get_default_text_search_config() is essentially just
current_setting('default_text_search_config') except it is
misleadingly marked immutable.  Then, a query with

WHERE to_tsvector(document_col) @@ tsquery(...)

will have the SQL function inlined, and the
get_default_text_search_config() call const-folded, and suddenly
it looks like

WHERE to_tsvector('english', document_col) @@ tsquery(...)

and can be matched to a functional index that's declared using the
explicit 'english' configuration name.  This is pretty grotty though
... can anyone think of a better way?  (The main objection I can see
to it is that someone could shoot himself in the foot by using this
function instead of two-parameter to_tsvector in a custom trigger
function.  But hopefully, anyone writing a custom trigger function
will have read the manual's warning not to do that.  Note that I
suggest marking the function stable so that it can't be misused
that way in a functional index.  Another possible objection is that
get_default_text_search_config() is a foot-gun all by itself, since
it could be used in a functional index.  Aside from not documenting
it I'm not sure there's much to be done about that.)

* I'm not enamored of Heikki's idea about a datatype-based solution,
because I don't think that it will allow columns containing tsvectors
derived with different configurations.  It's perfectly clear that
advanced users want to be able to do that, and it's also clear that
as long as the config name is coming from a stored column (or can be
reconstructed somehow from the stored data) that it's perfectly
well-defined.

Since I don't think that a datatype solution is the way to go,
I don't feel that we are as far away from an agreement as Bruce
is worried about.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Mike Rylander wrote:
 On 8/14/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
  Bruce Momjian wrote:
 
   Basically, the default GUC doesn't work because of:
  
 error prone
 if super-user only, non-super-user doesn't work on restore
 if non-super-user, can cause mismatch (perhaps this is the best
   option), and restore still a problem (no storage of config in
   indexes or tables)
 
  I haven't really seen anyone else arguing about this.  I wonder whether
  you are being overly zealous about it.
 
 I hate to just pile on Bruce, but as a production user I tend to
 agree... let me shoot myself in the foot. :)
 
 --miker

Sure, we can document hazards, but the larger problem is related to the
fact that the default controls what gets stored in the database.  This is
a similar problem to when we had an autocommit GUC which caused
problems.

Technically, this is like how the server encoding affects what is stored
in the database.  If we allowed users to change the server encoding in a
database that already had data in it, there would be no way to identify
which data was using the old encoding and which was using the new one. 
Now, the application might be able to identify them just fine, but a
database restore would be unable to recreate the data the same way.

If we want to keep the default GUC we would have to allow non-super-user
changes so we can use it in pg_dump for restore, but even then if the
default is different there is going to be a mix of old/new after the
restore because table changes after the restore is going to use the new
default config.

Perhaps the best we could do is to tell people who change the default GUC
that they are on their own in restoring the database, or they have to be
very carful like with triggers to assign the configuration properly.  I
can imagine how complex that part of the documentation will be, but it
is doable.

The other point is that we should have a good idea of the API because
if it gets into 8.3 it will be harder to change.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 Bruce Momjian escribió:
 What has really hurt the default GUC idea is how to do restores from a
 pg_dump.

 I guess what should happen is that pg_dump should include a
 SET default_text_search_config = 'foo'
 just before the CREATE INDEX, like we do for other variables.

The nasty cases are in data-only dumps, that is, where you're trying to
load data into a table with pre-existing indexes or triggers.  A SET
like the above is at least as likely to be wrong as right, if the index
or trigger depends on it to tell it what to do.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

[HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Kris Jurka



Looking into recent buildfarm failures on the 7.4 branch:

http://www.pgbuildfarm.org/cgi-bin/show_status.pl

It looks like parts of the CVS repository have been mistagged as belonging 
to REL7_4_STABLE or have been corrupted somehow:


http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/btree_gist/btree_bit.c?sortby=date;only_with_tag=REL7_4_STABLE

I'm not sure what's going on here, but something has gone bad.

Kris Jurka

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 The other point is that we should have a good idea of the API because
 if it gets into 8.3 it will be harder to change.

Yeah, once it's in core we have a pretty strong backwards-compatibility
restriction to deal with.  Someone upthread claimed we can always
simplify it later but that's exactly backward --- we can add features
later, but we can't subtract.

Maybe we should be looking to implement just the minimum set of features
for 8.3 and leave some of the more controversial stuff for 8.4.  I hate
to admit it, but if we take that point of view then triggers are in
and functional-index support is out.  We have to support the trigger
approach because it's what is in tsearch2 now, and the existing users
will expect to continue to have that option.

However, allowing the standard triggers to pay attention to a
configuration GUC variable is simply broken; that bit has to go away.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Tom Lane

Kris Jurka [EMAIL PROTECTED] writes:
 It looks like parts of the CVS repository have been mistagged as belonging 
 to REL7_4_STABLE or have been corrupted somehow:
 http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/btree_gist/btree_bit.c?sortby=date;only_with_tag=REL7_4_STABLE

Hmm ... btree_bit.c shouldn't be in 7.4 at all.

I can't help thinking this has something to do with the recent CVS
server move.  Magnus, any thoughts?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Gregory Stark

Mike Rylander [EMAIL PROTECTED] writes:

 My application (http://open-ils.org, which run 80% of the public
 libraries in Georgia, USA, http://gapines.org and
 http://georgialibraries.org/lib/pines.html) requires that I be able to
 search a corpus of bibliographic records in a mix of languages, and
 potentially with mixed stop-word rules, with one query.  I cannot know
 ahead of time what languages will be used in the corpus and I cannot
 restrict any one query to one language.  To accomplish this, the
 record itself will be inspected inside an INSERT/UPDATE trigger to
 determine the language and type, and use the correct configuration for
 creating the tsvector.  This will obviously result in a mixed
 tsvector column, but that's exactly what I need.  I can filter on
 record language if the user happens to specify a query language (and
 thus configuration), or simply rank the assumed (IP based, perhaps, or
 browser preference based) preferred language higher, or one of a
 hundred other things.  But I won't be able to do any of that if
 tsvectors are required to have one and only one configuration per
 column.

 Anyway, I felt I needed to provide some outside perspective to this,
 as a user, since it seems that the external viewpoint (my particular
 viewpoint, at least) was missing from the discussion.

This is *extremely* useful. I think it's precisely what we've been missing so
far. At least, what I've been missing.

So the question is what exactly happens in this case? If I search for the
does that mean it will ignore matches in English where that's a stop-word but
find me books on tea in French? Is that what I should expect to happen? What
if I search for earl and the? Does that find me French books on Early Grey
Tea but English books on all earls?

What happens if I use the same operator directly on the text column? Or
perhaps it's not even possible to specify stop-words when operating on a text
column? Should it be?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Tom Lane

I wrote:
 Kris Jurka [EMAIL PROTECTED] writes:
 It looks like parts of the CVS repository have been mistagged as belonging 
 to REL7_4_STABLE or have been corrupted somehow:
 http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/btree_gist/btree_bit.c?sortby=date;only_with_tag=REL7_4_STABLE

 Hmm ... btree_bit.c shouldn't be in 7.4 at all.

I did a fresh checkout of the 7.4 branch and diff'd against my local
copy, and it seems clear that every file that was not in 7.4 at all has
had its HEAD version tagged as REL7_4_STABLE.  The files that did exist
then are all right.  That's throughout the whole tree, not just in
contrib/btree_gist.  I have no idea how you make CVS do that, but I'm
sure there is some magic one-liner for it.

Checking the files in contrib/btree_gist on the CVS server gives a
pretty good fix on who changed it and when:

 ls -la
total 466
drwxrwxr-x   6 scrappy  dev   1024 Aug 14 22:30 .
drwxrwxr-x  90 scrappy  dev   2048 Aug 14 22:27 ..
drwxrwxr-x   2 scrappy  dev512 Apr 20 15:16 Attic
-r--r--r--   1 tgl  dev   9555 Jun 26 22:05 Makefile,v
-r--r--r--   1 258  dev   5870 Apr 20 16:19 README.btree_gist,v
-r--r--r--   1 meskes   dev  12052 Aug 12 09:50 btree_bit.c,v
-r--r--r--   1 meskes   dev   9921 Aug 12 09:50 btree_bytea.c,v
-r--r--r--   1 meskes   dev  10246 Aug 12 09:50 btree_cash.c,v
-r--r--r--   1 meskes   dev  11433 Aug 12 09:50 btree_date.c,v
-r--r--r--   1 meskes   dev  10230 Aug 12 09:50 btree_float4.c,v
-r--r--r--   1 meskes   dev  10093 Aug 12 09:50 btree_float8.c,v
-r--r--r--   1 meskes   dev  33935 Aug 12 09:50 btree_gist.c,v
-r--r--r--   1 258  dev   4325 Apr 20 15:16 btree_gist.h,v
-r--r--r--   1 258  dev  58744 Apr 20 16:19 btree_gist.sql.in,v
-r--r--r--   1 meskes   dev  14736 Aug 12 09:50 btree_inet.c,v
-r--r--r--   1 meskes   dev  10253 Aug 12 09:50 btree_int2.c,v
-r--r--r--   1 meskes   dev  10240 Aug 12 09:50 btree_int4.c,v
-r--r--r--   1 meskes   dev  10254 Aug 12 09:50 btree_int8.c,v
-r--r--r--   1 meskes   dev  18111 Aug 12 09:50 btree_interval.c,v
-r--r--r--   1 meskes   dev  12025 Aug 12 09:50 btree_macaddr.c,v
-r--r--r--   1 meskes   dev  14671 Aug 12 09:50 btree_numeric.c,v
-r--r--r--   1 meskes   dev   9796 Aug 12 09:50 btree_oid.c,v
-r--r--r--   1 meskes   dev  18247 Aug 12 09:50 btree_text.c,v
-r--r--r--   1 meskes   dev  26180 Aug 12 09:50 btree_time.c,v
-r--r--r--   1 258  dev  29712 Apr 20 15:16 btree_ts.c,v
-r--r--r--   1 meskes   dev  17588 Aug 12 09:50 btree_utils_num.c,v
-r--r--r--   1 meskes   dev   8959 Aug 12 09:50 btree_utils_num.h,v
-r--r--r--   1 meskes   dev  48824 Aug 12 09:50 btree_utils_var.c,v
-r--r--r--   1 meskes   dev   7099 Aug 12 09:50 btree_utils_var.h,v
drwxrwxr-x   3 scrappy  dev   1024 Aug 14 22:27 data
drwxrwxr-x   3 scrappy  dev   1024 Aug 14 22:27 expected
drwxrwxr-x   3 scrappy  dev   1024 Aug 14 22:27 sql
-r--r--r--   1 meskes   dev  10021 Aug 12 09:50 uninstall_btree_gist.sql,v

Michael, want to fess up to what you did?

In the meantime, though, it's not quite clear why this would lead to
a buildfarm failure --- it should just mean a lot of extraneous files
appearing in a fresh checkout.  (Looks a bit harder ... Oh, it looks
like btree_gist has some files that used to be autogenerated and are
now in CVS, so the bogusly new versions from CVS are suppressing the
desired generation from the old btree_num.c file.)

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Andrew Dunstan




Tom Lane wrote:

In the meantime, though, it's not quite clear why this would lead to
a buildfarm failure --- it should just mean a lot of extraneous files
appearing in a fresh checkout.  (Looks a bit harder ... Oh, it looks
like btree_gist has some files that used to be autogenerated and are
now in CVS, so the bogusly new versions from CVS are suppressing the
desired generation from the old btree_num.c file.)


  


Looking at the Committers mail, it looks like there have only been two 
very small commits since Michael's series of commits around 12 to 13.5 
hours ago, and before that nothing since around 28 hours ago. Do we have 
a backup snapshot of the repo taken during that time that we can roll 
back to? That might be simpler than doing surgery on the repo.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Index Tuple Compression Approach?

2007-08-14 Thread Jeff Davis

On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote:
 Isn't this what Grouped Index Tuples is?
 

http://community.enterprisedb.com/git/git-readme.txt

It looks like GIT is a little different. 

GIT actually stores a lower-bound key of a contiguous* range of keys
that all point to the same page, and for each of those ranges stores a
bitmap of page offsets. A search searches first for an exact match in
the index, and failing that, looks to see if the previous index tuple
happens to be one of these ranges.

The algorithm Chris is talking about stores a set of tuple ids (which
include page and offset) for each distinct key.

Both could be helpful, although I don't think they can work together
very well.

GIT has the disadvantage that it's lossy. It doesn't even store every
key in the index, so it can't be sure that the match actually is a
match. Thus, it returns candidate matches. That also has implications
for enforcing UNIQUE (although it's not impossible, according to the
readme). However, GIT can be used effectively on an index that happens
to be unique. GIT also assumes a tree structure, and makes no sense for
a hash index, and makes no sense for a types without ordering. GIT's
space savings is dependent on the clustering of the table.

Chris's suggestion would work on a UNIQUE index, but would be no help at
all, because there would be no duplicate keys to collapse. However, it
could be used for non-tree indexes. The space savings for this strategy
is dependent on how repetitive the keys are.

I guess the ultimate deciding factor is which can save you more space.
If you have lots of duplicates, Chris's suggestion might work better,
because you don't have to try to maintain cluster order. If you have a
wider distribution of data, GIT is probably better, although you have to
keep some degree of clustering (HOT may help with that).

Heikki is the authority on GIT, so I'm including him in the CC so he can
correct me :)

Regards,
Jeff Davis

*: I'm not 100% sure I'm using contiguous correctly, but the range of
keys can contain gaps or duplicates, so long as every key in the range
points to that same page. That is, if keys 1,1,2,3,5 all point to page
P, they can be grouped into just 1 so long as there doesn't exist a
key 4 that points to a page other than P.


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Tom Lane

Andrew Dunstan [EMAIL PROTECTED] writes:
 Looking at the Committers mail, it looks like there have only been two 
 very small commits since Michael's series of commits around 12 to 13.5 
 hours ago, and before that nothing since around 28 hours ago. Do we have 
 a backup snapshot of the repo taken during that time that we can roll 
 back to? That might be simpler than doing surgery on the repo.

It looks to me like the mistake was unrelated to any commit, and was
made at Aug 12 09:50 UTC.  (Aside from the file-timestamp evidence,
caracara's build failure at 2007-08-12 213845 shows that the problem
is more than 2 days old.)  I count seven commits of my own and several
of Michael's since then.  If I'm right that a simple cvs rtag will fix
it, I'd rather do that than try to reapply patches.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Tom Lane

I wrote:
 I did a fresh checkout of the 7.4 branch and diff'd against my local
 copy, and it seems clear that every file that was not in 7.4 at all has
 had its HEAD version tagged as REL7_4_STABLE.  The files that did exist
 then are all right.  That's throughout the whole tree, not just in
 contrib/btree_gist.

As for getting out of it: it looks like the misapplied tags are revision
tags not branch tags, eg

RCS file: /cvsroot/pgsql/contrib/btree_gist/btree_bit.c,v
Working file: btree_bit.c
head: 1.7
branch:
locks: strict
access list:
symbolic names:
REL7_4_STABLE: 1.7  --- wrong
REL8_0_13: 1.2
REL8_1_9: 1.6
REL8_2_4: 1.6
REL8_0_12: 1.2
REL8_1_8: 1.6
REL8_2_3: 1.6
REL8_0_11: 1.2
REL8_1_7: 1.6
REL8_2_2: 1.6
REL8_0_10: 1.2
REL8_1_6: 1.6
REL8_2_1: 1.6
REL8_2_STABLE: 1.6.0.4

which I think means that there's a one-liner way out of it.  The CVS
manual quoth:

   To delete a tag, specify the `-d' option to either `cvs tag' or `cvs
  rtag'.  For example:

 cvs rtag -d rel-0-4 tc

  deletes the non-branch tag `rel-0-4' from the module `tc'.  In the
  event that branch tags are encountered within the repository with the
  given name, a warning message will be issued and the branch tag will
  not be deleted.  If you are absolutely certain you know what you are
  doing, the `-B' option may be specified to allow deletion of branch
  tags.  In that case, any non-branch tags encountered will trigger
  warnings and will not be deleted.

So I think that cvs rtag -d REL7_4_STABLE pgsql will fix it.
I'd like someone to double-check that though.  Also maybe we should back
up the repository first?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Andrew Dunstan




Tom Lane wrote:


So I think that cvs rtag -d REL7_4_STABLE pgsql will fix it.
I'd like someone to double-check that though. 


I will test on a copy of my mirror.


 Also maybe we should back
up the repository first?
  



Amen.

cheers

andrew


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Andrew Dunstan




Andrew Dunstan wrote:



Tom Lane wrote:


So I think that cvs rtag -d REL7_4_STABLE pgsql will fix it.
I'd like someone to double-check that though. 


I will test on a copy of my mirror.



I copied the mirror, did a checkout from it, ran the command above in 
the checked out version, then remove my checkout and checked out 
REL7_4_STABLE and did a diff against my copy from the postgres repo that 
hasn't been updated for some time. There were no significant 
differences, and no extra files.


So I'd say your strategy looks good - backup and remove the  phony tag. 
I'd also say we should probably be logging tag commands in taginfo. 
Presumably we mere mortal committers should not be doing any tagging 
whatsoever, and tags should only be applied in preparation for releases. 
Anyway, if we actually logged it we'd catch anything untoward early 
rather than possibly days later.


cheers

andrew

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] HOT pgbench results

2007-08-14 Thread Merlin Moncure

On 8/14/07, ITAGAKI Takahiro [EMAIL PROTECTED] wrote:
 Heikki Linnakangas [EMAIL PROTECTED] wrote:

  I ran some CPU intensive pgbench tests on HOT. Results are not
  surprising, HOT makes practically no difference on the total transaction
  rate, but reduces the need to vacuum:
 
  unpatched HOT
  tps 3680  3790
  WAL written(MB) 5386  4804
  checkpoints 109
  autovacuums 116   43
  autoanalyzes139   60

 I also ran pgbench with/without HOT using a bit different configurations
 (pgbench -s10 -c10 -t50). There were 10% performance win on HOT,
 although the test was CPU intensive and with FILLFACTOR=100%.

I'm curious why I am seeing results so different from everybody else
(I had almost double tps with HOT).  Are you running fsync on/off?
Any other changes to postgresql.conf?

merlin

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] [EMAIL PROTECTED]: Re: [GENERAL] array_to_set functions]

2007-08-14 Thread Merlin Moncure

On 8/14/07, Bruce Momjian [EMAIL PROTECTED] wrote:

 TODO item?

I would say yes...array_accum is virtually an essential function when
working with arrays and the suggested array_to_set (and it's built in
cousin, _pg_expand_array) really should not be built around
generate_series when a C function is faster and will scale much
better.

array_to_set, as suggested in SQL, is something only a relative expert
with PostgreSQL could be expected to write.

Thus could generate_series be relieved from providing the only core
function for set returning functions in the documentation.  IMO, this
part of the documentation could use some expansion anyways :)

merlin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Bruce Momjian

Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Bruce Momjian escribi?:
  What has really hurt the default GUC idea is how to do restores from a
  pg_dump.
 
  I guess what should happen is that pg_dump should include a
  SET default_text_search_config = 'foo'
  just before the CREATE INDEX, like we do for other variables.
 
 The nasty cases are in data-only dumps, that is, where you're trying to
 load data into a table with pre-existing indexes or triggers.  A SET
 like the above is at least as likely to be wrong as right, if the index
 or trigger depends on it to tell it what to do.

Ouch.  I had not even thought that far.

FYI, yes, the default tsearch GUC controls operations per row _if_ you
have triggers or expression indexes that rely on the default
configuration GUC.  If you have specified the configuration, there is no
problem, and hence my conclusion that the default GUC is too
error-prone.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Tom Lane

Andrew Dunstan [EMAIL PROTECTED] writes:
 So I'd say your strategy looks good - backup and remove the  phony tag. 
 I'd also say we should probably be logging tag commands in taginfo. 
 Presumably we mere mortal committers should not be doing any tagging 
 whatsoever, and tags should only be applied in preparation for releases. 

+1 ... we should at least log such commands, and maybe disallow to anyone
except Marc's pgsql account.  Particularly since they don't get
reported in pgsql-committers messages (or is that something we
can/should change?)

Meanwhile, is there anyone around who can either (1) tar up the
repository directory tree as root, or (2) confirm that a tarball
made by a non-root committer is sufficient?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Marc G. Fournier

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1



- --On Tuesday, August 14, 2007 22:20:16 -0400 Tom Lane [EMAIL PROTECTED]
wrote:

 +1 ... we should at least log such commands, and maybe disallow to anyone
 except Marc's pgsql account.  Particularly since they don't get
 reported in pgsql-committers messages (or is that something we
 can/should change?)

 Meanwhile, is there anyone around who can either (1) tar up the
 repository directory tree as root, or (2) confirm that a tarball
 made by a non-root committer is sufficient?

I've just finished tar'ng up the cvs repository as root ... I believe one as a 
non-root would be sufficient also, since it *should* retain file ownership / 
permissions with the -p option ...

cvs# ls -lt cvsroot.aug14_2007.tar.bz2
- -rw-r--r--  1 root  wheel  62555651 Aug 15 02:52 cvsroot.aug14_2007.tar.bz2

cvs# du -sk cvsroot
414406  cvsroot



- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFGwmq14QvfyHIvDvMRAsyFAJ9AND3JTDmOMR48Hl3Fdse2lVCjSwCglsu1
Pxj1LPPPx311rpY2XFj+Kvo=
=Gz6E
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Tom Lane

Marc G. Fournier [EMAIL PROTECTED] writes:
 On Tuesday, August 14, 2007 22:20:16 -0400 Tom Lane [EMAIL PROTECTED]
 wrote:
 Meanwhile, is there anyone around who can either (1) tar up the
 repository directory tree as root, or (2) confirm that a tarball
 made by a non-root committer is sufficient?

 I've just finished tar'ng up the cvs repository as root ...

Great --- launching cvs rtag command now.

 I believe one as a 
 non-root would be sufficient also, since it *should* retain file ownership / 
 permissions with the -p option ...

That's what I thought too, but a little extra paranoia doesn't hurt.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Tom Lane wrote:
 * For queries, there is not anything very wrong with having a default
 configuration, but the trick is how to get the planner to match that up
 with an index that's written with the two-parameter form of to_tsvector.
 One hackish possibility is to define the single-parameter form of
 to_tsvector like this:
 
   create function to_tsvector(text) returns tsvector as $$
 select to_tsvector(get_default_text_search_config(), $1)
   $$ language sql strict stable;
 
 where get_default_text_search_config() is essentially just
 current_setting('default_text_search_config') except it is
 misleadingly marked immutable.  Then, a query with
 
   WHERE to_tsvector(document_col) @@ tsquery(...)
 
 will have the SQL function inlined, and the
 get_default_text_search_config() call const-folded, and suddenly
 it looks like
 
   WHERE to_tsvector('english', document_col) @@ tsquery(...)

This is an interesting idea that would allow queries without a
configuration to match an expression index.  The only trick is to train
users not to use such tricks on triggers or expression index, but only
in queries.  The idea of 'stable' helps to fix that for expression
indexes, but not for trigger use, I assume.

 * I'm not enamored of Heikki's idea about a datatype-based solution,
 because I don't think that it will allow columns containing tsvectors
 derived with different configurations.  It's perfectly clear that
 advanced users want to be able to do that, and it's also clear that
 as long as the config name is coming from a stored column (or can be
 reconstructed somehow from the stored data) that it's perfectly
 well-defined.
 
 Since I don't think that a datatype solution is the way to go,
 I don't feel that we are as far away from an agreement as Bruce
 is worried about.

Well, from where I sit, there is one person saying give me the foot gun,
and Heikki saying he wants a bullet-proof type system, and you and I are
in the middle, so the big problem is I don't see a concensus forming,
and we have been discussing this for a while.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Since I don't think that a datatype solution is the way to go,
 I don't feel that we are as far away from an agreement as Bruce
 is worried about.

 Well, from where I sit, there is one person saying give me the foot gun,
 and Heikki saying he wants a bullet-proof type system, and you and I are
 in the middle, so the big problem is I don't see a concensus forming,
 and we have been discussing this for a while.

The people who actually use tsearch2 seem to all have the same opinion ...
so I think we can't go too far in the bullet-proofing direction.

But I would like a design that is bulletproof in dump/reload scenarios,
and I think it's fair to question that aspect of the tsearch2 design
because we've seen many reports of people having trouble updating
databases that use tsearch2.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  The other point is that we should have a good idea of the API because
  if it gets into 8.3 it will be harder to change.
 
 Yeah, once it's in core we have a pretty strong backwards-compatibility
 restriction to deal with.  Someone upthread claimed we can always
 simplify it later but that's exactly backward --- we can add features
 later, but we can't subtract.
 
 Maybe we should be looking to implement just the minimum set of features
 for 8.3 and leave some of the more controversial stuff for 8.4.  I hate
 to admit it, but if we take that point of view then triggers are in
 and functional-index support is out.  We have to support the trigger
 approach because it's what is in tsearch2 now, and the existing users
 will expect to continue to have that option.

Triggers and expression indexes were both in the documentation Oleg
supplied, so I am sure both are being used.  I bet some users don't even
know they are using expression indexes because creating a GIN index on a
column automatically casts to tsvector.  (But GIST does not.)  I had to
ask Oleg to find out this out.

 However, allowing the standard triggers to pay attention to a
 configuration GUC variable is simply broken; that bit has to go away.

The only trigger example supplied by Oleg and Teodor is
tsvector_update_trigger(), and that doesn't take a configuration name,
meaning it uses the default GUC configuration.

Uh, how are we going to prevent the auto-casting to tsvector from using
the default GUC config, e.g. CREATE INDEX i ON x USING GIN(col)?

This is where I started to see the need for education and error-prone
nature of the default GUC just wasn't worth having it, though I know
others disagree.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Since I don't think that a datatype solution is the way to go,
  I don't feel that we are as far away from an agreement as Bruce
  is worried about.
 
  Well, from where I sit, there is one person saying give me the foot gun,
  and Heikki saying he wants a bullet-proof type system, and you and I are
  in the middle, so the big problem is I don't see a concensus forming,
  and we have been discussing this for a while.
 
 The people who actually use tsearch2 seem to all have the same opinion ...
 so I think we can't go too far in the bullet-proofing direction.
 
 But I would like a design that is bulletproof in dump/reload scenarios,
 and I think it's fair to question that aspect of the tsearch2 design
 because we've seen many reports of people having trouble updating
 databases that use tsearch2.

Yea, look at the trouble we are having trying to underestand it all.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Tom Lane

I wrote:
 Great --- launching cvs rtag command now.

Done, and I got a plausible-looking mix of messages like
cvs rtag: Not removing branch tag `REL7_4_STABLE' from 
`/cvsroot/pgsql/src/tutorial/funcs_new.c,v'.
and a fresh checkout of REL7_4_STABLE now matches what I had locally.
So I think we walked away from that one.

TODO items:

* restrict, or at least log, cvs tag/rtag commands.  Maybe report them
to pgsql-committers.

* I notice that the repository contains some files owned by
no-longer-existent accounts, eg 258 in contrib/btree_gist:

 ls -la
total 460
drwxrwxr-x   6 scrappy  dev   1024 Aug 15 03:00 .
drwxrwxr-x  90 scrappy  dev   2048 Aug 15 03:00 ..
drwxrwxr-x   2 scrappy  dev512 Apr 20 15:16 Attic
-r--r--r--   1 tgl  dev   9555 Jun 26 22:05 Makefile,v
-r--r--r--   1 258  dev   5870 Apr 20 16:19 README.btree_gist,v
-r--r--r--   1 tgl  dev  12033 Aug 15 03:00 btree_bit.c,v
-r--r--r--   1 tgl  dev   9902 Aug 15 03:00 btree_bytea.c,v
-r--r--r--   1 tgl  dev  10227 Aug 15 03:00 btree_cash.c,v
-r--r--r--   1 tgl  dev  11414 Aug 15 03:00 btree_date.c,v
-r--r--r--   1 tgl  dev  10211 Aug 15 03:00 btree_float4.c,v
-r--r--r--   1 tgl  dev  10074 Aug 15 03:00 btree_float8.c,v
-r--r--r--   1 tgl  dev  33915 Aug 15 03:00 btree_gist.c,v
-r--r--r--   1 258  dev   4325 Apr 20 15:16 btree_gist.h,v
-r--r--r--   1 258  dev  58744 Apr 20 16:19 btree_gist.sql.in,v
-r--r--r--   1 tgl  dev  14717 Aug 15 03:00 btree_inet.c,v
-r--r--r--   1 tgl  dev  10234 Aug 15 03:00 btree_int2.c,v
-r--r--r--   1 tgl  dev  10221 Aug 15 03:00 btree_int4.c,v
-r--r--r--   1 tgl  dev  10235 Aug 15 03:00 btree_int8.c,v

I suppose this is due to Magnus' cleanup of unused committers accounts.
That was a good thing, but for security's sake these files ought to be
chown'd to some existing committer's account.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-14 Thread Andrew Dunstan




Bruce Momjian wrote:


The people who actually use tsearch2 seem to all have the same opinion ...
so I think we can't go too far in the bullet-proofing direction.



Yeah.

But I would like a design that is bulletproof in dump/reload scenarios,
and I think it's fair to question that aspect of the tsearch2 design
because we've seen many reports of people having trouble updating
databases that use tsearch2.



Yea, look at the trouble we are having trying to underestand it all.


True. But I wasn't too concerned about the forecast difficulties with 
data only dumps. Those fail in plenty of circumstances. It is important 
that there is *some* reliable dump/restore/upgrade path, though.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] default_text_search_config and expression indexes

2007-08-14 Thread Mike Rylander

On 8/14/07, Gregory Stark [EMAIL PROTECTED] wrote:
 Mike Rylander [EMAIL PROTECTED] writes:

  My application (http://open-ils.org, which run 80% of the public
  libraries in Georgia, USA, http://gapines.org and
  http://georgialibraries.org/lib/pines.html) requires that I be able to
  search a corpus of bibliographic records in a mix of languages, and
  potentially with mixed stop-word rules, with one query.  I cannot know
  ahead of time what languages will be used in the corpus and I cannot
  restrict any one query to one language.  To accomplish this, the
  record itself will be inspected inside an INSERT/UPDATE trigger to
  determine the language and type, and use the correct configuration for
  creating the tsvector.  This will obviously result in a mixed
  tsvector column, but that's exactly what I need.  I can filter on
  record language if the user happens to specify a query language (and
  thus configuration), or simply rank the assumed (IP based, perhaps, or
  browser preference based) preferred language higher, or one of a
  hundred other things.  But I won't be able to do any of that if
  tsvectors are required to have one and only one configuration per
  column.
 
  Anyway, I felt I needed to provide some outside perspective to this,
  as a user, since it seems that the external viewpoint (my particular
  viewpoint, at least) was missing from the discussion.

 This is *extremely* useful. I think it's precisely what we've been missing so
 far. At least, what I've been missing.

 So the question is what exactly happens in this case? If I search for the
 does that mean it will ignore matches in English where that's a stop-word but
 find me books on tea in French? Is that what I should expect to happen? What
 if I search for earl and the? Does that find me French books on Early Grey
 Tea but English books on all earls?

Oh dear ... you went and got me started...

Each field type has a different set of configurations for potenial
use.  Title and subject fields, for instance, do not (generally) use
stop-word lists, so a subject search for the will match any record
with the lexem the in a subject field.  Title fields are a little
more complicated, because there is information in the bibliographic
record about how and when to skip leading articles, but generally
those are indexed as well for FTS purposes.  English LCSH subjects
generally don't have stop(like) words in them, so you'll probably just
get French tea records.  Title results would be a mix of earls and
French tea records (probably) and the correlation between the user's
preferred language (either chosen query lang or UI lang, whichever is
available) will help adjust the ranking, pushing what are likely to be
the more appropriate records to the top.

Note, however, that much of this multi-tsearch2-configuration setup is
not used in the implementation at http://gapines.org/ because, well,
there's not much need (read: demand from librarians) for that dataset
to support these more complex tricks.  It's basically all en-US and
exclude stop-words.  Other implementations are making more use of what
I describe above, including a (government mandated) French-English
bilingual institution who shall remain nameless for the time being...


 What happens if I use the same operator directly on the text column? Or
 perhaps it's not even possible to specify stop-words when operating on a text
 column? Should it be?

You mean with an expression index on a text column?  I haven't
considered using them for FTS.  It just feels easier and more flexible
to me to use an external tsvector column because of the fairly heavy
processing that goes into creating each tsvector value.  I may
re-evaluate that position now that CREATE INDEX CONCURRENTLY exists,
but I'm not developing with 8.2+ only features yet.  Once 8.3 is out
that may change.

Also, unless I misunderstand, you have to wrap the text column in the
function used to build the index.  For my purposes, that makes
building a generic FTS driver for my app (which, admittedly, only has
a Postgresql driver ;) ) more difficult than having a hidden extra
column.  Again, that could change if the benefits of CREATE INDEX
CONCURRENTLY end up outweighing simpler FTS driver code.

--miker

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Marc G. Fournier

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1



- --On Tuesday, August 14, 2007 23:26:03 -0400 Tom Lane [EMAIL PROTECTED]
wrote:

 I wrote:
 Great --- launching cvs rtag command now.

 Done, and I got a plausible-looking mix of messages like
 cvs rtag: Not removing branch tag `REL7_4_STABLE' from
 `/cvsroot/pgsql/src/tutorial/funcs_new.c,v'. and a fresh checkout of
 REL7_4_STABLE now matches what I had locally. So I think we walked away from
 that one.

 TODO items:

 * restrict, or at least log, cvs tag/rtag commands.  Maybe report them
 to pgsql-committers.

working on it ... have code for it, I've just spent so much time with CVSROOT = 
/cvsroot, I have to relearn remote CVS :)

 ls -la
 total 460
 drwxrwxr-x   6 scrappy  dev   1024 Aug 15 03:00 .
 drwxrwxr-x  90 scrappy  dev   2048 Aug 15 03:00 ..
 drwxrwxr-x   2 scrappy  dev512 Apr 20 15:16 Attic
 -r--r--r--   1 tgl  dev   9555 Jun 26 22:05 Makefile,v
 -r--r--r--   1 258  dev   5870 Apr 20 16:19 README.btree_gist,v
 -r--r--r--   1 tgl  dev  12033 Aug 15 03:00 btree_bit.c,v
 -r--r--r--   1 tgl  dev   9902 Aug 15 03:00 btree_bytea.c,v
 -r--r--r--   1 tgl  dev  10227 Aug 15 03:00 btree_cash.c,v
 -r--r--r--   1 tgl  dev  11414 Aug 15 03:00 btree_date.c,v
 -r--r--r--   1 tgl  dev  10211 Aug 15 03:00 btree_float4.c,v
 -r--r--r--   1 tgl  dev  10074 Aug 15 03:00 btree_float8.c,v
 -r--r--r--   1 tgl  dev  33915 Aug 15 03:00 btree_gist.c,v
 -r--r--r--   1 258  dev   4325 Apr 20 15:16 btree_gist.h,v
 -r--r--r--   1 258  dev  58744 Apr 20 16:19 btree_gist.sql.in,v
 -r--r--r--   1 tgl  dev  14717 Aug 15 03:00 btree_inet.c,v
 -r--r--r--   1 tgl  dev  10234 Aug 15 03:00 btree_int2.c,v
 -r--r--r--   1 tgl  dev  10221 Aug 15 03:00 btree_int4.c,v
 -r--r--r--   1 tgl  dev  10235 Aug 15 03:00 btree_int8.c,v

 I suppose this is due to Magnus' cleanup of unused committers accounts.
 That was a good thing, but for security's sake these files ought to be
 chown'd to some existing committer's account.

I can do a quick chown -R scrappy on the whole repository ... ok?

- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFGwnf34QvfyHIvDvMRAnGTAJ43gELVZ5Q5SpV/jppR9rODB/ILaACeKCEN
DE16fnY/PmKlrvdHFaJeIKI=
=VCFu
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] CVS corruption/mistagging?

2007-08-14 Thread Marc G. Fournier

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1



- --On Tuesday, August 14, 2007 23:26:03 -0400 Tom Lane [EMAIL PROTECTED]
wrote:

 * restrict, or at least log, cvs tag/rtag commands.  Maybe report them
 to pgsql-committers.

It should be done ... if you try and create a tag, it should generate an error 
message ... not sure of the best way to test that it actually works, since, if 
it doesn't work, of course, you'll add on another tag, but do you want to try 
and create a 'fake tag' and then delete it afterwards?

- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFGwnoO4QvfyHIvDvMRArHSAJ4qYuexUGn6Ff3/PkHU/DjJBlCQnwCgxyy9
Qc852wMVRRMYdwoD3+p7X3Y=
=FCD9
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

1 2 >

1 - 100 of 110 matches

Mail list logo