date:20071114

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Tom Lane

Josh Berkus <[EMAIL PROTECTED]> writes:
> Nah, if it's only Niagara, it's not worth bothering.

It's not only that aspect of it --- it's that I am 100% convinced that
Magne has misidentified the source of whatever FPU contention he's
seeing.  The floating-point code in s_lock() is executed only just
after having returned from a sleep that is at least one millisecond
and often many times that.  If Niagara cannot handle a few kiloflops
then you need to find some other company to work for ;-)

I am interested to find out what the true cause of the reported FPU
contention is, but I'll bet our next lunch that s_lock.c ain't it.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Simplifying Text Search

2007-11-14 Thread Tom Lane

Magnus Hagander <[EMAIL PROTECTED]> writes:
> On Wed, Nov 14, 2007 at 07:46:58AM +, Gregory Stark wrote:
>> Have you yet given any advantages of contains over @@ ?

> Familiarity for users of SQL Server that are migrating? ;-)
> (http://msdn2.microsoft.com/en-us/library/ms187787.aspx)

That argument would only hold water if we were going to adopt *all* of
their syntax for the feature ... and the other choices they've made seem
pretty ugly to me.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Josh Berkus

Greg,

> That says precisely nothing about the matter at hand. Someone should
> simply change it and benchmark it in pgsql. I doubt you'll see a
> difference there on regular AMD/Intel ... and if it makes the sun
> hyperthreaded cpu happier...

Nah, if it's only Niagara, it's not worth bothering.  I was under the 
impression that most x86 chips had similar issues around having less FP than 
IP, but I could be wrong.  It's also worth thinking about the new Intel 
multi-core archtectures; do they starve FPs as well?

On a busy oltp system, spinlock waits get called 100's of times per second, so 
like procarraylock this it's frequent enought to call for microoptimization.

I think maybe we should try making the change and testing it on Niagara and on 
some standard x86 platforms, and then on the new x86 architectures, to see if 
it makes any difference in CPU utilization.

FYI, if nobody had guessed this is coming out of study Magne is doing on 
improving PostgreSQL SMP scalability.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Simplifying Text Search

2007-11-14 Thread Tom Lane

Peter Eisentraut <[EMAIL PROTECTED]> writes:
> [ replace LIKE with this: ]

> CREATE FUNCTION textlike_ts(text, text) RETURNS boolean
> RETURNS NULL ON NULL INPUT IMMUTABLE
> LANGUAGE SQL
> AS $$ SELECT $1 @@ likepattern_to_tsquery($2) AND $1 #~~# $2; $$;

Cute trick, but as-is this will not make anything go any faster, because
it doesn't expose any opportunities for indexing the @@ operation.
I think what you'd really need is something like

$$ SELECT to_tsvector('english', $1) @@ likepattern_to_tsquery($2) AND $1 #~~# 
$2; $$;

which will win if there is an expression index on to_tsvector('english',
).  (You can substitute your preferred configuration of
course, but you don't get to rely on default_text_search_config, because
that would render the expression non-immutable and thus non-indexable.)

This points up the same old notational problem that there is no good
place in the operator notation to mention which text search
configuration you want to use.  Simon's suggestion of a three-parameter
function at least addresses that issue.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Joshua D. Drake


Gregory Maxwell wrote:

On Nov 14, 2007 10:12 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:

http://www.intel.com/performance/server/xeon/intspd.htm
http://www.intel.com/performance/server/xeon/fpspeed.htm


That says precisely nothing about the matter at hand. Someone should
simply change it and benchmark it in pgsql. I doubt you'll see a
difference there on regular AMD/Intel ... and if it makes the sun
hyperthreaded cpu happier...


Are you offering?

Joshua D. Drake



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Gregory Maxwell

On Nov 14, 2007 10:12 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> http://www.intel.com/performance/server/xeon/intspd.htm
> http://www.intel.com/performance/server/xeon/fpspeed.htm

That says precisely nothing about the matter at hand. Someone should
simply change it and benchmark it in pgsql. I doubt you'll see a
difference there on regular AMD/Intel ... and if it makes the sun
hyperthreaded cpu happier...

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Steve Atkins



On Nov 14, 2007, at 6:57 PM, Josh Berkus wrote:


Tom,

I've got one upstairs (HPPA), and I believe that it's actually a  
pretty

common situation in scientifically-oriented workstations from a few
years back.


Last I checked, scientific workstations aren't exactly a common  
platform for

PostgreSQL servers.

The question is, for our most common platforms (like AMD and Intel)  
is the FPU
notably slower/more contended than integer division?  I'd the  
impression that
it was, but my knowledge of chip architectures is liable to be out  
of date.


Can we have a hardware geek speak up?


Somewhat. The last version of K7 I looked at had three integer  
execution units versus one floating point unit.


They're also scheduled fairly independently, meaning that casts from  
double to integer or back again will have some minor negative effects  
on the pipeline or the scheduler more than the use of floating point  
itself.


In the grand scheme of things, though, I don't believe it's a big  
deal for typical code on most modern desktop CPUs, certainly not  
compared to memory starvation, use of less than optimal compilers and  
all the other reasons the pipeline might stall. I might care in the  
innermost of inner loops, but possibly not even then unless a  
profiler told me differently.


Cheers,
  Steve


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Joshua D. Drake


Josh Berkus wrote:

Tom,


I've got one upstairs (HPPA), and I believe that it's actually a pretty
common situation in scientifically-oriented workstations from a few
years back.


Last I checked, scientific workstations aren't exactly a common platform for 
PostgreSQL servers.


The question is, for our most common platforms (like AMD and Intel) is the FPU 
notably slower/more contended than integer division?  I'd the impression that 
it was, but my knowledge of chip architectures is liable to be out of date.


Can we have a hardware geek speak up?



http://www.intel.com/performance/server/xeon/intspd.htm
http://www.intel.com/performance/server/xeon/fpspeed.htm


Sincerely,

Joshua D. Drake


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Josh Berkus

Tom,

> I've got one upstairs (HPPA), and I believe that it's actually a pretty
> common situation in scientifically-oriented workstations from a few
> years back.

Last I checked, scientific workstations aren't exactly a common platform for 
PostgreSQL servers.

The question is, for our most common platforms (like AMD and Intel) is the FPU 
notably slower/more contended than integer division?  I'd the impression that 
it was, but my knowledge of chip architectures is liable to be out of date.

Can we have a hardware geek speak up?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] psql -f doesn't complain about directories

2007-11-14 Thread Alvaro Herrera

David Fetter wrote:
> On Wed, Nov 14, 2007 at 05:15:20PM -0300, Alvaro Herrera wrote:
> > Peter Eisentraut wrote:
> > > Letting psql execute a script file that is really a directory
> > > doesn't complain at all:
> > > 
> > > $ psql -f /tmp
> > > 
> > > Should we do some kind of stat() before opening the file and abort
> > > if it's a directory?
> > 
> > Actually anything other than a plain file, right?  (Do we really
> > want to be able to psql -f a_pipe?)
> 
> Yes, I have seen people use just this technique.

Interesting.  Why not just use a standard shell pipe from the command
writing into the named pipe, instead of piping through it?

-- 
Alvaro Herrera http://www.flickr.com/photos/alvherre/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] psql -f doesn't complain about directories

2007-11-14 Thread David Fetter

On Wed, Nov 14, 2007 at 05:15:20PM -0300, Alvaro Herrera wrote:
> Peter Eisentraut wrote:
> > Letting psql execute a script file that is really a directory
> > doesn't complain at all:
> > 
> > $ psql -f /tmp
> > 
> > Should we do some kind of stat() before opening the file and abort
> > if it's a directory?
> 
> Actually anything other than a plain file, right?  (Do we really
> want to be able to psql -f a_pipe?)

Yes, I have seen people use just this technique.

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Simplifying Text Search

2007-11-14 Thread Trevor Talbot

On 11/14/07, Peter Eisentraut <[EMAIL PROTECTED]> wrote:
> I wrote:
> > What we'd need is a way to convert a LIKE pattern into a tsquery
> > ('%foo%bar%' => 'foo & bar').  Then you might even be able to sneak
> > index-optimized text search into existing applications.  Might be worth a
> > try.
>
> Here is how this could work:
>
> CREATE FUNCTION likepattern_to_tsquery(text) RETURNS tsquery

[...]

But that coversion itself is fundamentally flawed, is the problem.

'foo bar'
'fooandbar'
'barfoo and foobar'

'%foo%bar%' matches all 3.
'foo & bar' matches only the first.

If the application currently using LIKE actually wants a word-based
search, it should probably just convert to using tsearch wholesale,
since it doesn't work as intended now.  If it actually wants wildcard
matching behavior, it can't use tsearch at all.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] LDC - Load Distributed Checkpoints with PG8.3b2 on Solaris

2007-11-14 Thread Jignesh K. Shah


I dont understand vacuum a lot.. I admit I am stupid :-)

When you say scanned... do you mean reads or do you mean writes?

Since its really writes that I am having trouble.. the auto vacuum 
message tells me 11 pages were removed and so many tuples were 
removed..  I am guessing its writes.


I can try vacuuming that table before it starts the run to see it can 
avoid that..


-Jignesh




Tom Lane wrote:

"Jignesh K. Shah" <[EMAIL PROTECTED]> writes:
  
So from the PostgreSQL view things are doing fine based on outputs: I 
need to figure out the Solaris view on it now.



  

Could it be related to autovacuum happening also?



Maybe ... have you tried fiddling with the vacuum_cost_delay options?

Looking at the autovacuum log output, 

  
2007-11-13 09:21:19.830 PST 9458  LOG:  automatic vacuum of table 
"specdb.public.txn_log_table": index scans: 1

pages: 11 removed, 105 remain
tuples: 3147 removed, 40 remain
system usage: CPU 0.11s/0.09u sec elapsed 6.02 sec



it seems like a serious omission that this gives you no hint how many
pages were scanned.

regards, tom lane
  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

2007-11-14 Thread Tom Lane

Oleg Bartunov <[EMAIL PROTECTED]> writes:
> On Wed, 14 Nov 2007, Tom Lane wrote:
>> Huh?  This is just an option for the "simple" dictionary, it's got
>> nothing to do with thesaurus AFAICS.

> I can assign simple dictionary as a normalization dictionary for thesaurus

Sure.  So what?  You wouldn't use this option in that case.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] psql -f doesn't complain about directories

2007-11-14 Thread Martijn van Oosterhout

On Wed, Nov 14, 2007 at 10:25:23PM +0100, Peter Eisentraut wrote:
> Martijn van Oosterhout wrote:
> > To be honest I think that psql shouldn't be ignoring the
> > EISDIR error the kernel is returning.
> 
> We use fopen(), which doesn't appear to pass that on.

It's not the fopen that fails, it's the fgets that returns NULL. We
don't subsequently check if that's due to an I/O error or EISDIR or if
it's an end-of-file.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution 
> inevitable.
>  -- John F Kennedy


signature.asc
Description: Digital signature

Re: [HACKERS] psql -f doesn't complain about directories

2007-11-14 Thread Peter Eisentraut

Martijn van Oosterhout wrote:
> To be honest I think that psql shouldn't be ignoring the
> EISDIR error the kernel is returning.

We use fopen(), which doesn't appear to pass that on.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] LDC - Load Distributed Checkpoints with PG8.3b2 on Solaris

2007-11-14 Thread Tom Lane

"Jignesh K. Shah" <[EMAIL PROTECTED]> writes:
> So from the PostgreSQL view things are doing fine based on outputs: I 
> need to figure out the Solaris view on it now.

> Could it be related to autovacuum happening also?

Maybe ... have you tried fiddling with the vacuum_cost_delay options?

Looking at the autovacuum log output, 

> 2007-11-13 09:21:19.830 PST 9458  LOG:  automatic vacuum of table 
> "specdb.public.txn_log_table": index scans: 1
> pages: 11 removed, 105 remain
> tuples: 3147 removed, 40 remain
> system usage: CPU 0.11s/0.09u sec elapsed 6.02 sec

it seems like a serious omission that this gives you no hint how many
pages were scanned.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Hash index todo list item

2007-11-14 Thread Kenneth Marshall

On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote:
> On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote:
> > This is a great starting point. I would appreciate it if you have the
> > time and could make it apply cleanly to HEAD.
> 
> Just to give you an update on this, I'll try to find the time to get it
> done soon, but my day job is keeping me really busy these days, so I'm
> not sure when I'll be able to get to it...
> 
> -Neil
> 
Neil,

I have been working on putting an updated version of your
patch into the current source. My first try was to try and
put your patch in directly, but it differed so much from the
current build that it was not obvious how to address things
like the current hash_index sorted build patch, which I need
to be able to test with indexes of any size at all. My current
try is to replace the _hash_formitem() calls with a function
called _hash_form_tuple() that actually returns an IndexTuple
and not an HashItem. This will allow it to be used quite
naturally with the current sorted build patch. Here is what
it looks like now:

/*
 * _hash_form_tuple -- construct index tuple using hash(value) not value
 */
IndexTuple
_hash_form_tuple(IndexTuple itup, Relation rel)
{
IndexTuple  result;
Sizesize;
uint32  hashkey;
Datum   datum;
boolisnull;

/* disallow nulls in hash keys */
if (IndexTupleHasNulls(itup))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("hash indexes cannot contain null keys")
));

if (rel->rd_rel->relnatts != 1)
elog(ERROR, "hash indexes support only one index key");

/* hash the tuple; we only store the hash value in the index */
datum = index_getattr(itup, 1, RelationGetDescr(rel), &isnull);
Assert(!isnull);
hashkey = _hash_datum2hashkey(rel, datum);

size = IndexTupleSize(itup);
result = (IndexTuple) palloc(size);
memcpy(result, itup, size);
return result;
}

I am not currently doing anything other than returning the current
IndexTuple that was created with index_form_tuple(). Am I daft, or
can I just memcpy() the 6 bytes of TID, add the 2 bytes of t_info
(everything 0 and the size set to 6 + 2 + sizeof(hash) = 10), and
the 4 bytes of hash. This will allow me to handle 8-byte hashes
in the future. If you see a problem with this approach, please
let me know. I would appreciate any feedback you can give.

Regards,
Ken
> 
> 

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] LDC - Load Distributed Checkpoints with PG8.3b2 on Solaris

2007-11-14 Thread Jignesh K. Shah



I was waiting to digest  what I saw before sending it to the group

I am running EAStress workload

I am using odata_sync which should sync as soon as it is written

with checkpoint_completion_target=0.9 and checkpoint_time=5m it seems to 
be doing the right thing from the logfile output



2007-11-13 09:20:49.070 PST 9180  LOG:  checkpoint starting: time
2007-11-13 09:21:13.808 PST 9458  LOG:  automatic analyze of table 
"specdb.public.o_orderline" system usage: CPU 0.03s/0.50u sec elapsed 
7.79 sec
2007-11-13 09:21:19.830 PST 9458  LOG:  automatic vacuum of table 
"specdb.public.txn_log_table": index scans: 1

   pages: 11 removed, 105 remain
   tuples: 3147 removed, 40 remain
   system usage: CPU 0.11s/0.09u sec elapsed 6.02 sec
2007-11-13 09:22:12.112 PST 9462  LOG:  automatic vacuum of table 
"specdb.public.txn_log_table": index scans: 1

   pages: 28 removed, 77 remain
   tuples: 1990 removed, 95 remain
   system usage: CPU 0.11s/0.09u sec elapsed 5.98 sec
2007-11-13 09:23:12.121 PST 9466  LOG:  automatic vacuum of table 
"specdb.public.txn_log_table": index scans: 1

   pages: 0 removed, 77 remain
   tuples: 3178 removed, 128 remain
   system usage: CPU 0.11s/0.04u sec elapsed 5.87 sec
2007-11-13 09:24:12.220 PST 9470  LOG:  automatic vacuum of table 
"specdb.public.txn_log_table": index scans: 1

   pages: 0 removed, 77 remain
   tuples: 3394 removed, 57 remain
   system usage: CPU 0.11s/0.04u sec elapsed 5.85 sec
2007-11-13 09:25:12.400 PST 9474  LOG:  automatic vacuum of table 
"specdb.public.txn_log_table": index scans: 1

   pages: 0 removed, 77 remain
   tuples: 3137 removed, 1 remain
   system usage: CPU 0.11s/0.04u sec elapsed 5.93 sec
2007-11-13 09:25:18.723 PST 9180  LOG:  checkpoint complete: wrote 33362 
buffers (2.2%); 0 transaction log file(s) added, 0 removed, 0 recycled; 
write=269.642 s, sync=0.003 s, total=269.653 s

2007-11-13 09:25:49.000 PST 9180  LOG:  checkpoint starting: time


However actual iostat output still shows non-uniform distribution  but I 
havent put the exact time stamp on the iostat outputs to correlate that 
with the logfile entries.. Maybe I should do that.


So from the PostgreSQL view things are doing fine based on outputs: I 
need to figure out the Solaris view on it now.


Could it be related to autovacuum happening also?


Regards,
Jignesh



Tom Lane wrote:

"Jignesh K. Shah" <[EMAIL PROTECTED]> writes:
  

I will turn on checkpoint_logging to get more idea as Heikki suggested



Did you find out anything?

Did this happen on every checkpoint, or only some of them?  The bug
Itagaki-san pointed out today in IsCheckpointOnSchedule might account
for some checkpoints being done at full speed, but not all ...

regards, tom lane
  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] psql -f doesn't complain about directories

2007-11-14 Thread Andrew Dunstan




Alvaro Herrera wrote:

Peter Eisentraut wrote:
  
Letting psql execute a script file that is really a directory doesn't complain 
at all:


$ psql -f /tmp

Should we do some kind of stat() before opening the file and abort if it's a 
directory?



Actually anything other than a plain file, right?  (Do we really want to
be able to psql -f a_pipe?)
  


I don't see why not.

cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Trevor Talbot

On 11/14/07, Tom Lane <[EMAIL PROTECTED]> wrote:

> The other problem with using modulo is that it makes the result depend
> mostly on the low-order bits of the random() result, rather than mostly
> on the high-order bits; with lower-grade implementations of random(),
> the lower bits are materially less random than the higher.  Now
> admittedly high-grade randomness is probably not too important for this
> specific context, but I dislike putting in poor coding practices that
> someone might see and copy without thinking...

If there's a dependency on a particular quality of random()
implementation, why not just include one?  Mersenne Twister is easy,
while not being cryptographic strength.
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Simplifying Text Search

2007-11-14 Thread Oleg Bartunov


On Wed, 14 Nov 2007, Tom Lane wrote:


Bruce Momjian <[EMAIL PROTECTED]> writes:

Yes, this the same problem we had months ago trying to improve the
syntax, that there was no easy syntax that covered all common use cases.


The thing that we keep coming up against is that we'd like queries to
be able to depend on default_text_search_config.  Quite aside from
anyone's personal opinions about whether operator or function syntax
is preferable, it would clearly be desirable to be able to say

... WHERE textcolumn @@ 'pattern-constant'

and get a fast full-text search that's governed by the current setting
of default_text_search_config (of course, it can only be fast if there
is an index using that same configuration, but that's a setup detail).

It strikes me that now that we have an invalidatable plan cache,
we could fix this by having the planner rewrite "textcolumn @@ something"
into "to_tsvector(regconfig_constant, textcolumn) @@ something",
so long as it marked the resulting plan as needing to be invalidated
by any change in the value of default_text_search_config.  Once you
have that form, it can be matched against an index, and away you go.

Too late for 8.3, but seems doable enough for 8.4.


cute, pity it's didn't come up in our early discussion



regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

[HACKERS] psql -f doesn't complain about directories

2007-11-14 Thread Peter Eisentraut

Letting psql execute a script file that is really a directory doesn't complain 
at all:

$ psql -f /tmp

Should we do some kind of stat() before opening the file and abort if it's a 
directory?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Simplifying Text Search

2007-11-14 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
> Yes, this the same problem we had months ago trying to improve the
> syntax, that there was no easy syntax that covered all common use cases.

The thing that we keep coming up against is that we'd like queries to
be able to depend on default_text_search_config.  Quite aside from
anyone's personal opinions about whether operator or function syntax
is preferable, it would clearly be desirable to be able to say

... WHERE textcolumn @@ 'pattern-constant'

and get a fast full-text search that's governed by the current setting
of default_text_search_config (of course, it can only be fast if there
is an index using that same configuration, but that's a setup detail).

It strikes me that now that we have an invalidatable plan cache,
we could fix this by having the planner rewrite "textcolumn @@ something"
into "to_tsvector(regconfig_constant, textcolumn) @@ something",
so long as it marked the resulting plan as needing to be invalidated
by any change in the value of default_text_search_config.  Once you
have that form, it can be matched against an index, and away you go.

Too late for 8.3, but seems doable enough for 8.4.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] Simplifying Text Search

2007-11-14 Thread Bruce Momjian

Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Better idea:
> 
> > create function
> > contains(sourceText text, searchText text, config text) returns boolean
> > as $$
> > to_tsvector(config, sourceText) @@ to_tsquery(config, searchText);
> > $$ language sql;
> 
> I think you have confused "simple query syntax" with "easy to use".
> 
> The above does make the query notation look nicer, but the query will
> not actually go fast unless the DBA has made an expression index on
> to_tsvector('desired config', textcolumn).  Thus, in terms of getting
> people "over the hump" of doing their first decently-performing text
> search, you haven't reduced the number of concepts to be understood
> at all; rather, you've added one more.  People will still have to
> understand the complexity that contains() is supposedly hiding.
> Worse, they'll have to make the connection between two completely
> different-looking syntaxes every time they look at their schemas.

Yes, this the same problem we had months ago trying to improve the
syntax, that there was no easy syntax that covered all common use cases.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Simplifying Text Search

2007-11-14 Thread Tom Lane

Simon Riggs <[EMAIL PROTECTED]> writes:
> Better idea:

> create function
> contains(sourceText text, searchText text, config text) returns boolean
> as $$
> to_tsvector(config, sourceText) @@ to_tsquery(config, searchText);
> $$ language sql;

I think you have confused "simple query syntax" with "easy to use".

The above does make the query notation look nicer, but the query will
not actually go fast unless the DBA has made an expression index on
to_tsvector('desired config', textcolumn).  Thus, in terms of getting
people "over the hump" of doing their first decently-performing text
search, you haven't reduced the number of concepts to be understood
at all; rather, you've added one more.  People will still have to
understand the complexity that contains() is supposedly hiding.
Worse, they'll have to make the connection between two completely
different-looking syntaxes every time they look at their schemas.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] psql -f doesn't complain about directories

2007-11-14 Thread Martijn van Oosterhout

On Wed, Nov 14, 2007 at 09:33:17PM +0100, Zdenek Kotala wrote:
> >Sure, why not. To be honest I think that psql shouldn't be ignoring the
> >EISDIR error the kernel is returning.
> 
> But it works when you open directory in read-only mode. See posix 
> definition:
> 
> [EISDIR]
> The named file is a directory and oflag includes O_WRONLY or O_RDWR.

$ strace psql -f /tmp

open("/tmp", O_RDONLY|O_LARGEFILE)  = 4

read(4, 0xb7f1b000, 4096)   = -1 EISDIR (Is a directory)

Which is subsequently ignored. I'm hoping it doesn't ignore other
errors, like EIO or EPIPE, 

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution 
> inevitable.
>  -- John F Kennedy


signature.asc
Description: Digital signature

Re: [HACKERS] Simplifying Text Search

2007-11-14 Thread Peter Eisentraut

I wrote:
> What we'd need is a way to convert a LIKE pattern into a tsquery
> ('%foo%bar%' => 'foo & bar').  Then you might even be able to sneak
> index-optimized text search into existing applications.  Might be worth a
> try.

Here is how this could work:

CREATE FUNCTION likepattern_to_tsquery(text) RETURNS tsquery
RETURNS NULL ON NULL INPUT IMMUTABLE
LANGUAGE SQL
AS $$ SELECT trim(replace($1, '%', ' & '), '& ')::tsquery; $$;


UPDATE pg_operator SET oprname = '#~~#' WHERE oprcode = 'textlike'::regproc;


CREATE FUNCTION textlike_ts(text, text) RETURNS boolean
RETURNS NULL ON NULL INPUT IMMUTABLE
LANGUAGE SQL
AS $$ SELECT $1 @@ likepattern_to_tsquery($2) AND $1 #~~# $2; $$;


CREATE OPERATOR ~~ (
PROCEDURE = textlike_ts,
LEFTARG = text,
RIGHTARG = text
);

Maybe something like this could be useful for people who cannot readily change 
their application code.  (Of course it is not meant to solve the issue of how 
to make the text-search functionality itself easier to access.)

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] psql -f doesn't complain about directories

2007-11-14 Thread Zdenek Kotala


Alvaro Herrera wrote:

Peter Eisentraut wrote:
Letting psql execute a script file that is really a directory doesn't complain 
at all:


$ psql -f /tmp

Should we do some kind of stat() before opening the file and abort if it's a 
directory?


Actually anything other than a plain file, right?  (Do we really want to
be able to psql -f a_pipe?)



What's about symlink to regular file/pipe?


Zdenek

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] psql -f doesn't complain about directories

2007-11-14 Thread Zdenek Kotala


Martijn van Oosterhout wrote:

On Wed, Nov 14, 2007 at 05:15:20PM -0300, Alvaro Herrera wrote:
Should we do some kind of stat() before opening the file and abort if it's a 
directory?

Actually anything other than a plain file, right?  (Do we really want to
be able to psql -f a_pipe?)


Sure, why not. To be honest I think that psql shouldn't be ignoring the
EISDIR error the kernel is returning.


But it works when you open directory in read-only mode. See posix 
definition:


[EISDIR]
The named file is a directory and oflag includes O_WRONLY or O_RDWR.


Zdenek

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] psql -f doesn't complain about directories

2007-11-14 Thread Martijn van Oosterhout

On Wed, Nov 14, 2007 at 05:15:20PM -0300, Alvaro Herrera wrote:
> > Should we do some kind of stat() before opening the file and abort if it's 
> > a 
> > directory?
> 
> Actually anything other than a plain file, right?  (Do we really want to
> be able to psql -f a_pipe?)

Sure, why not. To be honest I think that psql shouldn't be ignoring the
EISDIR error the kernel is returning.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution 
> inevitable.
>  -- John F Kennedy


signature.asc
Description: Digital signature

Re: [HACKERS] psql -f doesn't complain about directories

2007-11-14 Thread Alvaro Herrera

Peter Eisentraut wrote:
> Letting psql execute a script file that is really a directory doesn't 
> complain 
> at all:
> 
> $ psql -f /tmp
> 
> Should we do some kind of stat() before opening the file and abort if it's a 
> directory?

Actually anything other than a plain file, right?  (Do we really want to
be able to psql -f a_pipe?)

-- 
Alvaro Herrera  Developer, http://www.PostgreSQL.org/
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

[HACKERS] Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

2007-11-14 Thread Oleg Bartunov


On Wed, 14 Nov 2007, Tom Lane wrote:


Oleg Bartunov <[EMAIL PROTECTED]> writes:

On Wed, 14 Nov 2007, Tom Lane wrote:

Huh?  This is just an option for the "simple" dictionary, it's got
nothing to do with thesaurus AFAICS.



I can assign simple dictionary as a normalization dictionary for thesaurus


Sure.  So what?  You wouldn't use this option in that case.


Right. That should be documented to avoid possible confusion.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 6: explain analyze is your friend

[HACKERS] Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

2007-11-14 Thread Oleg Bartunov


On Wed, 14 Nov 2007, Tom Lane wrote:


Oleg Bartunov <[EMAIL PROTECTED]> writes:

On Wed, 14 Nov 2007, Tom Lane wrote:

One thought that came to mind is that the option name should be just
"Accept" not "AcceptAll".  To me "All" implies that it would accept
*everything* ... including stopwords.



wait, I remind the problem with filters. How it will works with thesaurus ?


Huh?  This is just an option for the "simple" dictionary, it's got
nothing to do with thesaurus AFAICS.


I can assign simple dictionary as a normalization dictionary for thesaurus

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

[HACKERS] plpgsql debugger status - question for packagers plz

2007-11-14 Thread Richard Huxton


As I understand it*, the situation here is:

1. Client support in pgadmin
2. Support in the Windows installer (optional?)
3. Not in /contrib

Now, I realise "it's too late for 8.3" is a bloody good argument, but 
I'm curious as to whether packagers on other platforms are planning. 
Traditionally I think the extras stuff for PG packages has been /contrib 
but obviously that's not a strict rule.



* PS - it's a useful feature - good work to all concerned*
** PPS - 8.3 is looking good too. This short development cycle is 
working wonders ;-)


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Greg Smith


On Wed, 14 Nov 2007, Mark Mielke wrote:


The other problem with using modulo is that it makes the result depend
mostly on the low-order bits of the random() result, rather than mostly
on the high-order bits; with lower-grade implementations of random(),
the lower bits are materially less random than the higher.


If this was a serious problem, there is the >> operator. I see it as a poor 
coding practice to make assumptions about which bits are most "random" in a 
call to random().


There are many types of pseudo-random number generators where the 
low-order bits are not so random, and the assumption Tom has described is 
pretty likely to be true.  See http://www.fourmilab.ch/random/ as one 
comment about the badness of the standard UNIX random generator for 
example.


There is an interesting discussion of this issue along with code showing a 
way to improve things while only using integer math (which in some cases 
uses >> as you suggest) as part of the Java standard library:


http://java.sun.com/j2se/1.4.2/docs/api/java/util/Random.html#nextInt(int)

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

2007-11-14 Thread Tom Lane

Oleg Bartunov <[EMAIL PROTECTED]> writes:
> On Wed, 14 Nov 2007, Tom Lane wrote:
>> One thought that came to mind is that the option name should be just
>> "Accept" not "AcceptAll".  To me "All" implies that it would accept
>> *everything* ... including stopwords.

> wait, I remind the problem with filters. How it will works with thesaurus ?

Huh?  This is just an option for the "simple" dictionary, it's got
nothing to do with thesaurus AFAICS.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

[HACKERS] Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

2007-11-14 Thread Oleg Bartunov


On Wed, 14 Nov 2007, Tom Lane wrote:


Oleg Bartunov <[EMAIL PROTECTED]> writes:

Let's consider one example - removing accents.
In the past I always recommend people to use regex functions before
to_tsvector conversion to remove accents, but recently I was noticed that
such trick doesn't work with headline(). So, the only way is to have
special dictionary dict_remove_accent before, which  works as a filter.



I don't remember why do we left this for future releases, though.


That would require a system-to-dictionary API change (to be able to
modify the token under inspection), no?  So it's certainly something


It requires one reserved option for dictionaries and  ability to get dictionary 
option.  Unless somebody have dictionary with the same option, this change

looks harmless.


I'd say is too late for 8.3.


yes, probably we get better idea.



One thought that came to mind is that the option name should be just
"Accept" not "AcceptAll".  To me "All" implies that it would accept
*everything* ... including stopwords.


wait, I remind the problem with filters. How it will works with thesaurus ?

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

2007-11-14 Thread Tom Lane

Oleg Bartunov <[EMAIL PROTECTED]> writes:
> Let's consider one example - removing accents.
> In the past I always recommend people to use regex functions before
> to_tsvector conversion to remove accents, but recently I was noticed that
> such trick doesn't work with headline(). So, the only way is to have
> special dictionary dict_remove_accent before, which  works as a filter.

> I don't remember why do we left this for future releases, though.

That would require a system-to-dictionary API change (to be able to
modify the token under inspection), no?  So it's certainly something
I'd say is too late for 8.3.

One thought that came to mind is that the option name should be just
"Accept" not "AcceptAll".  To me "All" implies that it would accept
*everything* ... including stopwords.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

[HACKERS] Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

2007-11-14 Thread Oleg Bartunov

In principle the right way is to allow any dictionary have option 
like 'PassThrough' and internal function get_dict_options(dict, option) 
to check if PassThrough option is true.

Let's consider one example - removing accents.
In the past I always recommend people to use regex functions before
to_tsvector conversion to remove accents, but recently I was noticed that
such trick doesn't work with headline(). So, the only way is to have
special dictionary dict_remove_accent before, which  works as a filter.

I don't remember why do we left this for future releases, though.

Oleg
On Wed, 14 Nov 2007, Tom Lane wrote:


This patch:
http://archives.postgresql.org/pgsql-patches/2007-11/msg00137.php
seems simple and useful enough that I think we ought to slip it into
8.3, even though we are far past feature freeze.

As the "simple" dictionary type stands in CVS HEAD, it is only useful as
the last dictionary in a stack, since it never passes anything on as
unrecognized.  With the proposed AcceptAll = false option, it could be
used to filter out some stopwords before feeding tokens to another
dictionary.  While most dictionary types have their own stopword support,
some of them match stopwords after their own normalization processing,
and so there's no way to filter on pre-normalized words.  That seems
like a good improvement, even without the specific need-example that
Jan provided at the start of the thread.

Normally we'd never consider adding a new feature so late in the
development cycle, but this seems small enough and useful enough
to make an exception.  Comments?

regards, tom lane



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

[HACKERS] Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

2007-11-14 Thread Bruce Momjian

Tom Lane wrote:
> This patch:
> http://archives.postgresql.org/pgsql-patches/2007-11/msg00137.php
> seems simple and useful enough that I think we ought to slip it into
> 8.3, even though we are far past feature freeze.
> 
> As the "simple" dictionary type stands in CVS HEAD, it is only useful as
> the last dictionary in a stack, since it never passes anything on as
> unrecognized.  With the proposed AcceptAll = false option, it could be
> used to filter out some stopwords before feeding tokens to another
> dictionary.  While most dictionary types have their own stopword support,
> some of them match stopwords after their own normalization processing,
> and so there's no way to filter on pre-normalized words.  That seems
> like a good improvement, even without the specific need-example that
> Jan provided at the start of the thread.
> 
> Normally we'd never consider adding a new feature so late in the
> development cycle, but this seems small enough and useful enough
> to make an exception.  Comments?

Agreed.  The logic is that textsearch is getting a major overhaul in 8.3
and it is reasonable to keep adjusting things during beta.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

2007-11-14 Thread Tom Lane

This patch:
http://archives.postgresql.org/pgsql-patches/2007-11/msg00137.php
seems simple and useful enough that I think we ought to slip it into
8.3, even though we are far past feature freeze.

As the "simple" dictionary type stands in CVS HEAD, it is only useful as
the last dictionary in a stack, since it never passes anything on as
unrecognized.  With the proposed AcceptAll = false option, it could be
used to filter out some stopwords before feeding tokens to another
dictionary.  While most dictionary types have their own stopword support,
some of them match stopwords after their own normalization processing,
and so there's no way to filter on pre-normalized words.  That seems
like a good improvement, even without the specific need-example that
Jan provided at the start of the thread.

Normally we'd never consider adding a new feature so late in the
development cycle, but this seems small enough and useful enough
to make an exception.  Comments?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Tom Lane

Mark Mielke <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> My goodness that's a hardware-dependent proposal.  Shall we discuss
>> how many CPUs there are where an integer division is *slower* than
>> a floating-point op?

> Do you have one in mind, or is this a straw man? :-)

I've got one upstairs (HPPA), and I believe that it's actually a pretty
common situation in scientifically-oriented workstations from a few
years back.

>> Why do you think that a couple of FP ops here are a problem, anyway?
>> This is a code path where we've already yielded the processor, so
>> by definition the repetition rate has to be pretty low.

> Yielded the processor?

Yielded the processor, as in pg_usleep.  It is absolutely impossible for
any thread to execute that line of code more than 1000 times per second,
and the expected rate would be very much less.  Furthermore, the entire
point of the function is to try to make processes come out of the sleep
at different times, so they shouldn't be ganging up on the FPU anyway.

There may be some place where we have an unnecessarily high amount
of FP usage, but I very much doubt that this is it.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Mark Mielke


Tom Lane wrote:

=?ISO-8859-1?Q?Magne_M=E6hre?= <[EMAIL PROTECTED]> writes:
  

I understand the reasoning for the backoff (as of the discussion on
2003-08-05), but is there any particular reason for using floating
point operations here ?   Maybe a modulo would be just as good (or
better since it doesn't involve the FPU) ?


My goodness that's a hardware-dependent proposal.  Shall we discuss
how many CPUs there are where an integer division is *slower* than
a floating-point op?
  

Hi Tom:

Do you have one in mind, or is this a straw man? :-)


Why do you think that a couple of FP ops here are a problem, anyway?
This is a code path where we've already yielded the processor, so
by definition the repetition rate has to be pretty low.
Yielded the processor? Or yielded the lock? With 32 active threads 
contending for the lock, but first contending for the FPU, one could see 
how it might be relevant.


I think I agree with you that this won't be the only problem, however, 
the FPU may not need to contribute to the problem?



The other problem with using modulo is that it makes the result depend
mostly on the low-order bits of the random() result, rather than mostly
on the high-order bits; with lower-grade implementations of random(),
the lower bits are materially less random than the higher.  Now
admittedly high-grade randomness is probably not too important for this
specific context, but I dislike putting in poor coding practices that
someone might see and copy without thinking...
  
If this was a serious problem, there is the >> operator. I see it as a 
poor coding practice to make assumptions about which bits are most 
"random" in a call to random(). If anybody fixes the problem you 
describe, then the opposite may become true. Perhaps the lowest bits are 
the most random. If random() is broken, random() should be fixed. Coding 
in specific implementation details about specific implementations of 
random() is just as bad. :-)


IMHO, use of FPU should be avoided wherever possible on any platform. On 
some platforms, the FPU may be fully implemented in software. My memory 
is faint, but I think SPARC v7 either implemented / in software, or had 
a trap that implemented it in software.


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Tom Lane

=?ISO-8859-1?Q?Magne_M=E6hre?= <[EMAIL PROTECTED]> writes:
> I understand the reasoning for the backoff (as of the discussion on
> 2003-08-05), but is there any particular reason for using floating
> point operations here ?   Maybe a modulo would be just as good (or
> better since it doesn't involve the FPU) ?

My goodness that's a hardware-dependent proposal.  Shall we discuss
how many CPUs there are where an integer division is *slower* than
a floating-point op?

Why do you think that a couple of FP ops here are a problem, anyway?
This is a code path where we've already yielded the processor, so
by definition the repetition rate has to be pretty low.

The other problem with using modulo is that it makes the result depend
mostly on the low-order bits of the random() result, rather than mostly
on the high-order bits; with lower-grade implementations of random(),
the lower bits are materially less random than the higher.  Now
admittedly high-grade randomness is probably not too important for this
specific context, but I dislike putting in poor coding practices that
someone might see and copy without thinking...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Simplifying Text Search

2007-11-14 Thread Bruce Momjian

Simon Riggs wrote:
> Better idea:
> 
> in-linable function called
> 
> create function
> contains(sourceText text, searchText text, config text) returns boolean
> as $$
> to_tsvector(config, sourceText) @@ to_tsquery(config, searchText);
> $$ language sql;
> 
> so that 
> 
> SELECT title
> FROM pgweb
> WHERE contains(body, 'a & b', 'english')
> 
> is an indexable, easily readable way of using full text search.
> 
> allowing 
> 
> SELECT to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat');
>  ?column? 
> --
>  t
> 
> to become
> 
> SELECT contains('fat cats ate fat rats', 'fat & rat', 'english');
>  ?column? 
> --
>  t
> 
> Proposed changes:
> 1. Add function contains()
> 2. Alter docs to show use of contains()
> 
> All other @@ features still the same

One advantage to this syntax is that it allows the specification of the
configuration name, which needs a function call to be done with the @@
operator.  

However, to access a tsvector column we would need to add another
function that takes a tsvector value.  However, the config specification
would apply only to the text column, not the pre-computed tsvector, so
that might be confusing.  I think this method could use a function index
but it wouldn't be very clear from the syntax.

The problem I always kept having in clarifying the syntax is that I
could never find anything that covered all common use cases.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] How to keep a table in memory?

2007-11-14 Thread Zeugswetter Andreas ADI SD

Kevin Grittner wrote:
> > . . .the abuse of such hints in applications I have seen is so
rampant as to
> > make me doubt the utility of adding them anyway.  It's true that by
adding
> > hints, you give a facility to a good, competent designer who has a
really

> I have trouble not seeing the point of any posts in this thread.
> Under our old, commercial database product, we had performance
> problems we addressed with a "named caches" feature -- you could
> declare a named cache of a particular size, and tweak some
> characteristics of it, then bind objects to it.  We came up with

Seems you simply fall in the competent category :-)

I know that another commercial product had introduced a pin table into
memory 
feature for a few years, but dropped it again in the current release.
It seems the potential for wrongdoing is significant :-(
At least a "lock this table into memory" must be accompanied by an
"allow a max percentage of buffercache" and something that loads the 
table on startup. But what do you do if it does not fit ?
Caching only parts of the table is useless for the mentioned use-case.

One aspect that has not been addressed is whether there is a way to 
cluster/partition the table in a way that reduces/clusters the number of
pages that need to 
be fetched by these not frequent enough but performance critical queries
?

This may solve the problem with a different approach.

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Zdenek Kotala


Magne Mæhre wrote:


I was playing with a Nevada server and noticed a rush on the FPU
(the Nevada has a single shared FPU for its 32 threads).


Probably you mean Niagara ;-).


Zdenek

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

[HACKERS] Spinlock backoff algorithm

2007-11-14 Thread Magne Mæhre



I was playing with a Nevada server and noticed a rush on the FPU
(the Nevada has a single shared FPU for its 32 threads).
Looking at the spinlock code, I found :

 cur_delay += (int) (cur_delay *
 ((double) random() / (double) MAX_RANDOM_VALUE) + 0.5);

I understand the reasoning for the backoff (as of the discussion on
2003-08-05), but is there any particular reason for using floating
point operations here ?   Maybe a modulo would be just as good (or
better since it doesn't involve the FPU) ?

Something like:
   cur_delay  +=  random() % (cur_delay + 1) ;


--Magne

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] LDC - Load Distributed Checkpoints with PG8.3b2 on Solaris

2007-11-14 Thread ITAGAKI Takahiro

"Jignesh K. Shah" <[EMAIL PROTECTED]> wrote:

> I am running tests with PG8.3b2 on Solaris 10 8/07 and I still see IO 
> flood when checkpoint happens.

Are there any i/o tuning knobs in Solaris? LDC in 8.3 expects writing
activity in kernel is strong enough to keep dirty pages in kernel
in a small number. i.e, fsync should be cheap.

Larger shared_buffers, checkpoint_timeout, checkpoint_segments and
checkpoint_completion_target make checkpoints more smooth.
But there is a trade-off between smoothness and recovery time.

Stronger writer processes in kernel make checkpoints more smooth.
But there is a trade-off between smoothness and throughput.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Simplifying Text Search

2007-11-14 Thread Magnus Hagander

On Wed, Nov 14, 2007 at 07:46:58AM +, Gregory Stark wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> 
> > Proposed changes:
> > 1. Add function contains()
> > 2. Alter docs to show use of contains()
> >
> > All other @@ features still the same
> 
> Have you yet given any advantages of contains over @@ ?

Familiarity for users of SQL Server that are migrating? ;-)
(http://msdn2.microsoft.com/en-us/library/ms187787.aspx)

//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

51 matches

Mail list logo