Re: [HACKERS] .psqlrc output for \pset commands

2008-07-19 Thread Bruce Momjian
Peter Eisentraut wrote:
> On Thursday, 17 July 2008, Bruce Momjian wrote:
> 
> > > Anyways the thing that struck me as odd was the messages appearing
> > > *before* the header. It seems to me the header should print followed by
> > > .psqlrc output followed by normal output.
> >
> > Do you like this better?
> >
> > $ psql test
> > psql (8.4devel)
> > Type "help" for help.
> > Output format is wrapped.
> >
> > test=>
> >
> > The attached patch accomplishes this.
> 
> The psqlrc file must be read before the welcome message is printed, so that 
> you can disable the welcome message in the psqlrc file.  Otherwise we are 
> reopening the whole issue of when and whether to print a welcome message that 
> we had just settled.

Oh, yea, sorry.  Reverted.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to universal binaries for Darwin

2008-07-19 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> I'd imagine a related problem is the run tests in configure.  They will 
> produce results for the platform that you run configure on.  More properly, 
> you should run configure in cross-compilation mode (twice, and then merge the
> output, as previously described), but I am not sure how that will turn out 
> when configure attempts to determine alignment and endianness with 
> compilation-only tests.

For the record, I got plausible-looking configure output from tests like

CFLAGS="-arch ppc64" ./configure --host=powerpc64-apple-darwin9.4.0 

Whether it'd actually work I dunno, but it looked plausible.  Two notes:

* You have to use both parts of the recipe: without --host, configure
doesn't think it's cross-compiling, and without CFLAGS, gcc doesn't ;-)

* This disables AC_TRY_RUN tests, of course.  The only adverse
consequence I noticed was failure to recognize that
-Wl,-dead_strip_dylibs is applicable, which is marginally annoying but
hardly fatal.

On the whole I still wouldn't trust cross-compiled configure results.
Better to get your prototype pg_config.h from the real deal.

regards, tom lane



Re: [HACKERS] Postgres-R: primary key patches

2008-07-19 Thread Alvaro Herrera
Markus Wanner wrote:

> (Although, I'm still less than thrilled about the internal storage  
> format of these tuple collections. That can certainly be improved and  
> simplified.)

Care to expand more on what it is?  On Replicator we're using the binary
send/recv routines to transmit tuples.  (Obviously this fails when the
master and slave have differing binary output, but currently we just
punt on this point).

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Getting to universal binaries for Darwin

2008-07-19 Thread Peter Eisentraut
On Saturday, 19 July 2008, Tom Lane wrote:
> The bad news is that if you only do that, only the arch that you
> actually build on will work.  We have configure set up to insert
> various hardware-dependent definitions into pg_config.h and
> ecpg_config.h, and if you don't have the right values visible for
> each compilation, the resulting executables will fail.

I'd imagine a related problem is the run tests in configure.  They will 
produce results for the platform that you run configure on.  More properly, 
you should run configure in cross-compilation mode (twice, and then merge the 
output, as previously described), but I am not sure how that will turn out 
when configure attempts to determine alignment and endianness with 
compilation-only tests.  You should probably check some of those results very 
carefully and help it out with some cache variables.



Re: [HACKERS] .psqlrc output for \pset commands

2008-07-19 Thread Peter Eisentraut
On Thursday, 17 July 2008, Bruce Momjian wrote:

> > Anyways the thing that struck me as odd was the messages appearing
> > *before* the header. It seems to me the header should print followed by
> > .psqlrc output followed by normal output.
>
> Do you like this better?
>
>   $ psql test
>   psql (8.4devel)
>   Type "help" for help.
>   Output format is wrapped.
>
>   test=>
>
> The attached patch accomplishes this.

The psqlrc file must be read before the welcome message is printed, so that 
you can disable the welcome message in the psqlrc file.  Otherwise we are 
reopening the whole issue of when and whether to print a welcome message that 
we had just settled.



Re: [HACKERS] Getting to universal binaries for Darwin

2008-07-19 Thread Florian G. Pflug

Tom Lane wrote:
> You can get around that by hacking up the generated config files
> with #ifdef __i386__ and so on to expose the correct values of
> the hardware-dependent symbols to each build.  Of course you have
> to know what the correct values are --- if you don't have a sample
> of each architecture handy to run configure against, it'd be easy
> to miss some things.  And even then it's pretty tedious.  I am
> not sure if it is possible or worth the trouble to try to automate
> this part better.

Hm - configure *does* the right thing if CFLAGS is set to *just* "-arch 
i386" or "-arch ppc" (at least on Intel hardware, because OS X can run 
ppc binaries there, but not vice versa), right? If this is true, we need
some way to run configure multiple times, once for each arch, but then 
still get *one* set of Makefiles that have all the archs in their CFLAGS.

> Modulo the above problems, I was able to build i386+ppc binaries that
> do in fact work on both architectures.  I haven't got any 64-bit Apple
> machines to play with, so there might be 64-bit issues I missed.
> Still, this is a huge step forward compared to what was discussed here:
> http://archives.postgresql.org/pgsql-general/2008-02/msg00200.php

I think that my MacBook should be able to build and run 64-bit binaries, 
so I can test that if you want. Do you have a script that does the 
necessary config file magic, or did you do that by hand?


regards, Florian Pflug



Re: [HACKERS] Getting to universal binaries for Darwin

2008-07-19 Thread Tom Lane
Adriaan van Os <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> You can get around that by hacking up the generated config files
>> with #ifdef __i386__ and so on to expose the correct values of
>> the hardware-dependent symbols to each build.  Of course you have
>> to know what the correct values are --- if you don't have a sample
>> of each architecture handy to run configure against, it'd be easy
>> to miss some things.  And even then it's pretty tedious.  I am
>> not sure if it is possible or worth the trouble to try to automate
>> this part better.

> It may be less pain to simply config and build for ppc and i386 in separate
> build directories and then glue the resulting binaries together with lipo

That might give you working executables, but you still need a
glued-together pg_config.h for installation purposes, if you'd
like people to be able to build extensions against the installation.

In any case, the preceding thread showed exactly how to do it that
way, and it didn't look like "less pain" to me  ...
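
As a rough illustration of the "glued-together" header idea, the sketch
below selects per-architecture values with compiler-predefined macros and
adds a runtime sanity check. The DEMO_* names, the fallback branch and the
check functions are assumptions made up for this example; they are not
symbols from the real pg_config.h, and real values should come from an
actual configure run on each architecture.

```c
#include <assert.h>

/*
 * Hypothetical excerpt of a hand-merged config header.  Each branch holds
 * the values configure would report on that architecture.
 */
#if defined(__i386__)
#define DEMO_SIZEOF_VOID_P 4
/* little-endian: DEMO_WORDS_BIGENDIAN stays undefined */
#elif defined(__ppc__)
#define DEMO_SIZEOF_VOID_P 4
#define DEMO_WORDS_BIGENDIAN 1
#elif defined(__x86_64__)
#define DEMO_SIZEOF_VOID_P 8
#elif defined(__ppc64__)
#define DEMO_SIZEOF_VOID_P 8
#define DEMO_WORDS_BIGENDIAN 1
#else
/*
 * Fallback so the sketch compiles elsewhere; relies on macros predefined
 * by GCC and Clang (an assumption of this example).
 */
#define DEMO_SIZEOF_VOID_P __SIZEOF_POINTER__
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define DEMO_WORDS_BIGENDIAN 1
#endif
#endif

/* Detect the byte order of the machine the code is running on. */
static int
demo_host_is_big_endian(void)
{
    const unsigned int probe = 1;

    return *(const unsigned char *) &probe == 0;
}

/* Return 1 if the compiled-in values agree with the running machine. */
int
demo_config_matches_host(void)
{
#ifdef DEMO_WORDS_BIGENDIAN
    const int configured_big_endian = 1;
#else
    const int configured_big_endian = 0;
#endif

    return DEMO_SIZEOF_VOID_P == (int) sizeof(void *) &&
        configured_big_endian == demo_host_is_big_endian();
}

/* Fail fast at startup if the wrong branch was compiled in. */
void
demo_assert_config(void)
{
    assert(demo_config_matches_host());
}
```

With "-arch i386 -arch ppc" each compilation pass would see only its own
branch, which is the point of guarding the values this way.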

regards, tom lane



Re: [HACKERS] Postgres-R: primary key patches

2008-07-19 Thread Markus Wanner

Hi,

chris wrote:
> You may want to have a chat with Jan; he's got some thoughts on a more
> general purpose mechanism that would be good for this as well as for
> (we think) extremely efficient bulk data loading.

Jan, would you mind sharing your thoughts? What use cases for such a general 
purpose mechanism do you see?


What I can imagine doing on top of Postgres-R is: splitting up the data 
and feeding multiple backends with it. Unlike Postgres-R's internal use, 
you'd still have to check the data against constraints, I think.


It would involve the origin backend asking for help from the manager. 
That one checks for available helper backends and then serves as a 
message dispatcher between the origin and helper backends (as it does 
for replication purposes). Please note that it already uses shared 
memory extensively, so the manager doesn't need to copy around the data 
itself.


Regards

Markus



Re: [HACKERS] Postgres-R: primary key patches

2008-07-19 Thread Markus Wanner

Hi,

chris wrote:

> I agree with you that tables are *supposed* to have primary keys;
> that's proper design, and if tables are missing them, then something
> is definitely broken.


Ah, I see, so you are not concerned about tables with a PRIMARY KEY for 
which one wants another REPLICATION KEY, but only about tables without a 
PRIMARY KEY, for which one doesn't want a PRIMARY KEY in the first place.


However, that's a general limitation of replication at tuple level: you 
need to be able to uniquely identify tuples. (Unlike replication on 
storage level, which can use the storage location for that).



> Sometimes, unfortunately, people make errors in design, and we wind up
> needing to accommodate situations that are "less than perfect."
>
> The "happy happenstance" is that, in modern versions of PostgreSQL, a
> unique index may be added in the background so that this may be
> rectified without outage if you can live with a "candidate primary
> key" rather than a true PRIMARY KEY.


I cannot see any reason for not wanting a PRIMARY KEY, but wanting 
replication, and therefore a REPLICATION KEY.


Or are you saying we should add a hidden REPLICATION KEY for people who 
are afraid of schema changes and dislike a visible primary key? Would 
you want to hide the underlying index as well?



> It seems to me that this extension can cover over a number of "design
> sins," which looks like a very kind accommodation where it is surely
> preferable to design it in earlier rather than later.


Sorry, but I fail to see any real advantage of that covering of "sins". 
I would find it rather confusing to have keys and indices hidden from 
the admin. It's not like an additional index or a primary key would lead 
to functional changes.


That's certainly different for additional columns, where a SELECT * 
could all of a sudden return more columns than before. So that's the 
exception where I agree that hiding such an additional column like we 
already do for system columns would make sense. That's for example the 
situation where you add an 'id' column later on and make that the new 
primary (and thus replication) key. Maybe that's what you meant? 
However, even in that case, I wouldn't hide the index nor the primary 
key, but only the column.


Regards

Markus




Re: [HACKERS] gsoc, oprrest function for text search

2008-07-19 Thread Jan Urbański

Jan Urbański wrote:

> The idea is (quoting a comment)
> /*
>  *  Traverse the tsquery preorder, calculating selectivity as:


Ekhm.
This should of course read "postorder"...

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin




Re: [HACKERS] Getting to universal binaries for Darwin

2008-07-19 Thread Adriaan van Os

Tom Lane wrote:

> The bad news is that if you only do that, only the arch that you
> actually build on will work.  We have configure set up to insert
> various hardware-dependent definitions into pg_config.h and
> ecpg_config.h, and if you don't have the right values visible for
> each compilation, the resulting executables will fail.
>
> You can get around that by hacking up the generated config files
> with #ifdef __i386__ and so on to expose the correct values of
> the hardware-dependent symbols to each build.  Of course you have
> to know what the correct values are --- if you don't have a sample
> of each architecture handy to run configure against, it'd be easy
> to miss some things.  And even then it's pretty tedious.  I am
> not sure if it is possible or worth the trouble to try to automate
> this part better.


It may be less pain to simply config and build for ppc and i386 in separate
build directories and then glue the resulting binaries together with lipo
to make them "universal".


Regards,

Adriaan van Os




[HACKERS] gsoc, oprrest function for text search

2008-07-19 Thread Jan Urbański
Here's a WIP patch implementing an oprrest function for tsvector @@ 
tsquery and tsquery @@ tsvector.


The idea is (quoting a comment)
/*
 *  Traverse the tsquery preorder, calculating selectivity as:
 *
 *   selec(left_oper) * selec(right_oper) in AND nodes,
 *
 *   selec(left_oper) + selec(right_oper) -
 *  selec(left_oper) * selec(right_oper) in OR nodes,
 *
 *   1 - selec(oper) in NOT nodes
 *
 *   freq[val] in VAL nodes, if the value is in MCELEM
 *   min(freq[MCELEM]) / 2 in VAL nodes, if it is not
 *
 *
 * Implementation-wise, we sort the MCELEM array to use binary
 * search on it.
 */

The patch still has many rough edges, but it applies to HEAD and passes 
tests. I'm posting it mostly to get feedback about whether I'm going in 
the right direction.
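
To make the formulas in the quoted comment concrete, here is a small
standalone sketch, not the patch itself. It evaluates the children of each
node before combining them (i.e. in post-order, as the follow-up correction
in this thread notes). All Demo* types and demo_* functions are hypothetical
stand-ins for QueryItem, TextFreq and tsquery_opr_selec, lexemes are plain C
strings, and the not-found case uses minfreq / 2 exactly as in the comment
(the patch code additionally caps it with DEFAULT_TS_SEL).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum { DEMO_VAL, DEMO_AND, DEMO_OR, DEMO_NOT } DemoNodeType;

/* Toy query tree node (stand-in for a tsquery item). */
typedef struct DemoNode
{
    DemoNodeType type;
    const char *lexeme;             /* DEMO_VAL only */
    const struct DemoNode *left;    /* NOT operand, or left AND/OR operand */
    const struct DemoNode *right;   /* right AND/OR operand */
} DemoNode;

/* One most-common-element entry: lexeme and its frequency. */
typedef struct
{
    const char *element;
    double      frequency;
} DemoFreq;

static int
demo_compare_freq(const void *a, const void *b)
{
    return strcmp(((const DemoFreq *) a)->element,
                  ((const DemoFreq *) b)->element);
}

/* Sort the lookup table once so bsearch() can be used on it. */
void
demo_sort_lookup(DemoFreq *lookup, size_t n)
{
    qsort(lookup, n, sizeof(DemoFreq), demo_compare_freq);
}

/* Selectivity of the subtree rooted at node; lookup must be sorted. */
double
demo_selec(const DemoNode *node, const DemoFreq *lookup, size_t n,
           double minfreq)
{
    double s1, s2;

    assert(node != NULL);

    switch (node->type)
    {
        case DEMO_VAL:
        {
            DemoFreq key = { node->lexeme, 0.0 };
            const DemoFreq *hit;

            assert(node->lexeme != NULL);
            hit = bsearch(&key, lookup, n, sizeof(DemoFreq),
                          demo_compare_freq);
            /* known element: its frequency; unknown: half the minimum */
            return hit ? hit->frequency : minfreq / 2.0;
        }
        case DEMO_NOT:
            return 1.0 - demo_selec(node->left, lookup, n, minfreq);
        case DEMO_AND:
            s1 = demo_selec(node->left, lookup, n, minfreq);
            s2 = demo_selec(node->right, lookup, n, minfreq);
            return s1 * s2;
        case DEMO_OR:
            s1 = demo_selec(node->left, lookup, n, minfreq);
            s2 = demo_selec(node->right, lookup, n, minfreq);
            return s1 + s2 - s1 * s2;
    }
    return 0.0;                     /* not reached for valid trees */
}
```

For the query (foo & bar) | !baz with foo at 0.5, bar at 0.2 and baz absent
from the table, the AND node gives 0.1, the NOT node gives 1 - 0.1 = 0.9,
and the OR node gives 0.1 + 0.9 - 0.09 = 0.91.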


Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
diff --git a/src/backend/tsearch/Makefile b/src/backend/tsearch/Makefile
index e20a4a2..ba728eb 100644
*** a/src/backend/tsearch/Makefile
--- b/src/backend/tsearch/Makefile
***************
*** 19,25 ****
  OBJS = ts_locale.o ts_parse.o wparser.o wparser_def.o dict.o \
dict_simple.o dict_synonym.o dict_thesaurus.o \
dict_ispell.o regis.o spell.o \
!   to_tsany.o ts_typanalyze.o ts_utils.o
  
  include $(top_srcdir)/src/backend/common.mk
  
--- 19,25 ----
  OBJS = ts_locale.o ts_parse.o wparser.o wparser_def.o dict.o \
dict_simple.o dict_synonym.o dict_thesaurus.o \
dict_ispell.o regis.o spell.o \
!   to_tsany.o ts_typanalyze.o ts_selfuncs.o ts_utils.o
  
  include $(top_srcdir)/src/backend/common.mk
  
diff --git a/src/backend/tsearch/ts_selfuncs.c b/src/backend/tsearch/ts_selfuncs.c
new file mode 100644
index 000..4b86072
*** /dev/null
--- b/src/backend/tsearch/ts_selfuncs.c
***************
*** 0,-1 ****
--- 1,299 ----
+ /*-------------------------------------------------------------------------
+  *
+  * ts_selfuncs.c
+  *  Selectivity functions for text search types.
+  *
+  * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
+  *
+  *
+  * IDENTIFICATION
+  *  $PostgreSQL$
+  *
+  *-------------------------------------------------------------------------
+  */
+ #include "postgres.h"
+ 
+ #include "miscadmin.h" /* for check_stack_depth() */
+ #include "utils/memutils.h"
+ #include "utils/builtins.h"
+ #include "utils/syscache.h"
+ #include "utils/lsyscache.h"
+ #include "utils/selfuncs.h"
+ #include "catalog/pg_type.h"
+ #include "catalog/pg_statistic.h"
+ #include "nodes/nodes.h"
+ #include "tsearch/ts_type.h"
+ 
+ /* lookup table type for binary searching through MCELEMs */
+ typedef struct
+ {
+   Datum   element;
+   float4  frequency;
+ } TextFreq;
+ 
+ static int
+ compare_textfreq(const void *e1, const void *e2);
+ static Selectivity
+ tsquery_opr_selec(QueryItem *item, char *operand, TextFreq *lookup,
+ int length, float4 minfreq);
+ static Selectivity
+ mcelem_tsquery_selec(TSQuery query, Datum *mcelem, int nmcelem,
+                      float4 *numbers, int nnumbers);
+ static double
+ tsquerysel(VariableStatData *vardata, Datum constval);
+ 
+ 
+ /* TSQuery traversal function */
+ static Selectivity
+ tsquery_opr_selec(QueryItem *item, char *operand, TextFreq *lookup,
+ int length, float4 minfreq)
+ {
+   TextFreq    key,
+               *searchres;
+   Selectivity s1, s2;
+ 
+   /* since this function recurses, it could be driven to stack overflow */
+   check_stack_depth();
+ 
+   if (item->type == QI_VAL)
+   {
+       QueryOperand *oper = (QueryOperand *) item;
+ 
+       /*
+        * Prepare the key for bsearch(). No need to initialize key.frequency,
+        * because sorting is only on key.element.
+        */
+       key.element = PointerGetDatum(
+           cstring_to_text_with_len(operand + oper->distance, oper->length));
+ 
+       searchres = (TextFreq *) bsearch(&key, lookup, length,
+                                        sizeof(TextFreq), compare_textfreq);
+       if (searchres)
+       {
+           /*
+            * The element is in MCELEM. Return precise selectivity (or at
+            * least as precise as ANALYZE could find out).
+            */
+           return (Selectivity) searchres->frequency;
+       }
+       else
+       {
+           /*
+            * The element is not in MCELEM. Punt, but ensure that the
+            * selectivity cannot be more than minfreq / 2.
+            */
+           return (Selectivity) Min(DEFAULT_TS_SEL, minfreq / 2);
+       }
+   }
+ 
+   /* Current TSQuery node is an operator */
+   switch (item-

Re: [HACKERS] Postgres-R: primary key patches

2008-07-19 Thread chris
[EMAIL PROTECTED] (Markus Wanner) writes:
> Hello Chris,
>
> chris wrote:
>> Slony-I does the same, with the "variation" that it permits the option
>> of using a "candidate primary key," namely an index that is unique+NOT
>> NULL.
>>
>> If it is possible to support that broader notion, that might make
>> addition of these sorts of logic more widely useful.
>
> Well, yeah, that's technically not much different, so it would
> probably be very easy to extend Postgres-R to work on any arbitrary
> Index.
>
> But what do we have primary keys for, in the first place? Isn't it
> exactly the *primary* key into the table, which you want to use for
> replication? Or do we need an additional per-table configuration
> option for that? A REPLICATION KEY besides the PRIMARY KEY?

I agree with you that tables are *supposed* to have primary keys;
that's proper design, and if tables are missing them, then something
is definitely broken.

Sometimes, unfortunately, people make errors in design, and we wind up
needing to accommodate situations that are "less than perfect."

The "happy happenstance" is that, in modern versions of PostgreSQL, a
unique index may be added in the background so that this may be
rectified without outage if you can live with a "candidate primary
key" rather than a true PRIMARY KEY.

It seems to me that this extension can cover over a number of "design
sins," which looks like a very kind accommodation where it is surely
preferable to design it in earlier rather than later.

>> I know Jan Wieck has in mind the idea of adding an interface to enable
>> doing highly efficient IUD (Insert/Update/Delete) via generating a way
>> to do direct heap updates, which would be *enormously* more efficient
>> than the present need (in Slony-I, for instance) to parse, plan and
>> execute thousands of IUD statements.  For UPDATE/DELETE to work
>> requires utilizing (candidate) primary keys, so there is some
>> seemingly relevant similarity there.
>
> Definitely. The remote backend does exactly that for Postgres-R: it
> takes a change set, which consists of one or more tuple collections,
> and then applies these collections. See ExecProcessCollection() in
> execMain.c.
>
> (Although, I'm still less than thrilled about the internal storage
> format of these tuple collections. That can certainly be improved and
> simplified.)

You may want to have a chat with Jan; he's got some thoughts on a more
general purpose mechanism that would be good for this as well as for
(we think) extremely efficient bulk data loading.
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lsf.html
Rules  of the  Evil Overlord  #145. "My  dungeon cell  decor  will not
feature exposed pipes.  While they add to the  gloomy atmosphere, they
are good  conductors of vibrations and  a lot of  prisoners know Morse
code." 



Re: [HACKERS] phrase search

2008-07-19 Thread Oleg Bartunov

Sushant,

The problem with phrase search is not in the implementation but in the
theoretical basis. tsearch has a rich query language, and phrase search
should support all query operations, so we need an algebra for query
operations. We need more time to investigate this problem but have no
spare time for it. If you are interested, you might think in this direction.

Oleg

On Sat, 19 Jul 2008, Sushant Sinha wrote:


I looked at query operators for tsquery and here are some of the new
query operators for position based queries. I am just proposing some
changes and the questions I have.

1. What is the meaning of such a query operator?

foo #5 bar -> true if the document has word "foo" followed by "bar" at
5th position.

foo #<5 bar -> true if document has word "foo" followed by "bar" within
5 positions

foo #>5 bar -> true if document has word "foo" followed by "bar" after 5
positions

then some other ways it can be used are
!(foo #<5 bar) -> true if document never has any "foo" followed by "bar"
within 5 positions.

etc.
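
One possible reading of these operators can be sketched over plain position
lists. This is only an illustration under stated assumptions: the
DemoDistOp type and demo_phrase_match() are hypothetical names, nothing
here touches tsvector or TS_execute, and the boundary handling (#N means a
distance of exactly N, #<N strictly less than N, #>N strictly more than N)
is one interpretation that the proposal itself leaves open.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_DIST_EQ, DEMO_DIST_LT, DEMO_DIST_GT } DemoDistOp;

/*
 * Return 1 if some occurrence of the right word follows some occurrence
 * of the left word at a distance d that satisfies the operator:
 *   DEMO_DIST_EQ: d == n,  DEMO_DIST_LT: 0 < d < n,  DEMO_DIST_GT: d > n.
 * Position arrays hold word positions within one document.
 */
int
demo_phrase_match(const int *left_pos, size_t nleft,
                  const int *right_pos, size_t nright,
                  DemoDistOp op, int n)
{
    size_t i, j;

    assert(n > 0);

    for (i = 0; i < nleft; i++)
    {
        for (j = 0; j < nright; j++)
        {
            int d = right_pos[j] - left_pos[i];

            if (d <= 0)
                continue;       /* right word must come after left word */

            if ((op == DEMO_DIST_EQ && d == n) ||
                (op == DEMO_DIST_LT && d < n) ||
                (op == DEMO_DIST_GT && d > n))
                return 1;
        }
    }
    return 0;
}
```

Under this reading, !(foo #<5 bar) is simply the negation of the result
for DEMO_DIST_LT with n = 5. The nested loops are quadratic in the number
of positions; a real implementation would more likely merge the two sorted
position lists.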

2. How to implement such query operators?

Should we modify QueryItem to include additional distance information or
is there any other way to accomplish it?

Is the following list sufficient to accomplish this?
a. Modify to_tsquery
b. Modify TS_execute in tsvector_op.c to check new operator

Is there anything needed in rewrite subsystem?

3. Are these valid uses of the operators and if yes what would they
mean?

foo #5 (bar & cup)

If no then should the operator be applied to only two QI_VAL's?

4. If the operator only applies to two query items can we create an
index such that (foo, bar)-> documents[min distance, max distance]
How difficult is it to implement an index like this?


Thanks,
-Sushant.

On Thu, 2008-06-05 at 19:37 +0400, Teodor Sigaev wrote:

I can add index support and support for arbitrary distance between
lexeme.
It appears to me that supporting arbitrary boolean expression will be
complicated. Can we pull out something from TSQuery?


I don't much like the idea of having a separate interface for phrase search.
Your patch may be a module, used by people who really want to have phrase
search.

Introducing a new operator in tsquery allows reuse of the already existing
tsquery infrastructure, such as concatenations (&&, ||, !!), the rewrite
subsystem, etc.  New operations/types designed specially for phrase search
would mean doing that work again.







Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
