Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-02 Thread Robert Hodges
Hi Hannu,

On 6/1/08 2:14 PM, Hannu Krosing [EMAIL PROTECTED] wrote:


 As a consequence, I don't see how you can get around doing some sort
 of row-based replication like all the other databases.

 Isn't WAL-based replication some sort of row-based replication ?

Yes, in theory.  However, there's a big difference between replicating
physical WAL records and doing logical replication with SQL statements.
Logical replication requires extra information to reconstruct primary keys.
(Somebody tell me if this is already in the WAL; I'm learning the code as
fast as possible but assuming for now it's not.)


  Now that people are starting to get religion on this issue I would
 strongly advocate a parallel effort to put in a change-set extraction
 API that would allow construction of comprehensive master/slave
 replication.

 Triggers. See pgQ's logtrigga()/logutrigga(). See the slides for Marko
 Kreen's presentation at pgCon08.


Thanks very much for the pointer.  The slides look interesting.
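
(For concreteness, the trigger-based change-set idea boils down to logging
each row change into a queue table.  A bare-bones sketch in plain SQL --
made-up table and function names, not pgQ's actual logtrigga()
implementation -- assuming the replicated table has an integer primary key
column called id:

CREATE TABLE change_log (
    seq       bigserial   PRIMARY KEY,
    tablename text        NOT NULL,
    op        "char"      NOT NULL,   -- 'I', 'U' or 'D'
    pk        integer     NOT NULL,   -- primary key of the changed row
    logged_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO change_log(tablename, op, pk)
        VALUES (TG_TABLE_NAME, 'D', OLD.id);
        RETURN OLD;
    END IF;
    INSERT INTO change_log(tablename, op, pk)
    VALUES (TG_TABLE_NAME, substr(TG_OP, 1, 1), NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER capture_changes
    AFTER INSERT OR UPDATE OR DELETE ON some_replicated_table
    FOR EACH ROW EXECUTE PROCEDURE log_change();

A replay process on the slave then reads change_log in seq order, which is
exactly the primary-key bookkeeping mentioned above.)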

Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Table rewrites vs. pending AFTER triggers

2008-06-02 Thread Robert Hodges
Hi Gokul,

If you are saying that DDL should be auto-commit, yes, this really does
limit some use cases.

Transactional DDL is quite helpful for SQL generators, which need to avoid
leaving the schema half-changed if the application crashes or there's a problem
with the database that causes a command to fail.  SLONY is an example of
such a generator where transactional DDL would be helpful, though I don't
know for a fact that SLONY uses it.  We have used it in the past for
building queues in SQL, which required multiple schema changes for a single
queue.

In sum, it's much easier to implement such tools if you can do a set of
schema changes atomically.  There are no doubt other use cases as well.
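
A minimal illustration of what that buys you (table names invented):

BEGIN;
CREATE TABLE queue_events (id bigserial PRIMARY KEY, payload text);
CREATE INDEX queue_events_payload_idx ON queue_events (payload);
ALTER TABLE queue_events ADD COLUMN consumed_at timestamptz;
-- if any statement above fails, the whole set is rolled back and the
-- schema is left exactly as it was before BEGIN
COMMIT;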

Cheers, Robert

On 1/2/08 11:04 PM, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote:

 Is there a reason why we allow DDLs inside a transaction and allow them to be rolled
 back? If we commit the previous transaction as soon as we encounter a DDL,
 and commit the DDL too (without waiting for commit), will it be affecting some
 use cases?

 I actually mean to say that DDLs can be declared as self-committing. That
 would get rid of these exceptions.

 Am I missing something?

 Thanks,
 Gokul.

 On Jan 3, 2008 12:02 AM, Andrew Dunstan  [EMAIL PROTECTED] wrote:


 Simon Riggs wrote:
 On Tue, 2008-01-01 at 16:09 -0500, Tom Lane wrote:


 Paranoia would
 suggest forbidding *any* form of ALTER TABLE when there are pending
 trigger events, but maybe that's unnecessarily strong.


 That works for me. Such a combination makes no sense, so banning it is
 the right thing to do.



 +1. Doesn't make much sense to me either.

 cheers

 andrew

 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?

 http://archives.postgresql.org





--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add dblink function to check if a named connection exists

2008-06-02 Thread Asko Oja
Just use plproxy and skip all the hassle of dblink :)

On Mon, Jun 2, 2008 at 3:14 AM, Joe Conway [EMAIL PROTECTED] wrote:

 Tom Lane wrote:

 Tommy Gildseth [EMAIL PROTECTED] writes:

 One obvious disadvantage of this approach is that I need to connect and
 disconnect in every function. A possible solution to this would be having a
 function, f.ex. dblink_exists('connection_name'), that returns true/false
 depending on whether the connection already exists.


 Can't you do this already?

SELECT 'myconn' = ANY (dblink_get_connections());

 A dedicated function might be a tad faster, but it probably isn't going
 to matter compared to the overhead of sending a remote query.


 I agree. The above is about as simple as
  SELECT dblink_exists('dtest1');
 and probably not measurably slower. If you still think a dedicated function
 is needed, please send the output of some performance testing to justify it.

 If you really want the notational simplicity, you could use an SQL function
 to wrap it:

 CREATE OR REPLACE FUNCTION dblink_exists(text)
 RETURNS bool AS $$
  SELECT $1 = ANY (dblink_get_connections())
 $$ LANGUAGE sql;

 contrib_regression=# SELECT dblink_exists('dtest1');
  dblink_exists
 ---------------
  f
 (1 row)

 I guess it might be worthwhile adding the SQL function definition to
 dblink.sql.in as an enhancement in 8.4.

 Joe



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Dave Page
On Mon, Jun 2, 2008 at 5:46 AM, Joshua D. Drake [EMAIL PROTECTED] wrote:

 I've seen people not doing so more often than you would think. Perhaps
 because they are DBAs and not sysadmins? I also meant a tool to do things
 like verify that the changes are valid, as someone else mentioned elsewhere
 in this thread.

 pg_ctl -D data check?

 I would +1 that.

I would also really like to see that - though I'd also like to see an
SQL interface so we can check a config before saving when editing via
pgAdmin or similar.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches

2008-06-02 Thread Dawid Kuroczko
On Wed, May 7, 2008 at 7:52 AM, KaiGai Kohei [EMAIL PROTECTED] wrote:
 Tom, Thanks for your reviewing.
 The patch hasn't got a mode in which SELinux support is compiled in but
 not active.  This is a good way to ensure that no one will ever ship
 standard RPMs with the feature compiled in, because they will be entirely
 nonfunctional for people who aren't interested in setting up SELinux.
 I think you need an enable_sepostgres GUC, or something like that.
 (Of course, the overhead of the per-row security column would probably
 discourage anyone from wanting to use such a configuration anyway,
 so maybe the point is moot.)
  We can turn on/off SELinux globally, not bound to SE-PostgreSQL.
 The reason why I didn't provide a mode bit like enable_sepostgresql
 is to keep consistency in system configuration.

Hmm, I think ACE should be a CREATE DATABASE parameter.

If I were to create an SE-database I would wish that disabling it were
more difficult than changing a GUC in the database.  And being able to
set it on a per-database basis would help get SE/ACE enabled by
packagers.

   Regards,
  Dawid
-- 
Solving [site load issues] with [more database replication] is a lot
like solving your own personal problems with heroin - at first it
sorta works, but after a while things just get out of hand.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Greg Smith

On Sun, 1 Jun 2008, Peter Eisentraut wrote:


Josh Berkus wrote:

1. Most people have no idea how to set these.

Could you clarify this?  I can't really believe that people are incapable of
editing a configuration file.


The big problem isn't the editing, it's knowing what to set the 
configuration values to.


This is not to say that editing a configuration file should be considered 
reasonable.  Any GUCS overhaul should include a design goal of being able 
to completely manage the configuration system using, say, pgadmin (the 
"manage settings via port access" part that Josh already mentioned). 
This is why I was suggesting additions aimed at assimilating all the 
things that are in the postgresql.conf file.


Joshua has been banging a drum for a while now that all this data needs to 
get pushed into the database itself.  The GUCS data is clearly structured 
like a database table.  Josh's suggested changes are basically adding all 
the columns needed to it in order to handle everything you'd want to do to 
the table.  If you think of it in those terms and make it possible to 
manipulate that data using the tools already available for updating 
tables, you'll open up the potential to add a whole new class of 
user-friendly applications for making configuration easier to manage.


However, I don't fully agree with taking that idea as far as Joshua has 
suggested (only having the config data in the database), because having 
everything in a simple text file that can be managed with SCM etc. has 
significant value.  It's nice to allow admins to be able to make simple 
changes with just a file edit.  It's nice that you can look at all the 
parameters in one place and browse them.  However, I do think that the 
internal database representation must be capable of holding everything in 
the original postgresql.conf file and producing an updated version of the 
file, either locally or remotely, as needed.



4. We don't seem to be getting any closer to autotuning.

True.  But how does your proposal address this?


The idea that Josh's suggestions are working toward is simplifying the 
construction of tools that operate on the server configuration file, so 
that it's easier to write an autotuning tool.  Right now, writing such a 
tool in a generic way gets so bogged down just in parsing/manipulating the 
postgresql.conf file that it's hard to focus on actually doing the tuning 
part.  If we go back to his original suggestion:

http://wiki.postgresql.org/wiki/GUCS_Overhaul
Add a script called pg_generate_conf to generate a postgresql.conf 
based on guc.c and command-line switches (rather than 
postgresql.conf.sample)


It's an easy jump from there to imagine a pg_generate_conf that provides a 
wizard interface to update a configuration file.  I foresee a little GUI 
or web app that connects to a server on port 5432, finds out some basic 
information about the server, and gives something like this:


Parameter             Current   Recommended   Change?
shared_buffers        32MB      1024MB        [X]
effective_cache_size  128MB     3000MB        [ ]
work_mem              1MB       16MB          [ ]
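
For what it's worth, the "Current" column in such a tool is already
available over port 5432; a read-side sketch (nothing more than a query
against pg_settings) could be as simple as:

SELECT name, setting, unit
  FROM pg_settings
 WHERE name IN ('shared_buffers', 'effective_cache_size', 'work_mem');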

Josh has the actual brains behind such an app all planned out if you look 
at his presentations, but without the larger overhaul it's just not 
possible to make the implementation elegant.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [ANNOUNCE] == PostgreSQL Weekly News - June 01 2008 ==

2008-06-02 Thread Kaare Rasmussen
David Fetter writes: 


Alter the xxx_pattern_ops opclasses to use the regular equality
operator of the associated datatype as their equality member.  This
means that these opclasses can now support plain equality
comparisons along with LIKE tests, thus avoiding the need for an
extra index in some applications. 


This is interesting, as it will save me a couple of very big indexes. 

I didn't see that coming, but thanks. 



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Greg Smith

On Sun, 1 Jun 2008, Joshua D. Drake wrote:

Well I don't know that a minimum of comments is what I am arguing as 
much as not too many comments.


Josh's proposal included making three levels of documentation-level 
comments available:  terse, normal, and verbose.  The verbose comment 
level probably should include a web link to full documentation.  The way 
the comments litter the existing file, the status quo that's called normal 
mode in this proposal, is IMHO a complete mess.  Most use cases I can 
think of want either no comments or really verbose ones; the limited 
commentary in the current sample postgresql.conf seems a middle ground 
that's not right for anybody.


The key thing here in my mind is that it should be possible to 
transform between those three different verbosity levels without losing 
any settings or user-added comments.  They're really just different views 
on the same data, and which view you're seeing should be easy to change 
without touching the data.


I just extracted the original design proposal and some of the relevant 
follow-up in this thread, made some additional suggestions, and put the 
result at http://wiki.postgresql.org/wiki/GUCS_Overhaul -- I think reading 
that version makes it a bit clearer what the proposed overhaul is aiming 
for.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Stephen R. van den Berg
Greg Smith wrote:
Joshua has been banging a drum for a while now that all this data needs to 
get pushed into the database itself.  The GUCS data is clearly structured 

tables, you'll open up the potential to add a whole new class of 
user-friendly applications for making configuration easier to manage.

However, I don't fully agree with taking that idea as far as Joshua has 
suggested (only having the config data in the database), because having 
everything in a simple text file that can be managed with SCM etc. has 
significant value.  It's nice to allow admins to be able to make simple 
changes with just a file edit.  It's nice that you can look at all the 
parameters in one place and browse them.  However, I do think that the 
internal database representation must be capable of holding everything in 
the original postgresql.conf file and producing an updated version of the 
file, either locally or remotely, as needed.

Depending on the complexity you want to have inside the generator, one
could imagine a middle ground solution like:

include database-generated.conf
include local-overrides.conf

Where the database-generated.conf does not necessarily need a lot of
comments.

Josh has the actual brains behind such an app all planned out if you look 
at his presentations, but without the larger overhaul it's just not 
possible to make the implementation elegant.

IMHO Greg's response is the most comprehensive and well-thought-through
contribution in the whole GUC thread.
-- 
Sincerely,
   Stephen R. van den Berg.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread Andrew Sullivan
On Sun, Jun 01, 2008 at 10:13:07PM -0700, David E. Wheeler wrote:

 What locale is right? If I have a Web app, there could be data in many 
 different languages in a single table/column.

I think the above amounts to a need for per-session locale settings or
something, no?

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Table rewrites vs. pending AFTER triggers

2008-06-02 Thread Andrew Sullivan
On Sun, Jun 01, 2008 at 11:51:29PM -0700, Robert Hodges wrote:

 If you are saying that DDL should be auto-commit, yes, this really does
 limit some use cases.

I agree.  Transactional DDL is a big feature I'd hate to see go away.
Oracle DBAs I know look with envy on this feature of Postgres.
 
 with the database that causes a command to fail.  SLONY is an example of
 such a generator where transactional DDL would be helpful though I don't
 know for a fact that SLONY uses it.  

It sort of does, in that all work under Slony is performed in a
transaction.  But Slony attempts to isolate the DDL in a sync at just
the right point, and not mix schema and data changes in syncs.

Still, one of the important best practices in preparing your DDL
changing scripts for Slony is to try running that script inside a
transaction (and then rolling back) on every node, to ensure that it
will in fact work on every node.
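
In script form that dry run is nothing more than this (the ALTER line is
just a placeholder for the real DDL script):

BEGIN;
-- run the actual DDL script here, e.g. via \i in psql
ALTER TABLE accounts ADD COLUMN last_login timestamptz;
ROLLBACK;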

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-02 Thread Teodor Sigaev

I have attached a new patch with respect to the current cvs head. This
produces a headline in a document for a given query. Basically it
identifies fragments of text that contain the query and displays them.

New variant is much better, but...


 HeadlineParsedText contains an array of  actual words but not
information about the norms. We need an indexed position vector for each
norm so that we can quickly evaluate a number of possible fragments.
Something that tsvector provides.


Why do you need to store norms? The single purpose of norms is identifying words 
from the query - but that's already done by hlfinditem. It sets 
HeadlineWordEntry->item to the corresponding QueryOperand in the tsquery.
Look, the headline function is rather expensive and your patch adds a lot of extra 
work - at least in memory usage. And if the user calls with NumFragments=0, then that 
work is unneeded.



This approach does not change any other interface and fits nicely with
the overall framework.
Yeah, it's a really big step forward. Thank you. You are very close to 
committing, except: did you find the hlCover() function, which produces a cover from 
the original HeadlineParsedText representation? Is there any reason not to use it?




The norms are converted into tsvector and a number of covers are
generated. The best covers are then chosen to be in the headline. The
covers are separated using a hardcoded coversep. Let me know if you want
to expose this as an option.





Covers that overlap with already chosen covers are excluded.

Some options like ShortWord and MinWords are not taken care of right
now. MaxWords are used as maxcoversize. Let me know if you would like to
see other options for fragment generation as well.
ShortWord, MinWords and MaxWords should keep their meaning, but for each 
fragment, not for the whole headline.





Let me know any more changes you would like to see.


if (num_fragments == 0)
    /* call the default headline generator */
    mark_hl_words(prs, query, highlight, shortword, min_words,
                  max_words);
else
    mark_hl_fragments(prs, query, highlight, num_fragments, max_words);


Suppose num_fragments > 2?

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 Joshua has been banging a drum for a while now that all this data needs to 
 get pushed into the database itself.

This is, very simply, not going to happen.  Shall we go over the reasons
why not, one more time?

1. Quite a few of the GUC parameters are needed *before* one can ever
read the database; in particular the ones about file locations and
shared memory sizing.

2. Suppose you change a parameter in a way that breaks the DB (eg,
set shared_buffers to a value larger than your kernel allows) and
try to restart.  Database doesn't start.  If the parameter can only
be changed back within an operating database, you're hosed.

I have no objection to providing alternative ways to edit the
configuration data, but the primary source of the settings is
going to continue to be an editable text file.  Any proposals for
alternatives-to-a-text-editor have to work within that reality.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Stephen R. van den Berg
Tom Lane wrote:
Greg Smith [EMAIL PROTECTED] writes:
 Joshua has been banging a drum for a while now that all this data needs to 
 get pushed into the database itself.

This is, very simply, not going to happen.  Shall we go over the reasons
why not, one more time?

1. Quite a few of the GUC parameters are needed *before* one can ever
read the database; in particular the ones about file locations and
shared memory sizing.

Obviously.

2. Suppose you change a parameter in a way that breaks the DB (eg,
set shared_buffers to a value larger than your kernel allows) and
try to restart.  Database doesn't start.  If the parameter can only
be changed back within an operating database, you're hosed.

That's why those parameters will always have to be read from a textfile.
Which doesn't prohibit that textfile from being generated from within the
database (once it's up and running).  And yes, if someone changes the
parameter through the DB, then makes the DB write out the configfile,
then restarts the DB remotely, and the change didn't work out (i.e. DB
doesn't start, or crashes/hangs), he is hosed and needs to log in and change
the textfile with a texteditor, no way around it.

I have no objection to providing alternative ways to edit the
configuration data, but the primary source of the settings is
going to continue to be an editable text file.  Any proposals for
alternatives-to-a-text-editor have to work within that reality.

I think everyone is aware of that.
-- 
Sincerely,
   Stephen R. van den Berg.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread David E. Wheeler

On Jun 2, 2008, at 06:51, Andrew Sullivan wrote:


On Sun, Jun 01, 2008 at 10:13:07PM -0700, David E. Wheeler wrote:

 What locale is right? If I have a Web app, there could be data in many
 different languages in a single table/column.


I think the above amounts to a need for per-session locale settings or
something, no?


Yes, that's what I was getting at.

Thanks,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] phrase search

2008-06-02 Thread Teodor Sigaev




I have attached a patch for phrase search with respect to the cvs head.
Basically it takes a phrase (text) and a TSVector. It checks whether the
relative positions of the lexemes in the phrase are the same as their
positions in the TSVector.


Ideally, phrase search should be implemented as a new operator in tsquery, say #, 
with an optional distance. So, tsquery 'foo #2 bar' means: find all texts where 
'bar' is placed no farther than two words from 'foo'. The complexity is about complex 
boolean expressions ( 'foo #1 ( bar1 & bar2 )' ) and about several languages such as 
Norwegian or German. German has combining words, like footballbar - 
and they have several variants of splitting, so the result of to_tsquery('foo # 
footballbar') will be 'foo # ( ( football & bar ) | ( foot & ball & bar ) )',

where the variants are connected with the OR operation.

Of course, phrase search should be able to use indexes.


If the configuration for text search is simple, then this will produce
exact phrase search. Otherwise the stopwords in a phrase will be ignored
and the words in a phrase will only be matched with the stemmed lexeme.


Your solution can't be used as is, because the user should use tsquery too, to use an 
index:


column @@ to_tsquery('phrase search') AND  is_phrase_present('phrase search', 
column)


The first clause will be used for an index scan and it will quickly find candidates.


For my application I am using this as a separate shared object. I do not
know how to expose this function from the core. Can someone explain how
to do this?


Look at the contrib/ directory in pgsql's source code - make a contrib module from 
your patch. As an example, look at the adminpack module - it's rather simple.


Comments of your code:
1)
+#ifdef PG_MODULE_MAGIC
+PG_MODULE_MAGIC;
+#endif

That isn't needed for files compiled into the core; it's only needed for modules.

2)
 use only /* */ comments, do not use // (C++ style) comments
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal: new function array_init

2008-06-02 Thread Pavel Stehule
Hello

There have been questions about array initialisation several times. I propose
a function, array_init.

CREATE OR REPLACE FUNCTION array_init(sizes int[], v anyelement)
RETURNS anyarray;

The first parameter is an array of dimension sizes. The second argument is the
value that will be used for initialisation.

Because a pg array's indexes need not start from 1, we can allow specifying them.

CREATE OR REPLACE FUNCTION array_init(sizes int[], lowers int[], v
anyelement) RETURNS anyarray;



select array_init(array[2],0);
 array
-------
 {0,0}
(1 row)


select array_init(array[1,2], 0);
     array
---------------
 {{0,0},{0,0}}
(1 row)
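
And, assuming the two-array form above, a call with explicit lower bounds
might look like this (hypothetical output, since the function doesn't exist
yet):

select array_init(array[2], array[0], 0);
    array
-------------
 [0:1]={0,0}
(1 row)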

Any comments?

Regards
Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Jignesh K. Shah



Simon Riggs wrote:


Some other problems I see with GUCs

* It's not possible to set one parameter depending upon the setting of
another.
  


To me this is more critical. Most people I have seen will increase one 
or a few, but not all, of the parameters related to memory, which can result 
in lost performance and productivity in figuring out why.


What happened to the AvailRAM setting, and basing all memory GUCs on that?  
Ideally PostgreSQL should only create one big memory pool and allow all 
other variables to be changed at runtime by the DBA, some tuner process, or a 
customized application, as long as the total is less than the allocated 
shared_memory and local_memory settings. (This would also reduce the need 
to restart Postgres if a value needs to be changed.)




-Jignesh


* It's always unclear which GUCs can be changed, and when. That is much
more infrequently understood than the meaning of them.

* We should rename effective_cache_size to something that doesn't sound
like it does what shared_buffers does

* There is no config verification utility, so if you make a change and
then try to restart and it won't, you are in trouble.

  


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread David E. Wheeler

On Jun 1, 2008, at 21:08, Tom Lane wrote:

1. Does the use of the tolower() C function in the citext data type on
pgfoundry basically give me the same results as using lower() in my
SQL has for all these years?


[ broken record... ]  Kinda depends on your locale.  However, tolower()
is 100% guaranteed not to work for multibyte encodings, so citext is
quite useless if you're using UTF8.  This is fixable, no doubt, but
it's not fixed in the project as it stands.


Would the use of str_tolower() in formatting.c fix that?

Thanks,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Greg Smith

On Mon, 2 Jun 2008, Jignesh K. Shah wrote:

Most people I have seen will increase one or a few, but not all, parameters 
related to memory, which can result in loss of performance and 
productivity in figuring out why.


If it becomes easier to build a simple tool available to help people tune 
their configurations, that should help here without having to do anything 
more complicated than that.



What happened to AvailRAM setting and base all memory gucs on that.


Like some of the other GUC simplification ideas that show up sometimes 
(unifying all I/O and limiting background processes based on that total is 
another), this is hard to do internally.  Josh's proposal has a fair 
amount of work involved, but the code itself doesn't need to be clever or 
too intrusive.  Unifying all the memory settings would require being both 
clever and intrusive, and I doubt you'll find anybody who could pull it 
off who isn't already overtasked with more important improvements for the 
8.4 timeframe.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes:
 On Jun 1, 2008, at 21:08, Tom Lane wrote:
 [ broken record... ]  Kinda depends on your locale.  However, tolower()
 is 100% guaranteed not to work for multibyte encodings, so citext is
 quite useless if you're using UTF8.  This is fixable, no doubt, but
 it's not fixed in the project as it stands.

 Would the use of str_tolower() in formatting.c fix that?

Yeah, you need something equivalent to that.  I think that whole area
is due for refactoring, though --- we've got kind of a weird collection
of upper/lower/initcap APIs spread through a couple of different files.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Greg Smith

On Mon, 2 Jun 2008, Tom Lane wrote:


Greg Smith [EMAIL PROTECTED] writes:

Joshua has been banging a drum for a while now that all this data needs to
get pushed into the database itself.


This is, very simply, not going to happen.


Right, there are also technical challenges in the way of that ideal.  I 
was only mentioning the reasons why it might not be the best idea even if 
it were feasible.  However, I do not see why the limitations you bring up 
must get in the way of thinking about how to interact and manage the 
configuration data in a database context, even though it ultimately must 
be imported and exported to a flat file.


The concerns you bring up again about leaving the database in an 
unstartable state are a particularly real danger in the "only has access 
to 5432" hosted-provider case that this redesign is trying to satisfy.  I 
added a "Gotchas" section to the wiki page so that this issue doesn't get 
forgotten about.  The standard way to handle this situation is to have a 
known good backup configuration floating around.  Adding something in that 
area may end up being a hard requirement before remote editing makes 
sense.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: new function array_init

2008-06-02 Thread Tom Lane
Pavel Stehule [EMAIL PROTECTED] writes:
 There was more time questions about array's initialisation. I propose
 function array_init.

 CREATE OR REPLACE FUNCTION array_init(sizes int[], v anyelement)
 RETURNS anyarray;

I think this is basically a good idea, but maybe the API needs a bit of
adjustment --- providing the sizes as an array doesn't seem especially
convenient.  Since we only allow up to 6 dimensions (IIRC), what about
six functions with different numbers of parameters:

array_int(int, anyelement)
array_int(int, int, anyelement)
...
array_int(int, int, int, int, int, int, anyelement)

I don't object to having the array-input version too, but seems like in
most cases this way would be easier to use.  It wouldn't work well
for providing lower bounds too, but maybe the array-input case is
sufficient for that.

Other thoughts:

* Should the fill value be the first parameter instead of the last?
I'm not sure either way.

* I have a mild preference for array_fill instead of array_init.

* We can handle a null fill value now, but what about nulls in the
dimensions?  The alternatives seem to be to return a null array
(not an array of nulls) or throw error.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-02 Thread Chris Browne
[EMAIL PROTECTED] (Andreas 'ads' Scherbaum) writes:
 On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:

 Well, yes, but you do know about archive_timeout, right? No need to wait 
 2 hours.

 Then you ship 16 MB binary stuff every 30 second or every minute but
 you only have some kbyte real data in the logfile. This must be taken
 into account, especially if you ship the logfile over the internet
 (means: no high-speed connection, maybe even pay-per-traffic) to the
 slave.

If you have that kind of scenario, then you have painted yourself into
a corner, and there isn't anything that can be done to extract you
from it.

Consider: If you have so much update traffic that it is too much to
replicate via WAL-copying, why should we expect that other mechanisms
*wouldn't* also overflow the connection?

If you haven't got enough network bandwidth to use this feature, then
nobody is requiring that you use it.  It seems like a perfectly
reasonable prerequisite to say this requires that you have enough
bandwidth.
-- 
(reverse (concatenate 'string ofni.secnanifxunil @ enworbbc))
http://www3.sympatico.ca/cbbrowne/
There's nothing worse than having only one drunk head.
-- Zaphod Beeblebrox

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Simon Riggs

On Mon, 2008-06-02 at 11:59 -0400, Jignesh K. Shah wrote:
 
 Simon Riggs wrote:
 
  Some other problems I see with GUCs
 
  * It's not possible to set one parameter depending upon the setting of
  another.

 
 To me this is more critical.. Most people I have seen will increase one 
 or few but not all parameters related to memory which can result in loss 
 of performance and productivity in figuring out.
 
 What happened to AvailRAM setting and base all memory gucs on that.  
 Ideally PostgreSQL should only create one big memory pool and allow all 
 other variables to change runtime via dba or some tuner process or 
 customized application as long as total is less than the allocated 
 shared_memory and local_memory settings. (This will also reduce the need 
 of restarting Postgres if a value needs to be changed)

Agreed.

Right now, we can't even do that in code, let alone in config file.

If we had a smart_memory_config = on then we'd be able to say in the
backend:
if (smart_memory_config)
{
    other_thing = 0.1 * Nbuffers;
}

but the GUCs are evaluated in alphabetical order, without any way of
putting dependencies between them. So they are notionally orthogonal.   

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread David E. Wheeler

On Jun 2, 2008, at 09:33, Tom Lane wrote:


Would the use of str_tolower() in formatting.c fix that?


Yeah, you need something equivalent to that.  I think that whole area
is due for refactoring, though --- we've got kind of a weird collection
of upper/lower/initcap APIs spread through a couple of different files.


And I just ran into this on 8.3 when trying to install citext:

  psql:citext.sql:350: ERROR:  there is no built-in function named "oid_text"


I'm assuming that this is because a lot of automatic casts were removed
in 8.3 or 8.2; there are a bunch of these:


CREATE FUNCTION citext(oid) RETURNS citext AS 'oid_text'  LANGUAGE  
'internal' IMMUTABLE STRICT;

GRANT EXECUTE ON FUNCTION citext(oid) TO PUBLIC;
COMMENT ON FUNCTION citext(oid) IS 'convert oid to citext';
CREATE FUNCTION oid(citext) RETURNS oid AS 'text_oid'  LANGUAGE  
'internal' IMMUTABLE STRICT;

GRANT EXECUTE ON FUNCTION oid(citext) TO PUBLIC;
COMMENT ON FUNCTION oid(citext) IS 'convert citext to oid';
CREATE CAST (citext AS oid) WITH FUNCTION oid(citext);
CREATE CAST (oid AS citext) WITH FUNCTION citext(oid);

CREATE FUNCTION citext(int2) RETURNS citext AS 'int2_text'  LANGUAGE  
'internal' IMMUTABLE STRICT;

GRANT EXECUTE ON FUNCTION citext(int2) TO PUBLIC;
COMMENT ON FUNCTION citext(int2) IS 'convert int2 to citext';
CREATE FUNCTION int2(citext) RETURNS int2 AS 'text_int2'  LANGUAGE  
'internal' IMMUTABLE STRICT;

GRANT EXECUTE ON FUNCTION int2(citext) TO PUBLIC;
COMMENT ON FUNCTION int2(citext) IS 'convert citext to int2';
CREATE CAST (citext AS int2) WITH FUNCTION int2(citext);
CREATE CAST (int2 AS citext) WITH FUNCTION citext(int2);

And on and on. Clearly this module needs updating for newer  
PostgreSQLs. I tried removing them all in order to get the data type  
and tried it out with this script:


use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=try', 'postgres', '',
                       { pg_enable_utf8 => 1 });

for my $char (qw( À Á Â Ã Ä Å Ç Ć Č Ĉ Ċ Ď Đ A B C D )) {
    print "$char: ", $dbh->selectrow_array('SELECT LOWER(?::citext)',
                                           undef, $char), $/;
}

Naturally it didn't work:

À: Ã
Á: á
Â: â
Ã: ã
Ä: ä
Å: Ã¥
Ç: ç
Ć: ć
Č: č
Ĉ: ĉ
Ċ: ċ
Ď: ď
Đ: đ
A: a
B: b
C: c
D: d

BTW, I rebuilt my cluster with --locale en_US.UTF-8 and the script  
works on a text type, so having a locale is key.


Thanks,

David
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread Jeff Davis
On Mon, 2008-06-02 at 09:51 -0400, Andrew Sullivan wrote:
 On Sun, Jun 01, 2008 at 10:13:07PM -0700, David E. Wheeler wrote:
 
  What locale is right? If I have a Web app, there could be data in many 
  different languages in a single table/column.
 
 I think the above amounts to a need for per-session locale settings or
 something, no?
 

What if you had a CHECK constraint that was locale-sensitive? Would the
constraint only be non-false (true or null) for records inserted under
the same locale? That's not very useful.
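
What I mean, concretely (a contrived sketch; the table and columns are made
up): lower() and ordering comparisons on text are locale-sensitive, so a
constraint like

CREATE TABLE product (
    code  text,
    alias text,
    CHECK (lower(code) = lower(alias))
);

could accept a row under one locale's case-folding rules and reject the
same row under another's.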

I think if you want some special treatment of text for some users, it
should be explicit. Text in one locale is really a different type from
text in another locale, and so changing the locale of some text variable
is really a typecast. I don't think GUCs are the correct mechanism for
this.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-02 Thread Hannu Krosing
On Thu, 2008-05-29 at 23:37 +0200, Mathias Brossard wrote:

 I pointed out that the NTT solution is synchronous because Tom said in 
 the first part of his email that:
 
   In practice, simple asynchronous single-master-multiple-slave
   replication covers a respectable fraction of use cases, so we have
   concluded that we should allow such a feature to be included in the
   core project.
 
 ... and yet the most appropriate base technology for this is 
 synchronous, and maybe what I should have also pointed out in my previous mail 
 is that it doesn't support multiple slaves.

I don't think that you need too many slaves in sync mode.

Probably making the 1st slave sync and the others async from there on will be
good enough.

 Also, as other have pointed out there are different interpretations of 
 synchronous depending on wether the WAL data has reached the other end 
 of the network connection, a safe disk checkpoint or the slave DB itself.

Probably all of DRBD's levels (A: data sent to network, B: data received,
C: data written to disk) should be supported, plus C1: data replayed in the
slave DB - C1 meaning that it can be done in parallel with C.

Then each DBA can set it up depending on what he trusts - network,
slave's power supply or slaves' disk.

Also, the case of slave failure should be addressed. I don't think that
the best solution is halting all ops on master if slave/network fails.

Maybe we should also allow a setup with 2-3 slaves, where operations can
continue as long as at least 1 slave is syncing?

--
Hannu


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: new function array_init

2008-06-02 Thread Webb Sprague
On Mon, Jun 2, 2008 at 9:46 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Pavel Stehule [EMAIL PROTECTED] writes:
 There was more time questions about array's initialisation. I propose
 function array_init.

As one of the questioners, I will give some short thoughts below.

 CREATE OR REPLACE FUNCTION array_init(sizes int[], v anyelement)
 RETURNS anyarray;

+1.  +0 for Pavel's proposed syntax, because it feels better, but also
it scales to N dimensions (we should throw an error obviously if the
input is too big, but we can probably source that size through an
include), I hate functions with more than four arguments, and having
six slightly overloaded functions in the system catalogs seems
annoying.

 * We can handle a null fill value now, but what about nulls in the
 dimensions?  The alternatives seem to be to return a null array
 (not an array of nulls) or throw error.

I would throw an error, unless there is something that one can do with
a null array (perhaps there is?).

We also might want to consider a resize function, and some other
utilities as long as we are bothering with this.

I am sorry that I can't offer to write these, but I don't have the
time to learn the Postgresql infrastructure to do it.

Thanks for the attention Pavel!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add dblink function to check if a named connection exists

2008-06-02 Thread Tommy Gildseth

Joe Conway wrote:



If you really want the notational simplicity, you could use an SQL 
function to wrap it:


CREATE OR REPLACE FUNCTION dblink_exists(text)
RETURNS bool AS $$
  SELECT $1 = ANY (dblink_get_connections())
$$ LANGUAGE sql;



Thanks, that seems like a reasonable way to solve this.


--
Tommy Gildseth

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread Martijn van Oosterhout
On Mon, Jun 02, 2008 at 10:29:30AM -0700, Jeff Davis wrote:
 I think if you want some special treatment of text for some users, it
 should be explicit. Text in one locale is really a different type from
 text in another locale, and so changing the locale of some text variable
 is really a typecast. I don't think GUCs are the correct mechanism for
 this.

The SQL COLLATE syntax handles this just fine. Either the original
COLLATE patch or the new one will let people tag strings with any
collation they like.
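
For reference, the SQL-standard COLLATE clause looks roughly like this (a
sketch only; the table and collation names are examples, and none of this is
accepted by PostgreSQL today):

CREATE TABLE notes (body text COLLATE "de_DE");

SELECT lower(name COLLATE "fr_CA") FROM customers;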

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.




Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread Jeff Davis
On Mon, 2008-06-02 at 19:55 +0200, Martijn van Oosterhout wrote:
 The SQL COLLATE syntax handles this just fine. Either the original
 COLLATE patch or the new one will let people tags strings with any
 collation they like.

http://wiki.postgresql.org/wiki/Todo:Collate

The last reference I see on that page is from 2005. Is there any updated
information? Are there any major obstacles holding this up aside from
the platform issues mentioned on that page?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-02 Thread Josh Berkus
Greg,

 Like some of the other GUC simplification ideas that show up sometimes
 (unifying all I/O and limiting background processes based on that total
 is another), this is hard to do internally.  Josh's proposal has a fair
 amount of work involved, but the code itself doesn't need to be clever
 or too intrusive.  Unifying all the memory settings would require being
 both clever and intrusive, and I doubt you'll find anybody who could
 pull it off who isn't already overtasked with more important
 improvements for the 8.4 timeframe.

Plus, I'm a big fan of "enable an API" rather than "write a feature".  I 
think that there are people & companies out there who can write better 
autotuning tools than I can, and I'd rather give them a place to plug 
those tools in than trying to write autotuning into the postmaster.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread Jeff Davis
On Sun, 2008-06-01 at 22:13 -0700, David E. Wheeler wrote:
 What locale is right? If I have a Web app, there could be data in many  
 different languages in a single table/column.

I think the values should be explicitly treated differently. 

It would be nice if you could just typecast, like:
lower(somevalue::text(fr_CA))

which would then lowercase according to the fr_CA locale, regardless of
the locale of somevalue.

Using typmod for localization was brought up here:
http://archives.postgresql.org/pgsql-hackers/2007-06/msg00635.php

Has it been discussed further? I happen to like the idea of the TEXT
type taking a locale as a typmod. No typmod would, of course, fall back
to the cluster setting. And it would throw an exception if the encoding
couldn't represent that locale.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread Andrew Sullivan
On Mon, Jun 02, 2008 at 10:29:30AM -0700, Jeff Davis wrote:

 What if you had a CHECK constraint that was locale-sensitive? Would the
 constraint only be non-false (true or null) for records inserted under
 the same locale? That's not very useful.

It would seem that this is one of the important cases that needs to be
worked out.  I wasn't suggesting that per-session locale (or whatever
we want to call it) is _easy_ or, for that matter, even possible; just
that it would solve a large number of the problems that people
complain about.

In fact, I suspect that what we really need is something a little more
like in-database locale or something.
 
 I think if you want some special treatment of text for some users, it
 should be explicit. 

Yes.  Also, not just text.  Think of currency, numeric separators, &c.

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where can I find the doxyfile?

2008-06-02 Thread Stefan Kaltenbrunner
Zdenek Kotala wrote:
 Xin Wang napsal(a):
 Hi,
 I don't know where I can find the doxyfile which generates the
 doxygen.postgresql.org web site. I found that when reading code the
 doxygen source code is quite helpful. However, I want to generate an
 off-line copy of the doxygen docs myself, but I can't find the doxyfile in
 the latest source release.
 
 I think it is a good idea. Stefan, what about putting it on the wiki?

I don't think the wiki is a good place (we would have to maintain it in
addition to the main file), so I simply added a link on
http://doxygen.postgresql.org to the current copy of the configuration
file that was used to generate a particular set of docs.
However, some of the things there are specific to our install, so a bit
of manual change (like some of the paths) will be required anyway.


Stefan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread Tino Wildenhain

Andrew Sullivan wrote:
...

I think if you want some special treatment of text for some users, it
should be explicit. 


Yes.  Also, not just text.  Think of currency, numeric separators, c.


Which, imho, should not really be the business of the type interface
but instead something to_char() and to_{type} handle.

Tino




Re: [HACKERS] Proposal: new function array_init

2008-06-02 Thread Pavel Stehule
2008/6/2 Tom Lane [EMAIL PROTECTED]:
 Pavel Stehule [EMAIL PROTECTED] writes:
 There was more time questions about array's initialisation. I propose
 function array_init.

 CREATE OR REPLACE FUNCTION array_init(sizes int[], v anyelement)
 RETURNS anyarray;

 I think this is basically a good idea, but maybe the API needs a bit of
 adjustment --- providing the sizes as an array doesn't seem especially
 convenient.  Since we only allow up to 6 dimensions (IIRC), what about
 six functions with different numbers of parameters:

array_int(int, anyelement)
array_int(int, int, anyelement)
...
array_int(int, int, int, int, int, int, anyelement)

 I don't object to having the array-input version too, but seems like in
 most cases this way would be easier to use.  It wouldn't work well
 for providing lower bounds too, but maybe the array-input case is
 sufficient for that.

Your proposal is maybe a little bit more readable with lower bounds, and when
the initial value is an integer. But it's easy to wrap it in SQL functions.


 Other thoughts:

 * Should the fill value be the first parameter instead of the last?
 I'm not sure either way.

I am not sure either. I have no precedent to follow - the nearest function is memset?


 * I have a mild preference for array_fill instead of array_init.

maybe, maybe array_set. Any ideas are welcome


 * We can handle a null fill value now, but what about nulls in the
 dimensions?  The alternatives seem to be to return a null array
 (not an array of nulls) or throw error.

I am afraid that a null array could be confused with a null value - so in
this case I prefer to raise an exception.

Regards
Pavel Stehule

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: new function array_init

2008-06-02 Thread James William Pye
On Mon, Jun 02, 2008 at 08:10:19PM +0200, Pavel Stehule wrote:
  * I have a mild preference for array_fill instead of array_init.
 
 maybe, maybe array_set. Any ideas are welcome

array_create?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread Shane Ambler

To diverge a little -


Bit of a noob question along these lines -

Do LIKE and ILIKE take the locale into consideration, allowing 
case-insensitive searches in any locale setting?



I know that LIKE can use an index if you don't start the match with a 
wild card. ILIKE doesn't seem to. Is or would it be possible to get 
ILIKE to use a properly configured index as well?
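
(The usual workaround, as far as I know, is to skip ILIKE and pair lower()
with LIKE on top of an expression index -- a sketch, with made-up table and
column names:

CREATE INDEX person_name_lower_idx ON person (lower(name) text_pattern_ops);

SELECT * FROM person WHERE lower(name) LIKE lower('Smith%');

The text_pattern_ops opclass is only needed when the database locale isn't C.)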




--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-02 Thread Andreas 'ads' Scherbaum
On Mon, 02 Jun 2008 11:52:05 -0400 Chris Browne wrote:

 [EMAIL PROTECTED] (Andreas 'ads' Scherbaum) writes:
  On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
 
  Well, yes, but you do know about archive_timeout, right? No need to wait 
  2 hours.
 
  Then you ship 16 MB binary stuff every 30 second or every minute but
  you only have some kbyte real data in the logfile. This must be taken
  into account, especially if you ship the logfile over the internet
  (means: no high-speed connection, maybe even pay-per-traffic) to the
  slave.
 
 If you have that kind of scenario, then you have painted yourself into
 a corner, and there isn't anything that can be done to extract you
 from it.

You are misunderstanding something. It's perfectly possible that you
have a low-traffic database with changes every now and then. But you
have to copy a full 16 MB logfile every 30 seconds or every minute just
to have the slave up-to-date.


 Consider: If you have so much update traffic that it is too much to
 replicate via WAL-copying, why should we expect that other mechanisms
 *wouldn't* also overflow the connection?

For a few MB of real data you copy several GB of logfiles per day - that's a
lot of overhead, isn't it?


 If you haven't got enough network bandwidth to use this feature, then
 nobody is requiring that you use it.  It seems like a perfectly
 reasonable prerequisite to say this requires that you have enough
 bandwidth.

If you have a high-traffic database, then of course you need a different
connection than if you only have a low-traffic or a mostly read-only
database. But that's not the point. Copying an almost unused 16 MB WAL
logfile is just overhead - especially because the logfile is not very
compressible, due to all the leftovers from earlier use.


Kind regards

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] rfc: add pg_dump options to dump output

2008-06-02 Thread Robert Treat
I would like to add the flags given to pg_dump into the output of the pg_dump 
file. For text dumps, the output would go on a line below the current header, 
so output would look like:

--
-- PostgreSQL database dump complete
--
-- Generated by: pg_dump -s -U rob pagila 
--

For compressed dumps, the output could go into the headers created with -l, 
for example:

; Archive created at Mon Jun  2 16:43:19 2008
; dbname: pgods
; TOC Entries: 3
; Compression: -1
; Dump Version: 1.10-0
; Format: CUSTOM
; Integer: 4 bytes
; Offset: 8 bytes
; Dumped from database version: 8.3.1
; Dumped by pg_dump version: 8.3.1
; Generated by: pg_dump -s -U rob -Fc pagila
;
;
; Selected TOC Entries:


Anyone see any issues with this? Should there be other information taken into 
account? Does this need to be an option itself, or can we just do it in all 
cases? 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rfc: add pg_dump options to dump output

2008-06-02 Thread Josh Berkus
Robert,

 Anyone see any issues with this? Should there be other information taken
 into account? Does this need to be an option itself, or can we just do
 it in all cases?

+1 to do it in all cases.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case-Insensitve Text Comparison

2008-06-02 Thread Martijn van Oosterhout
On Mon, Jun 02, 2008 at 11:08:55AM -0700, Jeff Davis wrote:
 http://wiki.postgresql.org/wiki/Todo:Collate
 
 The last reference I see on that page is from 2005. Is there any updated
 information? Are there any major obstacles holding this up aside from
 the platform issues mentioned on that page?

Well, a review of the patch and a bit of work in the optimiser.
However, I think the patch will have bitrotted beyond any use by now.
It touched many of the areas the operator families stuff touched, for
example.

I believe it is being reimplemented as a GSoC project; that's probably
a better approach. We should probably just delete the page from the wiki
altogether.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.




Re: [HACKERS] phrase search

2008-06-02 Thread Sushant Sinha
On Mon, 2008-06-02 at 19:39 +0400, Teodor Sigaev wrote:
 
  I have attached a patch for phrase search with respect to the cvs head.
  Basically it takes a phrase (text) and a TSVector. It checks whether the
  relative positions of the lexemes in the phrase are the same as their
  positions in the TSVector.
 
 Ideally, phrase search should be implemented as a new operator in tsquery,
 say #, with an optional distance. So tsquery 'foo #2 bar' means: find all
 texts where 'bar' is placed no farther than two words from 'foo'. The
 complexity lies in complex boolean expressions ( 'foo #1 ( bar1 & bar2 )' )
 and in languages such as Norwegian or German. German has compound words,
 like 'footballbar', which have several possible splittings, so the result of
 to_tsquery('foo # footballbar') would be
 'foo # ( ( football & bar ) | ( foot & ball & bar ) )',
 where the variants are connected with the OR operation.

This is far more complicated than I thought.

 Of course, phrase search should be able to use indexes.

I can probably look into how to use an index. Any pointers on this?

  
  If the configuration for text search is simple, then this will produce
  exact phrase search. Otherwise the stopwords in a phrase will be ignored
  and the words in a phrase will only be matched with the stemmed lexeme.
 
 Your solution can't be used as is, because the user has to write a tsquery
 as well in order to use an index:
 
 column @@ to_tsquery('phrase & search') AND is_phrase_present('phrase search',
 column)
 
 The first clause will be used for the index scan and will quickly find
 candidates.

Yes, this is exactly how I am using it in my application. Do you think this
will cover most of the common cases, or should we try to make phrase search:

1. Use an index
2. Support arbitrary distances between lexemes
3. Support complex boolean queries
-Sushant. 
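To make the two-clause pattern quoted above concrete, here is a sketch of how
it looks in practice (the docs table, body_tsv column and GIN index are made
up for illustration; is_phrase_present is the function from the attached
patch, with the signature implied by the usage above):

  -- Hypothetical setup: a table with a precomputed tsvector column.
  CREATE INDEX docs_body_tsv_idx ON docs USING gin (body_tsv);

  -- The positions stored in the tsvector are what the phrase check compares:
  -- to_tsvector('english', 'phrase search in postgres')
  --   gives 'phrase':1 'postgr':4 'search':2  (positions 1 and 2 are adjacent).

  -- The indexable @@ clause narrows the candidates; is_phrase_present() then
  -- verifies that the lexemes really are adjacent.
  SELECT id
  FROM docs
  WHERE body_tsv @@ to_tsquery('english', 'phrase & search')
    AND is_phrase_present('phrase search', body_tsv);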

 
  For my application I am using this as a separate shared object. I do not
  know how to expose this function from the core. Can someone explain how
  to do this?
 
 Look at the contrib/ directory in pgsql's source code - make a contrib module
 from your patch. As an example, look at the adminpack module - it's rather
 simple.
 
 Comments on your code:
 1)
 +#ifdef PG_MODULE_MAGIC
 +PG_MODULE_MAGIC;
 +#endif
 
 That isn't needed for files compiled into the core; it's only needed for
 modules.
 
 2)
   use only /* */ comments, do not use // (C++ style) comments
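For the contrib route suggested here, the SQL glue that exposes the function
from the shared object would look roughly like this (a sketch only; the
signature is inferred from the usage shown earlier in the thread):

  -- is_phrase_present(phrase text, vector tsvector), installed the way
  -- contrib modules usually do it (MODULE_PATHNAME is substituted by the
  -- contrib build machinery):
  CREATE OR REPLACE FUNCTION is_phrase_present(text, tsvector)
  RETURNS boolean
  AS 'MODULE_PATHNAME', 'is_phrase_present'
  LANGUAGE C IMMUTABLE STRICT;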




Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-02 Thread Sushant Sinha
Efficiency: I realized that we do not need to store all norms. We only need
to store the norms that are in the query. So I moved the addition of norms
from addHLParsedLex to hlfinditem. This should add very little memory
overhead to existing headline generation.

If this is still not acceptable for default headline generation, then I
can push it into mark_hl_fragments. But I think any headline marking
function will benefit from having the norms corresponding to the query.

Why do we need norms?

hlCover does exactly what Cover in tsrank does, which is to find the cover
that contains the query. However, hlCover has to walk through words that do
not match the query, whereas Cover operates on position indexes for just the
query words, so it should be faster.

The main reason I would like it to be fast is that I want to generate all
covers for a given query, then choose the covers with the smallest length,
as they best explain the relation of the query to the document, and finally
stretch those covers to the specified size.

In my understanding, the current headline generation tries to find the
biggest cover for display in the headline. I personally think that such
a cover does not explain the context of a query in a document. We may
differ on this and that's why we may need both options.

Let me know what you think of this patch and I will update it to
respect other options like MinWords and ShortWord.

NumFragments  2:
I wanted people to use the new headline marker if they specify
NumFragments >= 1. If they do not specify NumFragments or set it to 0,
then the default marker will be used. This becomes a bit of a tricky
parameter, so please send any ideas on how to trigger the new marker.
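A usage sketch of what is being proposed (docs and body are made-up names;
NumFragments is the option added by the patch, so the second call only works
with the patch applied):

  -- Today: one headline covering the whole document.
  SELECT ts_headline('english', body,
                     to_tsquery('english', 'cover & fragment'),
                     'MaxWords=35, MinWords=15, ShortWord=3')
  FROM docs;

  -- With the patch: up to two fragments, each cover stretched to at most
  -- MaxWords words.
  SELECT ts_headline('english', body,
                     to_tsquery('english', 'cover & fragment'),
                     'NumFragments=2, MaxWords=10')
  FROM docs;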

On another note, I found that make_tsvector crashes if it receives a
ParsedText with curwords = 0. Specifically, uniqueWORD returns curwords
as 1 even when it gets 0 words. I am not sure if this is the desired
behavior.

-Sushant.


On Mon, 2008-06-02 at 18:10 +0400, Teodor Sigaev wrote:
  I have attached a new patch with respect to the current cvs head. This
  produces a headline for a document given a query. Basically it
  identifies fragments of text that contain the query and displays them.
 The new variant is much better, but...
 
  HeadlineParsedText contains an array of the actual words but no
  information about the norms. We need an indexed position vector for each
  norm so that we can quickly evaluate a number of possible fragments -
  something that tsvector provides.
 
 Why do you need to store norms? The single purpose of norms is identifying
 words from the query - but that is already done by hlfinditem, which sets
 HeadlineWordEntry->item to the corresponding QueryOperand in the tsquery.
 Look, the headline function is rather expensive and your patch adds a lot of
 extra work - at least in memory usage. And if the user calls it with
 NumFragments=0, that work is unneeded.
 
  This approach does not change any other interface and fits nicely with
  the overall framework.
 Yeah, it's a really big step forward. Thank you. You are very close to
 committing, except: did you find the hlCover() function, which produces a
 cover from the original HeadlineParsedText representation? Is there any
 reason not to use it?
 
  
  The norms are converted into a tsvector and a number of covers are
  generated. The best covers are then chosen to be in the headline. The
  covers are separated using a hardcoded coversep. Let me know if you want
  to expose this as an option.
 
 
  
  Covers that overlap with already chosen covers are excluded.
  
  Some options like ShortWord and MinWords are not taken care of right
  now. MaxWords is used as maxcoversize. Let me know if you would like to
  see other options for fragment generation as well.
 ShortWord, MinWords and MaxWords should keep their meaning, but per
 fragment, not for the whole headline.
 
 
  
  Let me know any more changes you would like to see.
 
  if (num_fragments == 0)
      /* call the default headline generator */
      mark_hl_words(prs, query, highlight, shortword, min_words, max_words);
  else
      mark_hl_fragments(prs, query, highlight, num_fragments, max_words);
 
 
 Suppose, num_fragments  2?
 
Index: src/backend/tsearch/dict.c
===
RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/backend/tsearch/dict.c,v
retrieving revision 1.5
diff -u -r1.5 dict.c
--- src/backend/tsearch/dict.c	25 Mar 2008 22:42:43 -	1.5
+++ src/backend/tsearch/dict.c	30 May 2008 23:20:57 -
@@ -16,6 +16,7 @@
 #include "catalog/pg_type.h"
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
+#include "tsearch/ts_public.h"
 #include utils/builtins.h
 
 
Index: src/backend/tsearch/to_tsany.c
===
RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/backend/tsearch/to_tsany.c,v
retrieving revision 1.12

Re: [HACKERS] rfc: add pg_dump options to dump output

2008-06-02 Thread Tom Lane
Robert Treat [EMAIL PROTECTED] writes:
 I would like to add the flags given to pg_dump into the output of the pg_dump
 file.

Why?  What is the value of this added complication?

 Anyone see any issues with this?

I'm a bit worried about breaking diff-equality of matching dumps, but
mainly I don't see the point.

regards, tom lane
