Re: [HACKERS] Core team statement on replication in PostgreSQL
Hi Hannu,

On 6/1/08 2:14 PM, Hannu Krosing [EMAIL PROTECTED] wrote:
>> As a consequence, I don't see how you can get around doing some sort of row-based replication like all the other databases.
> Isn't WAL-based replication some sort of row-based replication?

Yes, in theory. However, there's a big difference between replicating physical WAL records and doing logical replication with SQL statements. Logical replication requires extra information to reconstruct primary keys. (Somebody tell me if this is already in the WAL; I'm learning the code as fast as possible but assuming for now it's not.)

>> Now that people are starting to get religion on this issue I would strongly advocate a parallel effort to put in a change-set extraction API that would allow construction of comprehensive master/slave replication.
> Triggers. See pgQ's logtrigga()/logutrigga(). See slides for Marko Kreen's presentation at pgCon08.

Thanks very much for the pointer. The slides look interesting.

Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Table rewrites vs. pending AFTER triggers
Hi Gokul,

If you are saying that DDL should be auto-commit, yes, this really does limit some use cases. Transactional DDL is quite helpful for SQL generators, which need to avoid leaving the schema half-changed if the application crashes or there's a problem with the database that causes a command to fail. Slony is an example of such a generator where transactional DDL would be helpful, though I don't know for a fact that Slony uses it. We have used it in the past for building queues in SQL, which required multiple schema changes for a single queue. In sum, it's much easier to implement such tools if you can do a set of schema changes atomically. There are no doubt other use cases as well.

Cheers, Robert

On 1/2/08 11:04 PM, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote:
> Is there a reason why we allow DDLs inside a transaction and allow them to be rolled back? If we commit the previous transaction as soon as we encounter a DDL, and commit the DDL too (without waiting for commit), will that affect some use cases? I actually mean to say that DDLs can be declared as self-committing. That would get rid of these exceptions. Am I missing something?
>
> Thanks, Gokul.
>
> On Jan 3, 2008 12:02 AM, Andrew Dunstan [EMAIL PROTECTED] wrote:
>> Simon Riggs wrote:
>>> On Tue, 2008-01-01 at 16:09 -0500, Tom Lane wrote:
>>>> Paranoia would suggest forbidding *any* form of ALTER TABLE when there are pending trigger events, but maybe that's unnecessarily strong.
>>> That works for me. Such a combination makes no sense, so banning it is the right thing to do.
>> +1. Doesn't make much sense to me either.
>>
>> cheers
>> andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org

--
Robert Hodges, CTO, Continuent, Inc.
Email: [EMAIL PROTECTED] Mobile: +1-510-501-3728 Skype: hodgesrm
Re: [HACKERS] Add dblink function to check if a named connection exists
Just use plproxy and skip all the hassle of dblink :)

On Mon, Jun 2, 2008 at 3:14 AM, Joe Conway [EMAIL PROTECTED] wrote:
> Tom Lane wrote:
>> Tommy Gildseth [EMAIL PROTECTED] writes:
>>> One obvious disadvantage of this approach is that I need to connect and disconnect in every function. A possible solution to this would be having a function, e.g. dblink_exists('connection_name'), that returns true/false depending on whether the connection already exists.
>>
>> Can't you do this already?
>>
>>     SELECT 'myconn' = ANY (dblink_get_connections());
>>
>> A dedicated function might be a tad faster, but it probably isn't going to matter compared to the overhead of sending a remote query.
>
> I agree. The above is about as simple as
>
>     SELECT dblink_exists('dtest1');
>
> and probably not measurably slower. If you still think a dedicated function is needed, please send the output of some performance testing to justify it.
>
> If you really want the notational simplicity, you could use an SQL function to wrap it:
>
>     CREATE OR REPLACE FUNCTION dblink_exists(text)
>     RETURNS bool AS $$
>       SELECT $1 = ANY (dblink_get_connections())
>     $$ LANGUAGE sql;
>
>     contrib_regression=# SELECT dblink_exists('dtest1');
>      dblink_exists
>     ---------------
>      f
>     (1 row)
>
> I guess it might be worthwhile adding the SQL function definition to dblink.sql.in as an enhancement in 8.4.
>
> Joe
Re: [HACKERS] Overhauling GUCS
On Mon, Jun 2, 2008 at 5:46 AM, Joshua D. Drake [EMAIL PROTECTED] wrote:
>> I've seen people not doing so more often than you would think. Perhaps because they are DBAs and not sysadmins? I also meant a tool to do things like verify that the changes are valid, as someone else mentioned elsewhere in this thread.
>
> pg_ctl -D data check? I would +1 that.

I would also really like to see that - though I'd also like to see an SQL interface, so we can check a config before saving when editing via pgAdmin or similar.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches
On Wed, May 7, 2008 at 7:52 AM, KaiGai Kohei [EMAIL PROTECTED] wrote:
> Tom, thanks for your review.
>
>> The patch hasn't got a mode in which SELinux support is compiled in but not active. This is a good way to ensure that no one will ever ship standard RPMs with the feature compiled in, because they will be entirely nonfunctional for people who aren't interested in setting up SELinux. I think you need an enable_sepostgres GUC, or something like that. (Of course, the overhead of the per-row security column would probably discourage anyone from wanting to use such a configuration anyway, so maybe the point is moot.)
>
> We can turn SELinux on or off globally; it is not bound to SE-PostgreSQL alone. The reason why I didn't provide a mode bit like enable_sepostgresql is to keep consistency in the system configuration.

Hmm, I think ACE should be a CREATE DATABASE parameter. If I were to create an SE-database, I would wish that disabling it were more difficult than changing a GUC in the database. And being able to set it on a per-database basis would help get SE/ACE enabled by packagers.

Regards, Dawid

--
Solving [site load issues] with [more database replication] is a lot like solving your own personal problems with heroin - at first it sorta works, but after a while things just get out of hand.
Re: [HACKERS] Overhauling GUCS
On Sun, 1 Jun 2008, Peter Eisentraut wrote:
>> Josh Berkus wrote:
>> 1. Most people have no idea how to set these.
> Could you clarify this? I can't really believe that people are incapable of editing a configuration file.

The big problem isn't the editing, it's knowing what to set the configuration values to. This is not to say that editing a configuration file should be considered reasonable. Any GUCS overhaul should include a design goal of being able to completely manage the configuration system using, say, pgAdmin (the "manage settings via port access" part that Josh already mentioned). This is why I was suggesting additions aimed at assimilating all the things that are in the postgresql.conf file.

Joshua has been banging a drum for a while now that all this data needs to get pushed into the database itself. The GUCS data is clearly structured like a database table. Josh's suggested changes basically add all the columns needed to handle everything you'd want to do to that table. If you think of it in those terms and make it possible to manipulate that data using the tools already available for updating tables, you'll open up the potential to add a whole new class of user-friendly applications for making configuration easier to manage.

However, I don't fully agree with taking that idea as far as Joshua has suggested (only having the config data in the database), because having everything in a simple text file that can be managed with SCM etc. has significant value. It's nice to allow admins to make simple changes with just a file edit. It's nice that you can look at all the parameters in one place and browse them. However, I do think that the internal database representation must be capable of holding everything in the original postgresql.conf file and producing an updated version of the file, either locally or remotely, as needed.

>> 4. We don't seem to be getting any closer to autotuning.
> True. But how does your proposal address this?

The idea that Josh's suggestions are working toward is simplifying the construction of tools that operate on the server configuration file, so that it's easier to write an autotuning tool. Right now, writing such a tool in a generic way gets so bogged down just in parsing/manipulating the postgresql.conf file that it's hard to focus on actually doing the tuning part. If we go back to his original suggestion:

http://wiki.postgresql.org/wiki/GUCS_Overhaul

"Add a script called pg_generate_conf to generate a postgresql.conf based on guc.c and command-line switches (rather than postgresql.conf.sample)"

It's an easy jump from there to imagine a pg_generate_conf that provides a wizard interface to update a configuration file. I foresee a little GUI or web app that connects to a server on port 5432, finds out some basic information about the server, and gives something like this:

    Parameter             Current  Recommended  Change?
    shared_buffers        32MB     1024MB       [X]
    effective_cache_size  128MB    3000MB       [ ]
    work_mem              1MB      16MB         [ ]

Josh has the actual brains behind such an app all planned out if you look at his presentations, but without the larger overhaul it's just not possible to make the implementation elegant.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
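A wizard like that mostly needs the settings in a structured, per-record form first. As a minimal sketch of the generation side only - the field names and values here are hypothetical illustrations, not the proposed pg_generate_conf interface:

```python
# Hypothetical records mirroring the "GUCS as a table" idea: each setting
# carries its value plus the metadata a generator would want.
SETTINGS = [
    {"name": "shared_buffers", "value": "1024MB", "comment": "buffer cache size"},
    {"name": "work_mem", "value": "16MB", "comment": "per-sort/hash memory"},
]

def generate_conf(settings, verbose=False):
    """Render postgresql.conf-style text from structured setting records."""
    lines = []
    for s in settings:
        if verbose:
            # verbose view carries the documentation comment; terse view omits it
            lines.append("# " + s["comment"])
        lines.append("%s = '%s'" % (s["name"], s["value"]))
    return "\n".join(lines) + "\n"

print(generate_conf(SETTINGS))
```

The point of the overhaul is that tools could work against records like these instead of re-parsing free-form text.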
[HACKERS] Re: [ANNOUNCE] == PostgreSQL Weekly News - June 01 2008 ==
David Fetter writes:
> Alter the xxx_pattern_ops opclasses to use the regular equality operator of the associated datatype as their equality member. This means that these opclasses can now support plain equality comparisons along with LIKE tests, thus avoiding the need for an extra index in some applications.

This is interesting, as it will save me a couple of very big indexes. I didn't see that coming, but thanks.
Re: [HACKERS] Overhauling GUCS
On Sun, 1 Jun 2008, Joshua D. Drake wrote:
> Well, I don't know that a minimum of comments is what I am arguing for as much as not too many comments.

Josh's proposal included making three levels of documentation-level comments available: terse, normal, and verbose. The verbose comment level probably should include a web link to the full documentation. The way the comments litter the existing file - the status quo, called "normal" mode in this proposal - is IMHO a complete mess. Most use cases I can think of want either no comments or really verbose ones; the limited commentary in the current sample postgresql.conf is a middle ground that's not right for anybody.

The key thing here, in my mind, is that it should be possible to transform between those three different verbosity levels without losing any settings or user-added comments. They're really just different views on the same data, and which view you're seeing should be easy to change without touching the data.

I just extracted the original design proposal and some of the relevant follow-up in this thread, made some additional suggestions, and put the result at http://wiki.postgresql.org/wiki/GUCS_Overhaul

I think reading that version makes it a bit clearer what the proposed overhaul is aiming for.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Overhauling GUCS
Greg Smith wrote:
> Joshua has been banging a drum for a while now that all this data needs to get pushed into the database itself. The GUCS data is clearly structured [like a database table ...]; you'll open up the potential to add a whole new class of user-friendly applications for making configuration easier to manage.
>
> However, I don't fully agree with taking that idea as far as Joshua has suggested (only having the config data in the database), because having everything in a simple text file that can be managed with SCM etc. has significant value. It's nice to allow admins to be able to make simple changes with just a file edit. It's nice that you can look at all the parameters in one place and browse them. However, I do think that the internal database representation must be capable of holding everything in the original postgresql.conf file and producing an updated version of the file, either locally or remotely, as needed.

Depending on the complexity you want to have inside the generator, one could imagine a middle-ground solution like:

    include database-generated.conf
    include local-overrides.conf

where database-generated.conf does not necessarily need a lot of comments.

> Josh has the actual brains behind such an app all planned out if you look at his presentations, but without the larger overhaul it's just not possible to make the implementation elegant.

IMHO Greg's response is the most comprehensive and well-thought-through contribution in the whole GUC thread.

--
Sincerely, Stephen R. van den Berg.
Re: [HACKERS] Case-Insensitve Text Comparison
On Sun, Jun 01, 2008 at 10:13:07PM -0700, David E. Wheeler wrote:
> What locale is right? If I have a Web app, there could be data in many different languages in a single table/column.

I think the above amounts to a need for per-session locale settings or something, no?

A

--
Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
Re: [HACKERS] Table rewrites vs. pending AFTER triggers
On Sun, Jun 01, 2008 at 11:51:29PM -0700, Robert Hodges wrote:
> If you are saying that DDL should be auto-commit, yes, this really does limit some use cases.

I agree. Transactional DDL is a big feature I'd hate to see go away. Oracle DBAs I know look with envy on this feature of Postgres.

> with the database that causes a command to fail. Slony is an example of such a generator where transactional DDL would be helpful, though I don't know for a fact that Slony uses it.

It sort of does, in that all work under Slony is performed in a transaction. But Slony attempts to isolate the DDL in a sync at just the right point, and not mix schema and data changes in syncs. Still, one of the important best practices in preparing your DDL-changing scripts for Slony is to try running the script inside a transaction (and then rolling it back) on every node, to ensure that it will in fact work on every node.

A

--
Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline
> I have attached a new patch with respect to the current CVS head. This produces a headline in a document for a given query. Basically it identifies fragments of text that contain the query and displays them.

The new variant is much better, but...

> HeadlineParsedText contains an array of actual words but not information about the norms. We need an indexed position vector for each norm so that we can quickly evaluate a number of possible fragments. Something that tsvector provides.

Why do you need to store norms? The single purpose of norms is identifying words from the query - but that's already done by hlfinditem. It sets HeadlineWordEntry->item to the corresponding QueryOperand in the tsquery.

Look, the headline function is rather expensive, and your patch adds a lot of extra work - at least in memory usage. And if the user calls it with NumFragments=0, that work is unneeded.

> This approach does not change any other interface and fits nicely with the overall framework.

Yeah, it's a really big step forward. Thank you. You are very close to committing, except:

Did you look at the hlCover() function, which produces a cover from the original HeadlineParsedText representation? Is there any reason not to use it?

> The norms are converted into tsvector and a number of covers are generated. The best covers are then chosen to be in the headline. The covers are separated using a hardcoded coversep. Let me know if you want to expose this as an option. Covers that overlap with already chosen covers are excluded.
>
> Some options like ShortWord and MinWords are not taken care of right now. MaxWords is used as maxcoversize. Let me know if you would like to see other options for fragment generation as well.

ShortWord, MinWords and MaxWords should keep their meaning, but per fragment, not for the whole headline.

> Let me know any more changes you would like to see.
>
>     if (num_fragments == 0)
>         /* call the default headline generator */
>         mark_hl_words(prs, query, highlight, shortword, min_words, max_words);
>     else
>         mark_hl_fragments(prs, query, highlight, num_fragments, max_words);

Suppose num_fragments < 0?

--
Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
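The selection strategy described in the patch - generate candidate covers, prefer the best ones, and exclude overlaps with already-chosen covers - can be modeled in a few lines. This is a toy sketch of the idea only; the function names and the shortest-cover-first policy are illustrative assumptions, not the patch's actual C code:

```python
def find_covers(tokens, query):
    """Return minimal (start, end) windows of `tokens` containing every query word."""
    query = set(query)
    covers = []
    for i, tok in enumerate(tokens):
        if tok not in query:
            continue
        need = set(query)
        for j in range(i, len(tokens)):
            need.discard(tokens[j])
            if not need:
                covers.append((i, j))
                break
    # keep only minimal covers: drop any window that strictly contains another
    return [c for c in covers
            if not any(d != c and c[0] <= d[0] and d[1] <= c[1] for d in covers)]

def pick_fragments(covers, num_fragments):
    """Choose the tightest covers first, skipping any that overlap a chosen one."""
    chosen = []
    for c in sorted(covers, key=lambda c: c[1] - c[0]):
        if all(c[1] < d[0] or c[0] > d[1] for d in chosen):
            chosen.append(c)
        if len(chosen) == num_fragments:
            break
    return sorted(chosen)
```

With tokens `the cat sat on the mat while the cat ate` and query `cat mat`, the minimal covers are (1, 5) and (5, 8); asking for one fragment picks the tighter (5, 8).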
Re: [HACKERS] Overhauling GUCS
Greg Smith [EMAIL PROTECTED] writes:
> Joshua has been banging a drum for a while now that all this data needs to get pushed into the database itself.

This is, very simply, not going to happen. Shall we go over the reasons why not, one more time?

1. Quite a few of the GUC parameters are needed *before* one can ever read the database; in particular the ones about file locations and shared memory sizing.

2. Suppose you change a parameter in a way that breaks the DB (e.g., set shared_buffers to a value larger than your kernel allows) and try to restart. Database doesn't start. If the parameter can only be changed back within an operating database, you're hosed.

I have no objection to providing alternative ways to edit the configuration data, but the primary source of the settings is going to continue to be an editable text file. Any proposals for alternatives-to-a-text-editor have to work within that reality.

regards, tom lane
Re: [HACKERS] Overhauling GUCS
Tom Lane wrote:
> Greg Smith [EMAIL PROTECTED] writes:
>> Joshua has been banging a drum for a while now that all this data needs to get pushed into the database itself.
> This is, very simply, not going to happen. Shall we go over the reasons why not, one more time?
> 1. Quite a few of the GUC parameters are needed *before* one can ever read the database; in particular the ones about file locations and shared memory sizing.

Obviously.

> 2. Suppose you change a parameter in a way that breaks the DB (e.g., set shared_buffers to a value larger than your kernel allows) and try to restart. Database doesn't start. If the parameter can only be changed back within an operating database, you're hosed.

That's why those parameters will always have to be read from a text file. Which doesn't prohibit that text file from being generated from within the database (once it is up and running). And yes, if someone changes the parameter through the DB, then makes the DB write out the config file, then restarts the DB remotely, and the change didn't work out (i.e. the DB doesn't start, or crashes/hangs), he is hosed and needs to log in and change the text file with a text editor; no way around it.

> I have no objection to providing alternative ways to edit the configuration data, but the primary source of the settings is going to continue to be an editable text file. Any proposals for alternatives-to-a-text-editor have to work within that reality.

I think everyone is aware of that.

--
Sincerely, Stephen R. van den Berg.
Re: [HACKERS] Case-Insensitve Text Comparison
On Jun 2, 2008, at 06:51, Andrew Sullivan wrote:
> On Sun, Jun 01, 2008 at 10:13:07PM -0700, David E. Wheeler wrote:
>> What locale is right? If I have a Web app, there could be data in many different languages in a single table/column.
> I think the above amounts to a need for per-session locale settings or something, no?

Yes, that's what I was getting at.

Thanks, David
Re: [HACKERS] phrase search
> I have attached a patch for phrase search with respect to the CVS head. Basically it takes a phrase (text) and a TSVector. It checks whether the relative positions of the lexemes in the phrase are the same as their positions in the TSVector.

Ideally, phrase search should be implemented as a new operator in tsquery, say #, with an optional distance. So tsquery 'foo #2 bar' means: find all texts where 'bar' is placed no farther than two words from 'foo'. The complexity is about complex boolean expressions ( 'foo #1 ( bar1 bar2 )' ) and about languages such as Norwegian or German. German has compound words, like footboolbar, and they have several variants of splitting, so the result of to_tsquery('foo # footboolbar') will be 'foo # ( ( football bar ) | ( foot ball bar ) )', where the variants are connected with the OR operation.

Of course, phrase search should be able to use indexes.

> If the configuration for text search is simple, then this will produce an exact phrase search. Otherwise the stopwords in a phrase will be ignored and the words in a phrase will only be matched against the stemmed lexemes.

Your solution can't be used as-is, because the user should use tsquery too, to use an index:

    column @@ to_tsquery('phrase search') AND is_phrase_present('phrase search', column)

The first clause will be used for an index scan, and it will quickly find the candidates.

> For my application I am using this as a separate shared object. I do not know how to expose this function from the core. Can someone explain how to do this?

Look at the contrib/ directory in pgsql's source code - make a contrib module from your patch. As an example, look at the adminpack module - it's rather simple.

Comments on your code:

1)

    +#ifdef PG_MODULE_MAGIC
    +PG_MODULE_MAGIC;
    +#endif

That isn't needed for compiled-in core files; it's only needed for modules.

2) Use only /* */ comments; do not use // (C++-style) comments.

--
Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
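The relative-position test the patch describes can be modeled against a tsvector-like positions map. A sketch under the assumption that stemming and stopword removal have already happened; this is an illustration of the idea, not the patch's code:

```python
def phrase_matches(phrase, positions):
    """Check that the words of `phrase` occur at consecutive positions.

    `positions` maps lexeme -> list of word positions, mimicking a tsvector.
    """
    words = phrase.split()
    if not words:
        return False
    # every occurrence of the first word is a candidate starting position
    return any(
        all(start + i in positions.get(w, []) for i, w in enumerate(words))
        for start in positions.get(words[0], [])
    )

tsv = {"fast": [1], "phrase": [2, 7], "search": [3]}
print(phrase_matches("phrase search", tsv))  # True: positions 2 and 3 are adjacent
print(phrase_matches("search phrase", tsv))  # False: no adjacent pair in that order
```

Extending this to a #N distance operator would just relax `start + i` to a window of N positions.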
[HACKERS] Proposal: new function array_init
Hello,

There have been questions from time to time about array initialisation. I propose a function array_init:

    CREATE OR REPLACE FUNCTION array_init(sizes int[], v anyelement) RETURNS anyarray;

The first parameter is an array of the dimensions' sizes. The second argument is the value that will be used for initialisation. Because PostgreSQL arrays' indexes need not start from 1, we can allow specifying the lower bounds too:

    CREATE OR REPLACE FUNCTION array_init(sizes int[], lowers int[], v anyelement) RETURNS anyarray;

    select array_init(array[2], 0);
     array
    -------
     {0,0}
    (1 row)

    select array_init(array[2,2], 0);
         array
    ---------------
     {{0,0},{0,0}}
    (1 row)

Any comments?

Regards Pavel Stehule
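For reference, the intended fill semantics can be modeled with nested lists - a behavioral sketch of the proposal (1-based lower bounds aside), not an implementation:

```python
import copy

def array_init(sizes, value):
    """Nested-list model of the proposed array_init(sizes int[], v anyelement)."""
    result = value
    # build dimensions from the innermost outward
    for size in reversed(sizes):
        # deep-copy each cell so no two cells share the same sub-array
        result = [copy.deepcopy(result) for _ in range(size)]
    return result

print(array_init([2], 0))     # [0, 0]
print(array_init([2, 2], 0))  # [[0, 0], [0, 0]]
```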
Re: [HACKERS] Overhauling GUCS
Simon Riggs wrote:
> Some other problems I see with GUCs:
> * It's not possible to set one parameter depending upon the setting of another.

To me this is the more critical one. Most people I have seen will increase one or a few, but not all, of the parameters related to memory, which can result in lost performance and lost productivity figuring out why. What happened to the idea of an AvailRAM setting, with all memory GUCs based on that? Ideally PostgreSQL should create one big memory pool and allow all the other variables to change at runtime - via the DBA, some tuner process, or a customized application - as long as the total stays below the allocated shared-memory and local-memory settings. (This would also reduce the need to restart Postgres when a value needs to be changed.)

-Jignesh

> * It's always unclear which GUCs can be changed, and when. That is much more infrequently understood than the meaning of them.
> * We should rename effective_cache_size to something that doesn't sound like it does what shared_buffers does.
> * There is no config verification utility, so if you make a change and then try to restart and it won't, you are in trouble.
Re: [HACKERS] Case-Insensitve Text Comparison
On Jun 1, 2008, at 21:08, Tom Lane wrote:
>> 1. Does the use of the tolower() C function in the citext data type on pgfoundry basically give me the same results as using lower() in my SQL has for all these years?
>
> [ broken record... ] Kinda depends on your locale. However, tolower() is 100% guaranteed not to work for multibyte encodings, so citext is quite useless if you're using UTF8. This is fixable, no doubt, but it's not fixed in the project as it stands.

Would the use of str_tolower() in formatting.c fix that?

Thanks, David
Re: [HACKERS] Overhauling GUCS
On Mon, 2 Jun 2008, Jignesh K. Shah wrote:
> Most people I have seen will increase one or a few, but not all, of the parameters related to memory, which can result in lost performance and lost productivity figuring out why.

If it becomes easier to build a simple tool to help people tune their configurations, that should help here without having to do anything more complicated than that.

> What happened to the idea of an AvailRAM setting, with all memory GUCs based on that?

Like some of the other GUC simplification ideas that show up sometimes (unifying all I/O and limiting background processes based on that total is another), this is hard to do internally. Josh's proposal has a fair amount of work involved, but the code itself doesn't need to be clever or too intrusive. Unifying all the memory settings would require being both clever and intrusive, and I doubt you'll find anybody who could pull it off who isn't already overtasked with more important improvements for the 8.4 timeframe.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Case-Insensitve Text Comparison
David E. Wheeler [EMAIL PROTECTED] writes:
> On Jun 1, 2008, at 21:08, Tom Lane wrote:
>> [ broken record... ] Kinda depends on your locale. However, tolower() is 100% guaranteed not to work for multibyte encodings, so citext is quite useless if you're using UTF8. This is fixable, no doubt, but it's not fixed in the project as it stands.
> Would the use of str_tolower() in formatting.c fix that?

Yeah, you need something equivalent to that. I think that whole area is due for refactoring, though --- we've got kind of a weird collection of upper/lower/initcap APIs spread through a couple of different files.

regards, tom lane
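The multibyte failure is easy to demonstrate: byte-at-a-time lowercasing - which is what applying C's tolower() across a buffer amounts to - only touches ASCII bytes, so UTF-8 sequences pass through unchanged. Python is used here purely to model that C behavior:

```python
def ascii_tolower(data: bytes) -> bytes:
    """Model of C tolower() applied byte-by-byte: only A-Z (0x41-0x5A) change."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in data)

word = "ÜBER"  # 'Ü' is the two-byte UTF-8 sequence 0xC3 0x9C
naive = ascii_tolower(word.encode("utf-8")).decode("utf-8")
print(naive)         # 'Über' - the multibyte 'Ü' is left untouched
print(word.lower())  # 'über' - character-aware lowercasing gets it right
```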
Re: [HACKERS] Overhauling GUCS
On Mon, 2 Jun 2008, Tom Lane wrote:
> Greg Smith [EMAIL PROTECTED] writes:
>> Joshua has been banging a drum for a while now that all this data needs to get pushed into the database itself.
> This is, very simply, not going to happen.

Right, and there are also technical challenges in the way of that ideal. I was only mentioning the reasons why it might not be the best idea even if it were feasible. However, I do not see why the limitations you bring up must get in the way of thinking about how to interact with and manage the configuration data in a database context, even though it ultimately must be imported from, and exported to, a flat file.

The concerns you bring up again about leaving the database in an unstartable state are a particularly real danger in the "only has access to port 5432" hosted-provider case that this redesign is trying to satisfy. I added a Gotchas section to the wiki page so that this issue doesn't get forgotten about. The standard way to handle this situation is to have a known-good backup configuration floating around. Adding something in that area may end up being a hard requirement before remote editing makes sense.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Proposal: new function array_init
Pavel Stehule [EMAIL PROTECTED] writes:
> There have been questions from time to time about array initialisation. I propose a function array_init.
> CREATE OR REPLACE FUNCTION array_init(sizes int[], v anyelement) RETURNS anyarray;

I think this is basically a good idea, but maybe the API needs a bit of adjustment --- providing the sizes as an array doesn't seem especially convenient. Since we only allow up to 6 dimensions (IIRC), what about six functions with different numbers of parameters:

    array_init(int, anyelement)
    array_init(int, int, anyelement)
    ...
    array_init(int, int, int, int, int, int, anyelement)

I don't object to having the array-input version too, but it seems like in most cases this way would be easier to use. It wouldn't work well for providing lower bounds too, but maybe the array-input case is sufficient for that.

Other thoughts:

* Should the fill value be the first parameter instead of the last? I'm not sure either way.

* I have a mild preference for array_fill instead of array_init.

* We can handle a null fill value now, but what about nulls in the dimensions? The alternatives seem to be to return a null array (not an array of nulls) or throw an error.

regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
[EMAIL PROTECTED] (Andreas 'ads' Scherbaum) writes:
> On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
>> Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours.
> Then you ship 16 MB of binary stuff every 30 seconds or every minute, but you only have some kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (meaning: no high-speed connection, maybe even pay-per-traffic) to the slave.

If you have that kind of scenario, then you have painted yourself into a corner, and there isn't anything that can be done to extract you from it.

Consider: if you have so much update traffic that it is too much to replicate via WAL-copying, why should we expect that other mechanisms *wouldn't* also overflow the connection?

If you haven't got enough network bandwidth to use this feature, then nobody is requiring that you use it. It seems like a perfectly reasonable prerequisite to say "this requires that you have enough bandwidth."

--
(reverse (concatenate 'string ofni.secnanifxunil @ enworbbc)) http://www3.sympatico.ca/cbbrowne/
There's nothing worse than having only one drunk head. -- Zaphod Beeblebrox
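To put rough numbers on the quoted scenario - one 16 MB WAL segment shipped per archive_timeout interval, however empty the segment is:

```python
# Each forced segment switch ships a full WAL segment, regardless of how
# little real data it contains.
wal_segment_mb = 16
archive_timeout_s = 30

required_mbit_per_s = wal_segment_mb * 8 / archive_timeout_s
print(round(required_mbit_per_s, 2))  # sustained link rate needed just to keep up
```

That works out to roughly 4.3 Mbit/s sustained even when only a few kilobytes per interval are real changes - which is the cost Andreas is pointing at, and the bandwidth prerequisite Chris is defending.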
Re: [HACKERS] Overhauling GUCS
On Mon, 2008-06-02 at 11:59 -0400, Jignesh K. Shah wrote: Simon Riggs wrote: Some other problems I see with GUCs * It's not possible to set one parameter depending upon the setting of another. To me this is more critical. Most people I have seen will increase one or a few, but not all, parameters related to memory, which can result in loss of performance and lost productivity in figuring out why. What happened to the idea of an AvailRAM setting, with all memory GUCs based on that? Ideally PostgreSQL should create only one big memory pool and allow all other variables to be changed at runtime by the DBA, a tuner process, or a customized application, as long as the total is less than the allocated shared_memory and local_memory settings. (This would also reduce the need to restart Postgres when a value needs to be changed.) Agreed. Right now, we can't even do that in code, let alone in the config file. If we had a smart_memory_config = on then we'd be able to say in the backend: if (smart_memory_config) { other_thing = 0.1 * NBuffers; } but the GUCs are evaluated in alphabetical order, without any way of expressing dependencies between them. So they are notionally orthogonal. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
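The "derive everything from one knob" idea can be sketched trivially outside the server. The ratios below are illustrative placeholders only, not recommended tuning values, and the setting names are hypothetical:

```python
# Sketch of deriving dependent memory GUCs from a single AvailRAM-style
# setting, instead of asking users to tune each one independently.
def derive_memory_settings(avail_ram_mb):
    return {
        "shared_buffers_mb": int(avail_ram_mb * 0.25),
        "effective_cache_size_mb": int(avail_ram_mb * 0.50),
        "maintenance_work_mem_mb": int(avail_ram_mb * 0.05),
    }

print(derive_memory_settings(4096))
```

The point of the sketch is Simon's complaint: this one-line dependency ("a quarter of available RAM") cannot be expressed in the GUC machinery today, because each GUC is evaluated independently.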
Re: [HACKERS] Case-Insensitve Text Comparison
On Jun 2, 2008, at 09:33, Tom Lane wrote: Would the use of str_tolower() in formatting.c fix that? Yeah, you need something equivalent to that. I think that whole area is due for refactoring, though --- we've got kind of a weird collection of upper/lower/initcap APIs spread through a couple of different files. And I just ran into this on 8.3 when trying to install citext: psql:citext.sql:350: ERROR: there is no built-in function named oid_text I'm assuming that this is because a lot of automatic casts were removed in 8.3 or 8.2; There are a bunch of these: CREATE FUNCTION citext(oid) RETURNS citext AS 'oid_text' LANGUAGE 'internal' IMMUTABLE STRICT; GRANT EXECUTE ON FUNCTION citext(oid) TO PUBLIC; COMMENT ON FUNCTION citext(oid) IS 'convert oid to citext'; CREATE FUNCTION oid(citext) RETURNS oid AS 'text_oid' LANGUAGE 'internal' IMMUTABLE STRICT; GRANT EXECUTE ON FUNCTION oid(citext) TO PUBLIC; COMMENT ON FUNCTION oid(citext) IS 'convert citext to oid'; CREATE CAST (citext AS oid) WITH FUNCTION oid(citext); CREATE CAST (oid AS citext) WITH FUNCTION citext(oid); CREATE FUNCTION citext(int2) RETURNS citext AS 'int2_text' LANGUAGE 'internal' IMMUTABLE STRICT; GRANT EXECUTE ON FUNCTION citext(int2) TO PUBLIC; COMMENT ON FUNCTION citext(int2) IS 'convert int2 to citext'; CREATE FUNCTION int2(citext) RETURNS int2 AS 'text_int2' LANGUAGE 'internal' IMMUTABLE STRICT; GRANT EXECUTE ON FUNCTION int2(citext) TO PUBLIC; COMMENT ON FUNCTION int2(citext) IS 'convert citext to int2'; CREATE CAST (citext AS int2) WITH FUNCTION int2(citext); CREATE CAST (int2 AS citext) WITH FUNCTION citext(int2); And on and on. Clearly this module needs updating for newer PostgreSQLs. 
I tried removing them all in order to get the data type installed, and tried it out with this script: my $dbh = DBI->connect('dbi:Pg:dbname=try', 'postgres', '', { pg_enable_utf8 => 1 }); for my $char (qw( À Á Â Ã Ä Å Ç Ć Č Ĉ Ċ Ď Đ A B C D )) { print "$char: ", $dbh->selectrow_array('SELECT LOWER(?::citext)', undef, $char), $/; } Naturally it didn't work: À: à Á: á Â: â Ã: ã Ä: ä Å: Ã¥ Ç: ç Ć: Ä Č: Ä Ĉ: Ä Ċ: Ä Ď: Ä Đ: Ä A: a B: b C: c D: d BTW, I rebuilt my cluster with --locale en_US.UTF-8 and the script works on a text type, so having a locale is key. Thanks, David
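What David is seeing is multibyte UTF-8 being lower-cased without a locale that understands it. As a point of comparison (this is not the backend's code path, just an illustration of the two behaviors), a Unicode-aware lower() maps each code point correctly, while a bytewise ASCII-only lower() simply leaves non-ASCII characters alone:

```python
# Unicode-aware lowercasing (what a proper UTF-8 locale should give):
chars = "ÀÁÂÃÄÅÇĆČ"
print(chars.lower())  # àáâãäåçćč

# Bytewise ASCII-only lowercasing (roughly the C-locale behavior):
# the UTF-8 bytes of these characters fall outside 'A'..'Z', so they
# pass through untouched.
print(chars.encode("utf-8").lower().decode("utf-8"))  # ÀÁÂÃÄÅÇĆČ
```

Mojibake like `Å: Ã¥` in David's output is a third failure mode: the multibyte sequence was transformed (or re-encoded) byte by byte, corrupting it.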
Re: [HACKERS] Case-Insensitve Text Comparison
On Mon, 2008-06-02 at 09:51 -0400, Andrew Sullivan wrote: On Sun, Jun 01, 2008 at 10:13:07PM -0700, David E. Wheeler wrote: What locale is right? If I have a Web app, there could be data in many different languages in a single table/column. I think the above amounts to a need for per-session locale settings or something, no? What if you had a CHECK constraint that was locale-sensitive? Would the constraint only be non-false (true or null) for records inserted under the same locale? That's not very useful. I think if you want some special treatment of text for some users, it should be explicit. Text in one locale is really a different type from text in another locale, and so changing the locale of some text variable is really a typecast. I don't think GUCs are the correct mechanism for this. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 2008-05-29 at 23:37 +0200, Mathias Brossard wrote: I pointed out that the NTT solution is synchronous because Tom said in the first part of his email that: In practice, simple asynchronous single-master-multiple-slave replication covers a respectable fraction of use cases, so we have concluded that we should allow such a feature to be included in the core project. ... and yet the most appropriate base technology for this is synchronous, and maybe I should have also pointed out in my previous mail that it doesn't support multiple slaves. I don't think that you need too many slaves in sync mode. Probably the first slave sync and the others async from there on will be good enough. Also, as others have pointed out, there are different interpretations of synchronous depending on whether the WAL data has reached the other end of the network connection, a safe disk checkpoint, or the slave DB itself. Probably all of DRBD's levels (A: data sent to network, B: data received, C: data written to disk) should be supported, plus C1: data replayed in the slave DB. C1 meaning that it can be done in parallel with C. Then each DBA can set it up depending on what he trusts - the network, the slave's power supply, or the slave's disk. Also, the case of slave failure should be addressed. I don't think that the best solution is halting all ops on the master if the slave/network fails. Maybe we should also allow a setup with 2-3 slaves, where operations can continue as long as at least 1 slave is syncing? -- Hannu
Re: [HACKERS] Proposal: new function array_init
On Mon, Jun 2, 2008 at 9:46 AM, Tom Lane [EMAIL PROTECTED] wrote: Pavel Stehule [EMAIL PROTECTED] writes: There have been questions from time to time about array initialisation. I propose a function array_init. As one of the questioners, I will give some short thoughts below. CREATE OR REPLACE FUNCTION array_init(sizes int[], v anyelement) RETURNS anyarray; +1. +0 for Pavel's proposed syntax, because it feels better, but also because it scales to N dimensions (we should obviously throw an error if the input is too big, but we can probably source that size through an include). I hate functions with more than four arguments, and having six slightly overloaded functions in the system catalogs seems annoying. * We can handle a null fill value now, but what about nulls in the dimensions? The alternatives seem to be to return a null array (not an array of nulls) or throw error. I would throw an error, unless there is something that one can do with a null array (perhaps there is?). We also might want to consider a resize function, and some other utilities, as long as we are bothering with this. I am sorry that I can't offer to write these, but I don't have the time to learn the PostgreSQL infrastructure to do it. Thanks for the attention Pavel!
Re: [HACKERS] Add dblink function to check if a named connection exists
Joe Conway wrote: If you really want the notational simplicity, you could use an SQL function to wrap it: CREATE OR REPLACE FUNCTION dblink_exists(text) RETURNS bool AS $$ SELECT $1 = ANY (dblink_get_connections()) $$ LANGUAGE sql; Thanks, that seems like a reasonable way to solve this. -- Tommy Gildseth -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Case-Insensitve Text Comparison
On Mon, Jun 02, 2008 at 10:29:30AM -0700, Jeff Davis wrote: I think if you want some special treatment of text for some users, it should be explicit. Text in one locale is really a different type from text in another locale, and so changing the locale of some text variable is really a typecast. I don't think GUCs are the correct mechanism for this. The SQL COLLATE syntax handles this just fine. Either the original COLLATE patch or the new one will let people tag strings with any collation they like. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines.
Re: [HACKERS] Case-Insensitve Text Comparison
On Mon, 2008-06-02 at 19:55 +0200, Martijn van Oosterhout wrote: The SQL COLLATE syntax handles this just fine. Either the original COLLATE patch or the new one will let people tags strings with any collation they like. http://wiki.postgresql.org/wiki/Todo:Collate The last reference I see on that page is from 2005. Is there any updated information? Are there any major obstacles holding this up aside from the platform issues mentioned on that page? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Overhauling GUCS
Greg, Like some of the other GUC simplification ideas that show up sometimes (unifying all I/O and limiting background processes based on that total is another), this is hard to do internally. Josh's proposal has a fair amount of work involved, but the code itself doesn't need to be clever or too intrusive. Unifying all the memory settings would require being both clever and intrusive, and I doubt you'll find anybody who could pull it off who isn't already overtasked with more important improvements for the 8.4 timeframe. Plus, I'm a big fan of enable an API rather than write a feature. I think that there are people and companies out there who can write better autotuning tools than I can, and I'd rather give them a place to plug those tools in than try to write autotuning into the postmaster. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Re: [HACKERS] Case-Insensitve Text Comparison
On Sun, 2008-06-01 at 22:13 -0700, David E. Wheeler wrote: What locale is right? If I have a Web app, there could be data in many different languages in a single table/column. I think the values should be explicitly treated differently. It would be nice if you could just typecast, like: lower(somevalue::text(fr_CA)) which would then lowercase according to the fr_CA locale, regardless of the locale of somevalue. Using typmod for localization was brought up here: http://archives.postgresql.org/pgsql-hackers/2007-06/msg00635.php Has it been discussed further? I happen to like the idea of the TEXT type taking a locale as a typmod. No typmod would, of course, fall back to the cluster setting. And it would throw an exception if the encoding couldn't represent that locale. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Case-Insensitve Text Comparison
On Mon, Jun 02, 2008 at 10:29:30AM -0700, Jeff Davis wrote: What if you had a CHECK constraint that was locale-sensitive? Would the constraint only be non-false (true or null) for records inserted under the same locale? That's not very useful. It would seem that this is one of the important cases that needs to be worked out. I wasn't suggesting that per-session locale (or whatever we want to call it) is _easy_ or, for that matter, even possible; just that it would solve a large number of the problems that people complain about. In fact, I suspect that what we really need is something a little more like in-database locale or something. I think if you want some special treatment of text for some users, it should be explicit. Yes. Also, not just text. Think of currency, numeric separators, &c. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
Re: [HACKERS] Where can I find the doxyfile?
Zdenek Kotala wrote: Xin Wang napsal(a): Hi, I don't know where I can find the doxyfile which generates the doxygen.postgresql.org web site. I found that when reading code the doxygen source is quite helpful. However, I want to generate an off-line copy of the doxygen docs myself, but I can't find the doxyfile in the latest source release. I think it is a good idea. Stefan, what about putting it on the wiki? I don't think the wiki is a good place (we would have to maintain it in addition to the main file), so I simply added a link on http://doxygen.postgresql.org that points to the current copy of the configuration file that was used to generate a particular set of docs. However, some of the things there are specific to our install, so a bit of manual change (like some of the paths) will be required anyway. Stefan
Re: [HACKERS] Case-Insensitve Text Comparison
Andrew Sullivan wrote: ... I think if you want some special treatment of text for some users, it should be explicit. Yes. Also, not just text. Think of currency, numeric separators, &c. Which, imho, should not really be the business of the type interface, but instead something to_char() and to_{type} handle. Tino
Re: [HACKERS] Proposal: new function array_init
2008/6/2 Tom Lane [EMAIL PROTECTED]: Pavel Stehule [EMAIL PROTECTED] writes: There have been questions from time to time about array initialisation. I propose a function array_init. CREATE OR REPLACE FUNCTION array_init(sizes int[], v anyelement) RETURNS anyarray; I think this is basically a good idea, but maybe the API needs a bit of adjustment --- providing the sizes as an array doesn't seem especially convenient. Since we only allow up to 6 dimensions (IIRC), what about six functions with different numbers of parameters: array_init(int, anyelement) array_init(int, int, anyelement) ... array_init(int, int, int, int, int, int, anyelement) I don't object to having the array-input version too, but it seems like in most cases this way would be easier to use. It wouldn't work well for providing lower bounds too, but maybe the array-input case is sufficient for that. Your proposal is maybe a little more readable with lower bounds, and when the initial value is an integer. But it's easy to wrap it in SQL functions. Other thoughts: * Should the fill value be the first parameter instead of the last? I'm not sure either way. I am not sure either. I don't have any precedent in mind - the nearest function is memset? * I have a mild preference for array_fill instead of array_init. maybe, maybe array_set. Any ideas are welcome. * We can handle a null fill value now, but what about nulls in the dimensions? The alternatives seem to be to return a null array (not an array of nulls) or throw error. I am afraid a null array could be confused with a null value - so in this case I prefer raising an exception. Regards Pavel Stehule regards, tom lane
Re: [HACKERS] Proposal: new function array_init
On Mon, Jun 02, 2008 at 08:10:19PM +0200, Pavel Stehule wrote: * I have a mild preference for array_fill instead of array_init. maybe, maybe array_set. Any ideas are welcome array_create? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Case-Insensitve Text Comparison
To diverge a little - bit of a noob question along these lines - do LIKE and ILIKE take the locale into consideration, allowing case-insensitive searches in any locale setting? I know that LIKE can use an index if you don't start the match with a wildcard. ILIKE doesn't seem to. Is it, or would it be, possible to get ILIKE to use a properly configured index as well? -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz
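The usual workaround for Shane's question is an expression index on lower(col), queried as lower(col) LIKE lower('foo%'). Why that turns a case-insensitive prefix match into an index range scan can be modeled with a sorted list standing in for the btree; everything below is an illustrative sketch, not PostgreSQL internals:

```python
import bisect

rows = ["Alpha", "beta", "Beto", "Gamma", "alphabet"]
keys = sorted(s.lower() for s in rows)  # the "index on lower(col)"

def ilike_prefix(prefix):
    # A case-insensitive prefix match is a contiguous range in the
    # lower-cased ordering, so two binary searches answer it.
    p = prefix.lower()
    lo = bisect.bisect_left(keys, p)
    hi = bisect.bisect_left(keys, p + "\uffff")
    return keys[lo:hi]

print(ilike_prefix("AL"))  # ['alpha', 'alphabet']
```

A general ILIKE with a leading wildcard has no such contiguous range, which is why neither ILIKE nor LIKE '%foo' can use a plain btree this way.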
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Mon, 02 Jun 2008 11:52:05 -0400 Chris Browne wrote: [EMAIL PROTECTED] (Andreas 'ads' Scherbaum) writes: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB of binary stuff every 30 seconds or every minute, but you only have a few kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (meaning: no high-speed connection, maybe even pay-per-traffic) to the slave. If you have that kind of scenario, then you have painted yourself into a corner, and there isn't anything that can be done to extract you from it. You are misunderstanding something. It's perfectly possible that you have a low-traffic database with changes every now and then. But you have to copy a full 16 MB logfile every 30 seconds or every minute just to keep the slave up to date. Consider: If you have so much update traffic that it is too much to replicate via WAL-copying, why should we expect that other mechanisms *wouldn't* also overflow the connection? For a few MB of real data you copy several GB of logfiles per day - that's a lot of overhead, isn't it? If you haven't got enough network bandwidth to use this feature, then nobody is requiring that you use it. It seems like a perfectly reasonable prerequisite to say this requires that you have enough bandwidth. If you have a high-traffic database, then of course you need a different connection than if you only have a low-traffic or mostly read-only database. But that's not the point. Copying an almost unused 16 MB WAL logfile is just overhead - especially because the logfile is not very compressible, due to all the leftovers from earlier use. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
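The overhead Andreas describes is easy to quantify. A back-of-envelope calculation, assuming the worst case for a near-idle database (one full 16 MB segment shipped per archive_timeout expiry):

```python
SEGMENT_MB = 16
SECONDS_PER_DAY = 86400

def daily_shipped_mb(archive_timeout_s):
    # Worst case: every timeout forces a switch to a fresh 16 MB segment.
    return (SECONDS_PER_DAY // archive_timeout_s) * SEGMENT_MB

print(daily_shipped_mb(30) / 1024)  # 45.0 GB/day at a 30-second timeout
print(daily_shipped_mb(60) / 1024)  # 22.5 GB/day at a one-minute timeout
```

So even a database writing only a few MB of real changes per day ships tens of GB of WAL at aggressive archive_timeout settings, which is exactly the "several GB of logfiles per day" complaint.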
[HACKERS] rfc: add pg_dump options to dump output
I would like to add the flags given to pg_dump into the output of the pg_dump file. For text dumps, the output would go on a line below the current header, so output would look like: -- -- PostgreSQL database dump complete -- -- Generated by: pg_dump -s -U rob pagila -- For compressed dumps, the output could go into the headers created with -l, for example: ; Archive created at Mon Jun 2 16:43:19 2008 ; dbname: pgods ; TOC Entries: 3 ; Compression: -1 ; Dump Version: 1.10-0 ; Format: CUSTOM ; Integer: 4 bytes ; Offset: 8 bytes ; Dumped from database version: 8.3.1 ; Dumped by pg_dump version: 8.3.1 ; Generated by: pg_dump -s -U rob -Fc pagila ; ; ; Selected TOC Entries: Anyone see any issues with this? Should there be other information taken into account? Does this need to be an option itself, or can we just do it in all cases? -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rfc: add pg_dump options to dump output
Robert, Anyone see any issues with this? Should there be other information taken into account? Does this need to be an option itself, or can we just do it in all cases? +1 to do it in all cases. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Case-Insensitve Text Comparison
On Mon, Jun 02, 2008 at 11:08:55AM -0700, Jeff Davis wrote: http://wiki.postgresql.org/wiki/Todo:Collate The last reference I see on that page is from 2005. Is there any updated information? Are there any major obstacles holding this up aside from the platform issues mentioned on that page? Well, a review of the patch and a bit of work in the optimiser. However, I think the patch will have bitrotted beyond any use by now. It touched many of the areas the operator families stuff touched, for example. I believe it is being reimplemented as a GSoC project; that's probably a better approach. We should probably just delete the page from the wiki altogether. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines.
Re: [HACKERS] phrase search
On Mon, 2008-06-02 at 19:39 +0400, Teodor Sigaev wrote: I have attached a patch for phrase search against CVS head. Basically it takes a phrase (text) and a TSVector. It checks whether the relative positions of the lexemes in the phrase are the same as their positions in the TSVector. Ideally, phrase search should be implemented as a new operator in tsquery, say # with an optional distance. So, tsquery 'foo #2 bar' means: find all texts where 'bar' is placed no farther than two words from 'foo'. The complexity lies in complex boolean expressions ( 'foo #1 ( bar1 & bar2 )' ) and in several languages such as Norwegian or German. German has compound words, like footballbar - and they have several variants of splitting, so the result of to_tsquery('foo # footballbar') will be 'foo # ( ( football & bar ) | ( foot & ball & bar ) )', where the variants are connected with the OR operation. This is far more complicated than I thought. Of course, phrase search should be able to use indexes. I can probably look into how to use an index. Any pointers on this? If the configuration for text search is simple, then this will produce exact phrase search. Otherwise the stopwords in a phrase will be ignored and the words in a phrase will only be matched against the stemmed lexemes. Your solution can't be used as is, because the user should use tsquery too, to use an index: column @@ to_tsquery('phrase search') AND is_phrase_present('phrase search', column) The first clause will be used for an index scan, and it will quickly find candidates. Yes, this is exactly how I am using it in my application. Do you think this will solve a lot of common cases, or should we try to get phrase search to 1. use an index, 2. support arbitrary distance between lexemes, 3. support complex boolean queries? -Sushant. For my application I am using this as a separate shared object. I do not know how to expose this function from the core. Can someone explain how to do this?
Look at the contrib/ directory in pgsql's source code - make a contrib module from your patch. As an example, look at the adminpack module - it's rather simple. Comments on your code: 1) +#ifdef PG_MODULE_MAGIC +PG_MODULE_MAGIC; +#endif That isn't needed for compiled-in core files; it's only needed for modules. 2) use only /* */ comments, do not use // (C++-style) comments
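The position check Sushant describes - the phrase matches when its lexemes occur at consecutive positions in the tsvector's position lists - can be sketched as follows (a hypothetical model of the idea, not the patch's actual C code):

```python
def phrase_match(positions_by_lexeme, phrase):
    # positions_by_lexeme models a tsvector: lexeme -> word positions.
    # The phrase matches if some position of its first lexeme is followed
    # by the remaining lexemes at consecutive offsets.
    for start in positions_by_lexeme.get(phrase[0], []):
        if all(start + i in positions_by_lexeme.get(lex, [])
               for i, lex in enumerate(phrase[1:], start=1)):
            return True
    return False

doc = {"phrase": [3, 10], "search": [4, 20]}
print(phrase_match(doc, ["phrase", "search"]))  # True: positions 3, 4
print(phrase_match(doc, ["search", "phrase"]))  # False
```

Teodor's proposed '#' operator generalizes this: replace the consecutive-offset test with "within N positions", which is also where stemming and stopword removal complicate the exactness of the match.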
Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline
Efficiency: I realized that we do not need to store all norms. We need to store only the norms that are in the query. So I moved the addition of norms from addHLParsedLex to hlfinditem. This should add very little memory overhead to existing headline generation. If this is still not acceptable for default headline generation, then I can push it into mark_hl_fragments. But I think any headline-marking function will benefit from having the norms corresponding to the query. Why do we need norms? hlCover does the exact thing that Cover in tsrank does, which is to find the cover that contains the query. However, hlCover has to go through words that do not match the query. Cover, on the other hand, operates on position indexes for just the query words, and so it should be faster. The main reason why I would like it to be fast is that I want to generate all covers for a given query, then choose the covers with the smallest length, as they will be the ones that best explain the relation of a query to a document, and finally stretch those covers to the specified size. In my understanding, the current headline generation tries to find the biggest cover for display in the headline. I personally think that such a cover does not explain the context of a query in a document. We may differ on this, and that's why we may need both options. Let me know what you think of this patch, and I will update the patch to respect other options like MinWords and ShortWord. NumFragments 2: I wanted people to use the new headline marker if they specify NumFragments >= 1. If they do not specify NumFragments or set it to 0, then the default marker will be used. This becomes a bit of a tricky parameter, so please send in any ideas on how to trigger the new marker. On another note, I found that make_tsvector crashes if it receives a ParsedText with curwords = 0. Specifically, uniqueWORD returns curwords as 1 even when it gets 0 words. I am not sure if this is the desired behavior. -Sushant.
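The "smallest cover" Sushant wants - the shortest window of word positions containing every query term - is a classic sliding-window problem over the merged position lists. A hypothetical sketch of the idea (not the patch's code):

```python
from collections import Counter

def smallest_cover(positions_by_word):
    # Merge all (position, word) pairs, then slide a window over them,
    # shrinking from the left while every query word stays covered.
    events = sorted((p, w) for w, ps in positions_by_word.items() for p in ps)
    need = set(positions_by_word)
    have = Counter()
    best = None
    left = 0
    for pos, word in events:
        have[word] += 1
        while all(have[w] > 0 for w in need):
            span = (events[left][0], pos)
            if best is None or span[1] - span[0] < best[1] - best[0]:
                best = span
            have[events[left][1]] -= 1
            left += 1
    return best

print(smallest_cover({"phrase": [3, 40], "search": [10, 42]}))  # (40, 42)
```

This is linear in the total number of positions once they are merged, which is why operating on per-lexeme position lists (as Cover in tsrank does) beats hlCover's walk over every word in the document.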
On Mon, 2008-06-02 at 18:10 +0400, Teodor Sigaev wrote: I have attached a new patch with respect to the current cvs head. This produces headline in a document for a given query. Basically it identifies fragments of text that contain the query and displays them. New variant is much better, but... HeadlineParsedText contains an array of actual words but not information about the norms. We need an indexed position vector for each norm so that we can quickly evaluate a number of possible fragments. Something that tsvector provides. Why do you need to store norms? The single purpose of norms is identifying words from query - but it's already done by hlfinditem. It sets HeadlineWordEntry-item to corresponding QueryOperand in tsquery. Look, headline function is rather expensive and your patch adds a lot of extra work - at least in memory usage. And if user calls with NumFragments=0 the that work is unneeded. This approach does not change any other interface and fits nicely with the overall framework. Yeah, it's a really big step forward. Thank you. You are very close to committing except: Did you find a hlCover() function which produce a cover from original HeadlineParsedText representation? Is any reason to do not use it? The norms are converted into tsvector and a number of covers are generated. The best covers are then chosen to be in the headline. The covers are separated using a hardcoded coversep. Let me know if you want to expose this as an option. Covers that overlap with already chosen covers are excluded. Some options like ShortWord and MinWords are not taken care of right now. MaxWords are used as maxcoversize. Let me know if you would like to see other options for fragment generation as well. ShortWord, MinWords and MaxWords should store their meaning, but for each fragment, not for the whole headline. Let me know any more changes you would like to see. 
if (num_fragments == 0) /* call the default headline generator */ mark_hl_words(prs, query, highlight, shortword, min_words, max_words); else mark_hl_fragments(prs, query, highlight, num_fragments, max_words); Suppose, num_fragments 2? Index: src/backend/tsearch/dict.c === RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/backend/tsearch/dict.c,v retrieving revision 1.5 diff -u -r1.5 dict.c --- src/backend/tsearch/dict.c 25 Mar 2008 22:42:43 - 1.5 +++ src/backend/tsearch/dict.c 30 May 2008 23:20:57 - @@ -16,6 +16,7 @@ #include catalog/pg_type.h #include tsearch/ts_cache.h #include tsearch/ts_utils.h +#include tsearch/ts_public.h #include utils/builtins.h Index: src/backend/tsearch/to_tsany.c === RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/backend/tsearch/to_tsany.c,v retrieving revision 1.12
Re: [HACKERS] rfc: add pg_dump options to dump output
Robert Treat [EMAIL PROTECTED] writes: I would like to add the flags given to pg_dump into the output of the pg_dump file. Why? What is the value of this added complication? Anyone see any issues with this? I'm a bit worried about breaking diff-equality of matching dumps, but mainly I don't see the point. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers