Re: [HACKERS] Updated tsearch documentation
On Thu, 26 Jul 2007, Bruce Momjian wrote:
> Oleg Bartunov wrote:
>> Bruce, I sent you a link to my wiki page with a summary of changes:
>> http://www.sai.msu.su/~megera/wiki/ts_changes
>> Your documentation looks rather old.
>
> I have updated it to reflect your changes:
> http://momjian.us/expire/fulltext/HTML/textsearch-tables.html

Bruce, I noticed you missed many changes. For example, the options for the
stemmer have changed (this is documented in my ts_changes), so in
http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-CONFIGURATION

    ALTER TEXT SEARCH DICTIONARY en_stem SET OPTION 'english-utf8.stop';

should be

    ALTER TEXT SEARCH DICTIONARY en_stem
        SET OPTION 'StopFile=english-utf8.stop, Language=english';

Also, this is wrong:

    DROP TEXT SEARCH CONFIGURATION MAPPING ON pg FOR email, url, sfloat, uri, float;

It should be:

    ALTER TEXT SEARCH CONFIGURATION pg DROP MAPPING FOR email, url, sfloat, uri, float;

A configuration no longer has a DEFAULT flag, so \dF should not display 'Y':

    =# \dF
     pg_catalog | russian | Y
     public     | pg      | Y

This is what I see now:

    postgres=# \dF public.*
    List of fulltext configurations
     Schema | Name | Description
    --------+------+-------------
     public | pg   |

--- Oleg

On Tue, 24 Jul 2007, Bruce Momjian wrote:
> I have added more documentation to try to show how full text search is
> used by user tables. I think the documentation is almost done:
> http://momjian.us/expire/fulltext/HTML/textsearch-tables.html

--- Oleg Bartunov wrote:

On Wed, 18 Jul 2007, Bruce Momjian wrote:
> Oleg, Teodor, I am confused by the following example. How does gin know
> to create a tsvector, or does it? Does gist know too?

No, gist doesn't know. I don't remember why -- Teodor? For GIN, see
http://archives.postgresql.org/pgsql-hackers/2007-05/msg00625.php
for the discussion.

> FYI, at some point we need to chat via instant messenger or IRC to
> discuss the open items. My chat information is here:
> http://momjian.us/main/contact.html

I sent you an invitation for Google Talk; I only use the chat in Gmail.
My Gmail account is [EMAIL PROTECTED]

---

    SELECT title FROM pgweb
    WHERE textcat(title,body) @@ plainto_tsquery('create table')
    ORDER BY dlm DESC LIMIT 10;

    CREATE INDEX pgweb_idx ON pgweb USING gin(textcat(title,body));

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
Re: [HACKERS] Updated tsearch documentation
On Thu, 26 Jul 2007, Bruce Momjian wrote:
> Oleg Bartunov wrote:
>> On Wed, 25 Jul 2007, Erikjan wrote:
>>> In http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT
>>> it says: "A document is any text file that can be opened, read, and
>>> modified."
>>
>> Oops, in my original documentation it was: "Document, in usual meaning,
>> is a text file, that one could open, read and modify." I stress that in
>> a database a document is something different:
>> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html
>
> I have updated the documentation:
> http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT

Isn't it worth referencing OpenFTS, which is used for indexing file
systems?

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] default_text_search_config and expression indexes
> 1) Document the problem and do nothing else.
>
> 2) Make default_text_search_config a postgresql.conf-only setting,
>    thereby making it impossible to change by non-super users, or make
>    it a super-user-only setting.
>
> 3) Remove default_text_search_config and require the configuration to
>    be specified in each function call.

Hello,

2+.

Regards
Pavel Stehule
Re: [HACKERS] default_text_search_config and expression indexes
On Fri, 27 Jul 2007, Pavel Stehule wrote:
>> 1) Document the problem and do nothing else.
>> 2) Make default_text_search_config a postgresql.conf-only setting,
>>    thereby making it impossible to change by non-super users, or make
>>    it a super-user-only setting.
>> 3) Remove default_text_search_config and require the configuration to
>>    be specified in each function call.
>
> Hello,
>
> 2+.

One of the most important purposes of integrating tsearch2 was to
facilitate full-text search for people in hosting environments. Usually,
they have no superuser rights. I'm asking you not to forget about them!
There is no problem with the current behaviour once the user understands
what he is doing.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] default_text_search_config and expression indexes
On Fri, 27 Jul 2007, Pavel Stehule wrote:
> 2007/7/27, Oleg Bartunov [EMAIL PROTECTED]:
>> One of the most important purposes of integrating tsearch2 was to
>> facilitate full-text search for people in hosting environments.
>> Usually, they have no superuser rights. I'm asking you not to forget
>> about them! There is no problem with the current behaviour once the
>> user understands what he is doing.
>
> I am not sure PostgreSQL is well suited for a multilingual hosting
> environment. There is a problem with locales. Without COLLATE support
> PostgreSQL can't be used in such an environment. :(

A configuration has NOTHING to do with language! This is the most
frequent myth about configurations. It's just that we chose to have
default_text_search_config use the language part of the locale at initdb
time. A text search configuration is just a binding between the parser
used to break a document into lexemes and the mapping from lexeme types
to dictionaries.

> nice a day
>
> Pavel Stehule

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] default_text_search_config and expression indexes
2007/7/27, Oleg Bartunov [EMAIL PROTECTED]:
> On Fri, 27 Jul 2007, Pavel Stehule wrote:
>>> 1) Document the problem and do nothing else.
>>> 2) Make default_text_search_config a postgresql.conf-only setting,
>>>    thereby making it impossible to change by non-super users, or
>>>    make it a super-user-only setting.
>>> 3) Remove default_text_search_config and require the configuration
>>>    to be specified in each function call.
>>
>> Hello,
>>
>> 2+.
>
> One of the most important purposes of integrating tsearch2 was to
> facilitate full-text search for people in hosting environments.
> Usually, they have no superuser rights. I'm asking you not to forget
> about them! There is no problem with the current behaviour once the
> user understands what he is doing.

I am not sure PostgreSQL is well suited for a multilingual hosting
environment. There is a problem with locales. Without COLLATE support
PostgreSQL can't be used in such an environment. :(

nice a day

Pavel Stehule
Re: [HACKERS] stats_block_level
Tom Lane wrote:
> Any reason not to just fold them both into stats_start_collector?

Well, then you couldn't turn collection on and off without restarting
the postmaster, which might be a pain. Maybe we don't actually need
stats_start_collector, but instead we start it always and just have one
knob to turn collection on and off. I'm not sure whether the extra
process would bother people if they're not collecting, but we have so
many extra processes now, why would anyone care?

-- 
Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] default_text_search_config and expression indexes
> A configuration has NOTHING to do with language! This is the most
> frequent myth about configurations. It's just that we chose to have
> default_text_search_config use the language part of the locale at
> initdb time. A text search configuration is just a binding between the
> parser used to break a document into lexemes and the mapping from
> lexeme types to dictionaries.

I was talking about the impossibility of configuring PostgreSQL well
without administrator rights. For my Czech environment administrator
rights are necessary too, because the Czech dictionary isn't in the
default installation.

Regards
Pavel Stehule
Re: [HACKERS] stats_block_level
Simon Riggs wrote:
> On Fri, 2007-07-27 at 04:29 -0400, Alvaro Herrera wrote:
>> I agree. Let's remove stats_start_collector and merge the other two
>> into a single setting. Anything more than that is overkill. Having a
>> single idle process is not a problem to anyone. It just sleeps all
>> the time. We are all used to having six useless getty processes and
>> nobody cares.
>
> Yes, that's a great plan.

It gets my vote.

/D
Re: [HACKERS] stats_block_level
Peter Eisentraut wrote:
> Tom Lane wrote:
>> Any reason not to just fold them both into stats_start_collector?
>
> Well, then you couldn't turn collection on and off without restarting
> the postmaster, which might be a pain. Maybe we don't actually need
> stats_start_collector, but instead we start it always and just have
> one knob to turn collection on and off. I'm not sure whether the extra
> process would bother people if they're not collecting, but we have so
> many extra processes now, why would anyone care.

I agree. Let's remove stats_start_collector and merge the other two
into a single setting. Anything more than that is overkill. Having a
single idle process is not a problem to anyone. It just sleeps all the
time. We are all used to having six useless getty processes and nobody
cares.

-- 
Alvaro Herrera                      http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] stats_block_level
On Fri, 2007-07-27 at 04:29 -0400, Alvaro Herrera wrote:
> Peter Eisentraut wrote:
>> Maybe we don't actually need stats_start_collector, but instead we
>> start it always and just have one knob to turn collection on and off.
>
> I agree. Let's remove stats_start_collector and merge the other two
> into a single setting. Anything more than that is overkill. Having a
> single idle process is not a problem to anyone. It just sleeps all the
> time. We are all used to having six useless getty processes and nobody
> cares.

Yes, that's a great plan.

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] default_text_search_config and expression indexes
On Thu, Jul 26, 2007 at 06:23:51PM -0400, Bruce Momjian wrote:
> Oleg Bartunov wrote:
>> Second, I can't figure out how to reference a non-default
>> configuration.
>
> See the multi-argument versions of to_tsvector etc. I do see a problem
> with having to_tsvector(config, text) plus to_tsvector(text), where the
> latter implicitly references a config selected by a GUC variable: how
> can you tell whether a query using the latter matches a particular
> index using the former? There isn't anything in the current planner
> mechanisms that would make that work.
>
>> Probably, having a default text search configuration is not a good
>> idea, and we could just require it as a mandatory parameter, which
>> would eliminate much confusion in selecting a text search
>> configuration.
>
> We have to decide if we want a GUC default_text_search_config, and if
> so, when it can be changed. Right now there are three ways to create a
> tsvector (or tsquery):
>
>     ::tsvector
>     to_tsvector(value)
>     to_tsvector(config, value)
>
> (ignoring plainto_tsvector). Only the last one specifies the
> configuration. The others use the configuration specified by
> default_text_search_config. (We had a previous discussion on what the
> default value of default_text_search_config should be, and it was
> decided it should be set via initdb based on a flag or the locale.)
>
> Now, because most people use a single configuration, they can just set
> default_text_search_config and there is no need to specify the
> configuration name. However, expression indexes cause a problem here:
> http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX
>
> We recommend that users create an expression index on the column they
> want to do a full text search on, e.g.
>
>     CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(body));
>
> However, the big problem is that the expressions used in expression
> indexes should not change their output based on the value of a GUC
> variable (because it would corrupt the index), but in the case above,
> default_text_search_config controls what configuration is used, and
> hence the output of to_tsvector changes if default_text_search_config
> changes.

It wouldn't actually *corrupt* the index, right? You could end up with
wrong results, which might be regarded as corruption in one way, but as
long as you change the value back the index still works, no?

> We have a few possible options:
>
> 1) Document the problem and do nothing else.
>
> 2) Make default_text_search_config a postgresql.conf-only setting,
>    thereby making it impossible to change by non-super users, or make
>    it a super-user-only setting.
>
> 3) Remove default_text_search_config and require the configuration to
>    be specified in each function call.
>
> If we remove default_text_search_config, it would also make ::tsvector
> casting useless as well.

I think 3 is a really bad solution, and 2 is a half-bad solution. Do we
have a way to say that it can be set at the database level, for example,
but not in a user session? Making it superuser-only to change, but not
postgresql.conf-only, could accomplish that, along with warnings in the
docs telling the superuser about the effects changing it has on existing
indexes.

//Magnus
Re: [HACKERS] Quick idea for reducing VACUUM contention
ITAGAKI Takahiro wrote:
> Simon Riggs [EMAIL PROTECTED] wrote:
>> Read the heap blocks in sequence, but take a conditional lock for
>> cleanup on each block. If we don't get it, sleep, then try again when
>> we wake up. If we fail the second time, just skip the block
>> completely.

It would be cool if we could do something like sweep a range of pages,
initiate I/O for those that are not in shared buffers, and, while that
is running, lock and clean up the ones that are in shared buffers,
skipping those that are not lockable right away; when that's done, go
back to those buffers that were gotten from I/O and clean those up. And
retry the locking for those that couldn't be locked the first time
around, also conditionally. And when that's all done, a third pass could
get those blocks that weren't cleaned up in any of the previous passes
(and this time the lock would not be conditional). Then do a
vacuum_delay sleep.

> When we allow some skips in removing dead tuples, can we guarantee
> pg_class.relfrozenxid? No, we can't. I think we might need additional
> freezing-xmax operations to avoid XID wraparound in the first pass of
> vacuum, though that hardly ever occurs.

I'm not sure I follow. Can you elaborate? Do you mean storing a separate
relfrozenxmax for each table, or something like that?

> It might be a future topic ... if we go in the direction of optimistic
> sweeping, is it possible to remove the second pass of vacuum
> completely? We just add the XID of the vacuum to the dead tuples we
> see in the first pass. When backends find a dead tuple and see that
> the transaction identified by the XID in it has committed, they can
> freely reuse the area of the dead tuple, because we can assume the
> index entries pointing to the tuple have been removed by the vacuum.

I would be worried about leftover index entries being later used by new
tuples in the heap. Then, when you visit the index, find that entry, go
to the heap and find the new tuple, you return it, which could be bogus.
(Unless, I think, you check in the index when you are going to insert
the new index tuple -- if the CTID is already used, reuse that entry or
remove it before insertion.) I don't know. Maybe it's OK, but it seems
messy even if it is.

-- 
Alvaro Herrera                      http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] [PATCHES] allow CSV quote in NULL
[redirecting to -hackers]

Stephen Frost wrote:
> * Gregory Stark ([EMAIL PROTECTED]) wrote:
>> Tom Lane [EMAIL PROTECTED] writes:
>>> Stephen Frost [EMAIL PROTECTED] writes:
>>>> Please find attached a minor patch to remove the constraints that a
>>>> user can't include the delimiter or quote characters in a 'NULL AS'
>>>> string when importing CSV files.
>>>
>>> This can't really be sane, can it?
>>
>> Not very, no :-) The alternative would be interpreting NULL strings
>> after dequoting, but that would leave no way to include the NULL
>> string literally. This solution means there's no way to include it
>> (if it needs quoting), but only when you specify it this way.
>
> Yeah, interpreting NULLs after dequoting means you've lost the
> information about whether it's quoted or not, or you have to add some
> funky syntax to say "if it's quoted, do it differently...", which is
> no good, imv. What the patch does, basically, is say "give us the
> exact string that shows up between the unquoted delimiters that you
> want to be treated as a NULL". This removes the complexity of the
> question about quoting, unquoting, whatever, and makes it a very
> clear-cut, straightforward solution with no impact on existing users,
> imv.

This looks too clever by half, to me. Someone facing the problem you are
facing would have to dig quite deep to find the solution you're
promoting.

A much better way IMNSHO would be to add an extra FORCE switch. On
input, FORCE NOT NULL says to treat an unquoted null as the literal
value rather than as a null field for the columns named. The reverse
would be to tell it to treat a quoted null as null rather than as the
literal value, for the named columns. Perhaps that should just be FORCE
NULL columnlist. It would be more explicit, and at the same time would
only apply to the named columns, rather than totally discarding the
ability to distinguish between null and not-null values.

This should probably be discussed on -hackers, anyway.

cheers

andrew
Re: [HACKERS] LSN grouping within clog pages
On Thu, 2007-07-26 at 22:37 -0400, Tom Lane wrote:
> I've been looking at the way that the async-commit patch conserves
> shared memory space by remembering async commit LSNs for groups of
> transactions on a clog page, rather than having an LSN for each
> individual transaction slot. This seems like a good plan to me, but
> I'm confused about one point. The README text claims that each LSN
> represents a contiguous group of transactions, that is, with the
> proposed parameters each LSN would represent 256 sequential
> transactions. However, it looks to me like what the code is actually
> doing:
>
>     #define GetLSNIndex(slotno, xid)  ((slotno) * CLOG_LSNS_PER_PAGE + \
>         (xid) % (TransactionId) CLOG_XACTS_PER_LSN)
>
> results in transactions that are spaced 256 XIDs apart sharing the
> same LSN slot. I'm not sure whether the code is good and the README is
> bogus, or vice versa.

Well, in this case the README correctly expresses my intention and the
code does not.

> Sharing LSNs among contiguous groups of XIDs seems appealing because
> you'd expect that such a group would have relatively close LSNs, and
> so not much information is lost. OTOH, the modulo idea is interesting
> too, because if the transaction rate is less than 256 commits per
> walwriter cycle, you'd effectively have exact information for all the
> currently unflushed transactions. But the downside is that
> transactions that are really quite old might transiently appear
> un-hintable because some later transaction that happens to share that
> LSN slot isn't flushed yet.

The modulo idea was my original intention, and I attempted to change
this in v23. The above paragraph is pretty much how my own thoughts
evolved, and now I'm fully on the side of contiguous rather than
striped, even though the code says differently.

> BTW, I don't think I believe at all the arguments given in the README
> about what CLOG_LSNS_PER_PAGE should be, particularly since possible
> changes in BLCKSZ weren't factored in. I'm inclined to set it so that
> the LSNs take up the same amount of space as the clog buffers
> themselves, i.e., BLCKSZ/8 LSNs per page.

Sure, BLCKSZ should be factored in. Do we really need so many LSNs,
though?

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
[HACKERS] Document and/or remove unreachable code in tuptoaster.c from varvarlena patch
Testers here were having a hard time constructing test cases to reach
some lines touched by the varvarlena patch. Upon further investigation
I'm convinced they're unreachable.

Some were added when I did packed varlena -- I've removed those. These
lines were actually necessary earlier, but when we made attstorage='p'
exempt columns from SHORT treatment, that made these lines unreachable.
I think they impede readability, so it's best to just remove them.

The other lines I think are useful, if only to make the code
self-documenting. But they're unreachable due to the way the loop works
and the way it tracks toast_action, so I documented that fact for the
next person using gcov.

I also couldn't find a way to trigger the pfree() lines, but I'm still
looking at that. Even when we're retoasting previously detoasted datums
from an updated row we still don't call pfree, so something's a bit
weird here.

Lastly, we currently never compress anything below 256 bytes, so we
never compress SHORT varlenas. But trying to compress a datum only to
find it's incompressible is a fairly expensive operation. Aside from
doing a palloc, we also end up having to do a whole iteration of this
loop, including recalculating the size of the tuple and looking for the
next largest datum. I would suggest we decrease the minimum size to 32
instead of the current 256, which would mean we could compress SHORT
data. But as long as we don't do that, we should have a check like the
one below, which will avoid trying to. (We might still have a check
against VARSIZE in toast_compress_datum to avoid the palloc.)

Index: tuptoaster.c
===================================================================
RCS file: /home/stark/src/REPOSITORY/pgsql/src/backend/access/heap/tuptoaster.c,v
retrieving revision 1.74
diff -c -r1.74 tuptoaster.c
*** tuptoaster.c	6 Apr 2007 04:21:41 -0000	1.74
--- tuptoaster.c	27 Jul 2007 14:38:14 -0000
***************
*** 535,541 ****
  			need_change = true;
  			need_free = true;
  		}
! 
  		/*
  		 * Remember the size of this attribute
  		 */
--- 535,548 ----
  			need_change = true;
  			need_free = true;
  		}
! 		else if (VARATT_IS_SHORT(new_value) || VARSIZE(new_value) < 256)
! 		{
! 			/* The default pg_lz_compress strategy doesn't compress things
! 			 * under 256 bytes so skip iterations through the loop trying
! 			 * to compress them */
! 			toast_action[i] = 'x';
! 		}
! 
  		/*
  		 * Remember the size of this attribute
  		 */
***************
*** 590,596 ****
  		if (toast_action[i] != ' ')
  			continue;
  		if (VARATT_IS_EXTERNAL(toast_values[i]))
! 			continue;
  		if (VARATT_IS_COMPRESSED(toast_values[i]))
  			continue;
  		if (att[i]->attstorage != 'x')
--- 597,604 ----
  		if (toast_action[i] != ' ')
  			continue;
  		if (VARATT_IS_EXTERNAL(toast_values[i]))
! 			/* Dead code due to toast_action */
! 			continue;
  		if (VARATT_IS_COMPRESSED(toast_values[i]))
  			continue;
  		if (att[i]->attstorage != 'x')
***************
*** 654,659 ****
--- 662,668 ----
  		if (toast_action[i] == 'p')
  			continue;
  		if (VARATT_IS_EXTERNAL(toast_values[i]))
+ 			/* Dead code due to toast_action */
  			continue;
  		if (att[i]->attstorage != 'x' && att[i]->attstorage != 'e')
  			continue;
***************
*** 703,712 ****
--- 712,723 ----
  		if (toast_action[i] != ' ')
  			continue;
  		if (VARATT_IS_EXTERNAL(toast_values[i]))
+ 			/* Dead code due to toast_action */
  			continue;
  		if (VARATT_IS_COMPRESSED(toast_values[i]))
  			continue;
  		if (att[i]->attstorage != 'm')
+ 			/* Dead code (what else could attstorage be at this point?) */
  			continue;
  		if (toast_sizes[i] > biggest_size)
  		{
*** 766,771 ****
--- 777,783 ----
Re: [HACKERS] stats_block_level
On Fri, 2007-07-27 at 10:15 +0100, Dave Page wrote:
> Simon Riggs wrote:
>> Yes, that's a great plan.
>
> It gets my vote.

Look to -patches for an implementation of the above.

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Document and/or remove unreachable code in tuptoaster.c from varvarlena patch
Sorry, I meant to send the previous message to pgsql-patches. Here's a
version cut using cvs diff, so it's usable with -p0.

I added one more fixup. There was a silly test in
toast_fetch_datum_slice() which handled compressed datums. Returning a
slice of a compressed datum is nonsensical with TOAST, since the
resulting datum would be useless. I also added an assertion in this
function that the datum is external before we treat it as a
toast_pointer.

(Incidentally, I did eventually manage to construct a case to reach all
the pfrees.)

tuptoaster-fixup.patch.gz
Description: Binary data

-- 
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] allow CSV quote in NULL
Stephen Frost wrote:
> I'm honestly not a big fan of the columnlist approach that's been
> taken with the options. While I understand the desire to separate the
> parsing from the typing, making users essentially do that association
> for us, by making them specify how to handle each column explicitly,
> is worse than just accepting that different types may need to be
> handled in different ways.

Whether or not you like it, the fact is that it's there, and I think any
solution should be consistent with what is done. When CSV was first
discussed we looked at doing type-specific behaviour. The end of the
long debate was that we simply couldn't do that safely, and the only
recourse was to require the user to specify the behaviour required if it
differed from the default. You might be inclined to want to revisit
that, but I am not.

cheers

andrew
Re: [HACKERS] [PATCHES] allow CSV quote in NULL
* Andrew Dunstan ([EMAIL PROTECTED]) wrote:
> This looks too clever by half, to me. Someone facing the problem you
> are facing would have to dig quite deep to find the solution you're
> promoting.

Oddly enough, it was one of the first things I tried when I discovered
COPY wasn't just treating ',,' for an integer column as NULL (and was
instead complaining loudly that you can't convert an empty string into
an integer). It's also pretty clear, to me at least, to say "put the
exact string that shows up between the delimiters here that you want
treated as a NULL", rather than "well, if it's a column which is quoted
then you have to jump through these hoops and tell PG about each one,
but if it's not quoted you have to do this", etc., etc.

> A much better way IMNSHO would be to add an extra FORCE switch. On
> input, FORCE NOT NULL says to treat an unquoted null as the literal
> value rather than as a null field for the columns named. The reverse
> would be to tell it to treat a quoted null as null rather than as the
> literal value, for the named columns. Perhaps that should just be
> FORCE NULL columnlist. It would be more explicit, and at the same time
> would only apply to the named columns, rather than totally discarding
> the ability to distinguish between null and not-null values.

I don't see that it needs to be 'more explicit'; that's just silly.
Either the user indicated they want it, or they didn't. What you're
suggesting adds a bunch of, imv, unnecessary complication and ends up
making the resulting code that much bigger and uglier for not much gain.

I'm honestly not a big fan of the columnlist approach that's been taken
with the options. While I understand the desire to separate the parsing
from the typing, making users essentially do that association for us,
by making them specify how to handle each column explicitly, is worse
than just accepting that different types may need to be handled in
different ways.

We could instead flip it around and force users to specify, for each
column, exactly what should be done for that column, by having them
specify a regexp for each one. The regexp would implicitly have the
delimiter on each side of it, and we'd just step through the string,
matching as far as we can for each column. Then it's nice and explicit
for everyone, but probably not much fun to use.

> This should probably be discussed on -hackers, anyway.

As a small, unobtrusive patch, I felt it didn't need a long discussion
about what everyone's CSV files look like and how "that just shouldn't
be done" or "that's just not sane".

Thanks,

Stephen

signature.asc
Description: Digital signature
Re: [HACKERS] [PATCHES] allow CSV quote in NULL
* Andrew Dunstan ([EMAIL PROTECTED]) wrote:
> Stephen Frost wrote:
>> I'm honestly not a big fan of the column-list approach that's been
>> taken with the options. While I understand the desire to separate the
>> parsing from the typing, making the users essentially do that
>> association for us, by making them specify how to handle each column
>> explicitly, is worse than just accepting that different types may need
>> to be handled in different ways.
> Whether or not you like it, the fact is it's there. I think any solution
> should be consistent with what is done.

Other, unrelated options being or not being there doesn't really have any bearing on this, though. I'm not inventing new syntax here; I'm just removing a restriction on what the user can do that doesn't need to exist. Indeed, other more convoluted and complex things could still be added if someone wants them; this doesn't prevent that.

Thanks,

Stephen
Re: [HACKERS] Updated tsearch documentation
Oleg Bartunov wrote:
> On Thu, 26 Jul 2007, Bruce Momjian wrote:
>> Oleg Bartunov wrote:
>>> On Wed, 25 Jul 2007, Erikjan wrote:
>>>> In http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT
>>>> it says: "A document is any text file that can be opened, read, and
>>>> modified."
>>> Oops, in my original documentation it was: "Document, in the usual
>>> meaning, is a text file that one could open, read and modify." I
>>> stress that in a database a document is something different:
>>> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html
>> I have updated the documentation:
>> http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT
> Isn't it worth referencing OpenFTS, which is used for indexing a file
> system?

Uh, not sure. I don't think so, but we can add a URL to it if you can find the right place.

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB  http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
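In the database sense Oleg describes, a document need not be a file at all; it can be assembled from several columns at query time. A minimal sketch, assuming a hypothetical `messages` table (the table and column names are illustrative, not from the thread; the single-argument `to_tsvector` form uses the default configuration):

```sql
-- A "document" built from multiple columns at search time;
-- coalesce() keeps a NULL column from wiping out the whole string.
SELECT title || ' ' || coalesce(body, '') AS document
FROM messages
WHERE to_tsvector(title || ' ' || coalesce(body, ''))
      @@ to_tsquery('postgres');
```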
Re: [HACKERS] Updated tsearch documentation
Thanks, I found a few more places that needed updating. It should be accurate now. Thanks for the report.

---

Oleg Bartunov wrote:
> On Thu, 26 Jul 2007, Bruce Momjian wrote:
>> Oleg Bartunov wrote:
>>> Bruce, I sent you a link to my wiki page with a summary of changes:
>>> http://www.sai.msu.su/~megera/wiki/ts_changes
>>> Your documentation looks rather old.
>> I have updated it to reflect your changes:
>> http://momjian.us/expire/fulltext/HTML/textsearch-tables.html
> Bruce, I noticed you missed many changes. For example, the options for
> the stemmer have changed (documented in my ts_changes), so in
> http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-CONFIGURATION
>
>     ALTER TEXT SEARCH DICTIONARY en_stem SET OPTION 'english-utf8.stop';
>
> should be
>
>     ALTER TEXT SEARCH DICTIONARY en_stem
>         SET OPTION 'StopFile=english-utf8.stop, Language=english';
>
> Also, this is wrong:
>
>     DROP TEXT SEARCH CONFIGURATION MAPPING ON pg FOR email, url, sfloat, uri, float;
>
> it should be:
>
>     ALTER TEXT SEARCH CONFIGURATION pg DROP MAPPING FOR email, url, sfloat, uri, float;
>
> A configuration now doesn't have a DEFAULT flag, so \dF should not
> display 'Y':
>
>     =# \dF
>      pg_catalog | russian | Y
>      public     | pg      | Y
>
> This is what I see now:
>
>     postgres=# \dF public.*
>     List of fulltext configurations
>      Schema | Name | Description
>     --------+------+-------------
>      public | pg   |
>
> On Tue, 24 Jul 2007, Bruce Momjian wrote:
>> I have added more documentation to try to show how full text search is
>> used by user tables. I think the documentation is almost done:
>> http://momjian.us/expire/fulltext/HTML/textsearch-tables.html
>> Oleg Bartunov wrote:
>>> On Wed, 18 Jul 2007, Bruce Momjian wrote:
>>>> Oleg, Teodor, I am confused by the following example. How does gin
>>>> know to create a tsvector, or does it? Does gist know too?
>>> No, gist doesn't know. I don't remember why -- Teodor? For GIN, see
>>> http://archives.postgresql.org/pgsql-hackers/2007-05/msg00625.php
>>> for discussion.
>> FYI, at some point we need to chat via instant messenger or IRC to
>> discuss the open items.
>> My chat information is here: http://momjian.us/main/contact.html
> I sent you an invitation for Google Talk; I use only the chat in Gmail.
> My gmail account is [EMAIL PROTECTED]
>
> ---
>
>     SELECT title FROM pgweb
>     WHERE textcat(title,body) @@ plainto_tsquery('create table')
>     ORDER BY dlm DESC LIMIT 10;
>
>     CREATE INDEX pgweb_idx ON pgweb USING gin(textcat(title,body));
>
> Regards,
> Oleg
> _____________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB  http://www.enterprisedb.com
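For reference, the corrected command forms from Oleg's report, gathered in one place (the `en_stem` dictionary and `pg` configuration names are from the thread; the syntax is that of the in-development 8.3 tsearch integration being documented):

```sql
-- Stemmer options are now a keyword list, not a bare stop-file name.
ALTER TEXT SEARCH DICTIONARY en_stem
    SET OPTION 'StopFile=english-utf8.stop, Language=english';

-- Mappings are dropped via ALTER ... DROP MAPPING, not a separate
-- DROP TEXT SEARCH CONFIGURATION MAPPING statement.
ALTER TEXT SEARCH CONFIGURATION pg
    DROP MAPPING FOR email, url, sfloat, uri, float;
```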
Re: [HACKERS] Updated tsearch documentation
Dimitri Fontaine wrote:
> Hi,
>
> On Wednesday 25 July 2007, Bruce Momjian wrote:
>> I have added more documentation to try to show how full text search is
>> used by user tables. I think the documentation is almost done:
>> http://momjian.us/expire/fulltext/HTML/textsearch-tables.html
>
> I've come to understand that GIN indexes are far more costly to update
> than GiST ones, and Oleg's wiki advises users to partition their data,
> using a GiST index for the live part and a GIN index for the archive
> part only. Is it worth mentioning this in this part of the
> documentation? And if it is mentioned here, the partitioning step could
> certainly be part of the example... or leave it as a user exercise, but
> then explain why GIN is a good choice in the provided example.

Partitioning is already in the documentation:

    Partitioning of big collections and the proper use of GiST and GIN
    indexes allows the implementation of very fast searches with online
    update. Partitioning can be done at the database level using table
    inheritance and constraint_exclusion, or by distributing documents
    over servers and collecting search results using the contrib/dblink
    extension module. The latter is possible because ranking functions
    use only local information.

I don't see a reason to provide an example beyond the existing examples of how to do partitioning.

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB  http://www.enterprisedb.com
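The live/archive split Dimitri describes can be sketched with table inheritance, as the quoted documentation suggests. Everything here is illustrative (the table names, the precomputed `tsv` column, and the omitted trigger that would maintain it are assumptions, not part of the docs):

```sql
-- Parent table; searches against it scan both partitions.
CREATE TABLE docs (id serial, body text, tsv tsvector);

-- Archive partition: rarely updated, so the slower-to-update but
-- faster-to-search GIN index is a good fit.
CREATE TABLE docs_archive () INHERITS (docs);
CREATE INDEX docs_archive_tsv ON docs_archive USING gin(tsv);

-- Live partition: frequent inserts and updates favor GiST.
CREATE TABLE docs_live () INHERITS (docs);
CREATE INDEX docs_live_tsv ON docs_live USING gist(tsv);

-- One query covers both parts, each using its own index.
SELECT id FROM docs WHERE tsv @@ to_tsquery('backup');
```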
Re: [HACKERS] default_text_search_config and expression indexes
Magnus Hagander wrote:
>> However, the big problem is that the expressions used in expression
>> indexes should not change their output based on the value of a GUC
>> variable (because it would corrupt the index), but in the case above,
>> default_text_search_config controls what configuration is used, and
>> hence the output of to_tsvector changes if default_text_search_config
>> changes.
> It wouldn't actually *corrupt* the index, right? You could end up with
> wrong results, which might be regarded as corruption in one way, but as
> long as you change the value back, the index still works, no?

Right, it would _temporarily_ corrupt it. ;-)

>> We have a few possible options:
>>
>> 1) Document the problem and do nothing else.
>>
>> 2) Make default_text_search_config a postgresql.conf-only setting,
>>    thereby making it impossible for non-super-users to change, or make
>>    it a super-user-only setting.
>>
>> 3) Remove default_text_search_config and require the configuration to
>>    be specified in each function call. If we remove
>>    default_text_search_config, it would also make ::tsvector casting
>>    useless.
> I think 3 is a really bad solution. 2 is a half-bad solution. Do we
> have a way to say that it can be set at the database level, for
> example, but not in a user session? Making it superuser-only to change,
> but not postgresql.conf-only, could accomplish that, along with
> warnings in the docs for the super-user about the effects of changing
> it on current indexes.

OK, here is what I am thinking.
If we make default_text_search_config super-user-only, then the user can't do SET (using zero_damaged_pages as a superuser-only example):

    test=> SET zero_damaged_pages = on;
    ERROR:  permission denied to set parameter zero_damaged_pages
    test=> ALTER USER guest SET zero_damaged_pages = on;
    ERROR:  permission denied to set parameter zero_damaged_pages

but the super-user can set it in postgresql.conf, or:

    test=# ALTER USER guest SET zero_damaged_pages = on;
    ALTER ROLE

or

    test=# ALTER DATABASE vendor3 SET zero_damaged_pages = on;
    ALTER DATABASE

meaning that while it will be super-user-only, the administrator can still set the default for specific databases and users. Is that the best approach? A user can still override the default by specifying the configuration in the function call.

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB  http://www.enterprisedb.com
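The last point, naming the configuration in the function call, is also how an expression index can sidestep the GUC dependence entirely: if the indexed expression pins the configuration, later changes to default_text_search_config cannot make the index disagree with queries that pin it the same way. A sketch (the `pgweb` table and `body` column are borrowed from earlier examples in the thread):

```sql
-- The configuration is part of the indexed expression, so its output
-- does not depend on the default_text_search_config GUC at all.
CREATE INDEX pgweb_body_idx ON pgweb
    USING gin(to_tsvector('english', body));

-- A query must spell the expression the same way to use the index.
SELECT title FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'create & table');
```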