Re: [GENERAL] Inherited FK Indexing
Hi,

Ketema Harris wrote:
> as expected I can do select * from states and get everything out of the
> child table as well. What I can't do is create a FK to the states table
> and have it look in the child table as well. Is this on purpose? Is it
> possible to have FK that spans into child tables?

This is a well known (and documented, see [1]) deficiency. It's due to the current implementation of indices, which are bound to exactly one table: they return a position within that table, but cannot point into other tables.

Regards

Markus

[1]: Postgres Documentation, Chapter 5.8.1 Caveats (of Inheritance):
http://www.postgresql.org/docs/8.2/static/ddl-inherit.html
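To illustrate the caveat, a minimal sketch (all table and column names are made up): the foreign key check only consults the parent table's own rows, so a value that lives in a child table is rejected.

CREATE TABLE states (code char(2) PRIMARY KEY, name text);
CREATE TABLE territories (is_territory boolean) INHERITS (states);

-- the row lives in the child table, but is visible via the parent:
INSERT INTO territories (code, name, is_territory) VALUES ('PR', 'Puerto Rico', true);
SELECT count(*) FROM states WHERE code = 'PR';   -- returns 1

CREATE TABLE addresses (
    id    serial PRIMARY KEY,
    state char(2) REFERENCES states (code)
);

-- fails with a foreign key violation, because the FK check only looks
-- at the parent table itself:
INSERT INTO addresses (state) VALUES ('PR');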
Re: [GENERAL] oracle rank() over partition by queries
Hello Sharmi Joe,

sharmi Joe wrote:
> Is there a way to get the oracle's rank() over partition by queries in
> postgresql?

These are known as window functions. AFAIK Gavin Sherry is working on an implementation for Postgres.

Regards

Markus
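For reference, the kind of query being asked about looks roughly like the following (table and column names invented); this is standard SQL:2003 window function syntax that Oracle already supports and that is not yet available in Postgres at the time of this thread:

SELECT empno, deptno, salary,
       rank() OVER (PARTITION BY deptno ORDER BY salary DESC) AS salary_rank
  FROM emp;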
Re: [GENERAL] Scalability Design Questions
Hi,

novnov wrote:
> OK, this has been very informative and I'd like to thank the three of
> you. Asynchronous replication to readonly slaves is something I will
> look into. I've never touched postgres replication; and Scott mentioned
> that he was not familiar with PGCluster, so there must be some other
> replication system he's referencing, maybe Slony-I?

Not sure if you've found those already, but just to make sure: there's a nice chapter in the official Postgres documentation about High Availability and Load Balancing [1]. Another starting point might be the advocacy wiki at [2].

Regards

Markus

[1]: Postgres Documentation, Chapter 24. High Availability and Load Balancing:
http://www.postgresql.org/docs/8.2/static/high-availability.html

[2]: Postgres Advocacy Wiki, Replication:
http://developer.postgresql.org/index.php/Replication%2C_Clustering%2C_and_Connection_Pooling
Re: [GENERAL] Connection Pooling directly on Postgres Server
Hi,

Denis Gasparin wrote:
> Why not to implement a connection pooling server side as apache for
> example does?

This has certainly been discussed before. IIRC the real argument against it was that fork() isn't the most expensive thing to do anymore. And Postgres does lots of other work after accept(), namely connecting to a certain database, authenticating the user, etc. If you still wanted to optimize that, you'd end up having n spare backends *per database*. I do that in Postgres-R - not for connection pooling, but for the application of remote transactions.

Regards

Markus
Re: [GENERAL] Geographic High-Availability/Replication
Hi,

Decibel! wrote:
> But is the complete transaction information safely stored on all nodes
> before a commit returns?

Good question. It depends very much on the group communication system and the guarantees it provides for message delivery. For certain, the information isn't safely stored on *every* node before commit confirmation. Let me quickly explain those two points.

Lately, I've read a lot about different group communication systems and how they handle delivery guarantees. Spread offers an 'agreed' and a 'safe' mode; only the latter guarantees that all nodes have received the data, and it's a rather expensive mode in terms of latency. In our case, it would be sufficient if at least n nodes confirmed having correctly received the data. That would allow for (n - 1) simultaneously failing nodes, so that there's always at least one correct node which has received the data, even if the sender failed right after sending. This one node can redistribute the data to those which didn't receive the message, until all nodes have received it. No group communication system I know of offers such fine grained levels of delivery guarantees. Additionally, I've figured that it would be nice to have subgroups and multiple orderings within a group. Thus - opposed to my initial intention - I've finally started to write yet another group communication system, providing all of these nice features. Anyway, that's another story.

Regarding durability: given the above assumption, that at most (n - 1) nodes fail, you don't have to care much about recovery, because there's always at least one running node which has all the data. As we know, reality doesn't always care about our assumptions. So, if you want to prevent data loss due to failures of more than (n - 1) nodes, possibly even all nodes, you'd have to do transaction logging, much like WAL, but cluster-wide. Having every single node write a transaction log, like WAL, would be rather expensive and complex during recovery, as you'd have to mix and match all the nodes' WALs. Instead, I think it's better to decouple transaction logging (backup) from ordinary operation. That gives you much more freedom. For example, you could have nodes dedicated to and optimized for logging. But most importantly, you have separated the problem: as long as your permanent storage for transaction logging is alive, you can recover your data, no matter what's happening with the rest of the cluster. And the other way around: as long as your cluster is alive (i.e. no more than (n - 1) simultaneous failures), you don't really need the transaction log.

So, before committing a transaction, a node has to wait for the delivery of the data through the GCS *and* for the transaction logger(s) to have written the data to permanent storage. Please note that those two operations can be done simultaneously, i.e. the latencies do not add up; the total is just the maximum of the two.

Regards

Markus
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Hi,

Bill Moran wrote:
> While true, I feel those applications are the exception, not the rule.
> Most DBs these days are the blogs and the image galleries, etc. And those
> don't need or want the overhead associated with synchronous replication.

Uhm.. do blogs and image galleries need replication at all? I'm thinking more of the business critical applications, where high availability is a real demand - and where your data *should* better be distributed among multiple data centers, just to avoid a single point of failure.

> > for most other stuff MySQL is good enough
>
> I find that line fuzzy.

Yeah, it is.

> It's synchronous for the reason you describe, but it's asynchronous
> because a query that has returned successfully is not _guaranteed_ to be
> committed everywhere yet. Seems like we're dealing with a limitation in
> the terminology :)

Certainly! But sync and async replication are such well known and widely used terms... On the other hand, I certainly agree that in Postgres-R the nodes do not process transactions synchronously, but asynchronously. Maybe it's really better to speak of eager and lazy replication, as some literature does (namely the initial Postgres-R paper of Bettina Kemme).

> This could potentially be a problem on (for example) a web application,
> where a particular user's experience may be load-balanced to another node
> at any time.

Of course, you just have to write the application with that knowledge. IMO, such heavily dynamic load-balancing is rarely useful. With application support, it's easily doable: let the first transaction on node A query the (global) transaction identifier and, after connecting to the next node B, ask that node to wait until that transaction has committed. It gets a little harder without application support: the load balancer would have to keep track of sessions and their last (writing) transaction.

Again, thank you for pointing this out.

Regards

Markus
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Hi,

Bill Moran wrote:
> First off, "clustering" is a word that is too vague to be useful, so I'll
> stop using it. There's multi-master replication, where every database is
> read-write, then there's master-slave replication, where only one server
> is read-write and the rest are read-only. You can add failover
> capabilities to master-slave replication. Then there's synchronous
> replication, where all servers are guaranteed to get updates at the same
> time. And asynchronous replication, where other servers may take a while
> to get updates. These descriptions aren't really specific to PostgreSQL --
> every database replication system has to make design decisions about
> which approaches to support.

Good explanation!

> Synchronous replication is only really used when two servers are right
> next to each other with a high-speed link (probably gigabit) between them.

Why is that so? There's certainly very valuable data which would gain from an inter-continental database system. For money transfers, for example, I'd rather wait half a second for a round trip around the world, to make sure the RDBMS does not 'lose' my money.

> PostgreSQL-R is in development, and targeted to allow multi-master,
> asynchronous replication without rewriting your application. As far as I
> know, it works, but it's still beta.

Sorry, this is nitpicking, but for some reason (see the current naming discussion on -advocacy :-) ), it's "Postgres-R". Additionally, Postgres-R is considered to be a *synchronous* replication system, because once you get your commit confirmation, your transaction is guaranteed to be deliverable and *committable* on all running nodes (i.e. it's durable and consistent). Or to put it another way: asynchronous systems have to deal with conflicting, but already committed transactions - Postgres-R does not.

Certainly, this is slightly less restrictive than saying that a transaction needs to be *committed* on all nodes before confirming the commit to the client. But as long as a database session is tied to a node, this optimization does not alter any transactional semantics. And despite that limitation, which is mostly the case in reality anyway, I still consider this to be synchronous replication.

[ To get a strictly synchronous system with Postgres-R, you'd have to delay read only transactions on a node which hasn't applied all remote transactions yet. In most cases, that's unwanted. Instead, a consistent snapshot is enough, just as if the transaction had started *before* the remote ones which still need to be applied. ]

> BTW: does anyone know of a link that describes these high-level concepts?
> If not, I think I'll write this up formally and post it.

Hm.. sometime before 8.3 was released, we had lots of discussions on -docs about the "high availability and replication" section of the PostgreSQL documentation. I'd have liked to add these fundamental concepts, but Bruce - rightly - wanted to keep it focused on existing solutions. And unfortunately, most existing solutions are async, single-master. So explaining all these wonderful theoretic concepts only to state that there are no real solutions would have been silly.

Regards

Markus
Re: [GENERAL] Geographic High-Availability/Replication
Hello Bill,

Bill Moran wrote:
> It appears as if I miscommunicated my point. I'm not expecting
> PostgreSQL-R to break the laws of physics or anything, I'm just curious
> how it reacts. This is the difference between software that will be
> really great one day, and software that is great now.

Agreed. As Postgres-R is still a prototype, it *currently* does not handle the situation at all. But I'm thankful for this discussion, as it helps me figure out how Postgres-R *should* react. So, thank you for pointing this out.

> Great now would mean the system would notice that it's too far behind and
> Do The Right Thing automatically. I'm not exactly sure what The Right
> Thing is, but my first guess would be to force the hopelessly slow node
> out of the cluster. I expect this would be non-trivial, as you'd have to
> have a way to ensure it was a problem isolated to a single (or few)
> nodes, and not just the whole cluster getting hit with unexpected traffic.

Hm.. yeah, that's a tricky decision to make. For a start, I'd be in favor of just informing the administrator about the delay and letting him take care of the problem (as currently done with 'disk full' conditions), instead of trying to do something clever automatically. (This seems to be much more PostgreSQL-like, too.)

> Of course not, that's why the behaviour when that non-ideal situation
> occurs is so interesting. How does PostgreSQL-R fail? PostgreSQL fails
> wonderfully: A hardware crash will usually result in a system that can
> recover without operator intervention. In a system like PostgreSQL-R, the
> failure scenarios are more numerous, and probably more complicated.

I agree that there are more failure scenarios, although fewer are critical to the complete system. IMO, a node which is too slow should not be considered a failure, but rather a system limitation (possibly due to unfortunate configuration), much like out of memory or disk space conditions. Forcing such a node to go down could have unwanted side effects on the other nodes (i.e. increased read-only traffic) *and* does not solve the real problem.

Again, thanks for pointing this out. I'll think more about some issues, especially similar corner cases like this one. A single-node disk-full condition would be another example. Possibly also out of memory conditions?

Regards

Markus
Re: [GENERAL] Geographic High-Availability/Replication
Hi,

Marko Kreen wrote:
> Such situation is not a specific problem to Postgres-R or to synchronous
> replication in general. Asynchronous replication will break down too.

Agreed, except that I don't consider slowness to be 'breaking down'.

Regards

Markus
Re: [GENERAL] Geographic High-Availability/Replication
Hi,

Bill Moran wrote:
> I'm curious as to how Postgres-R would handle a situation where the
> constant throughput exceeded the processing speed of one of the nodes.

Well, what do you expect to happen? This case is easily detectable, but I can only see two possible solutions: either stop the node which is too slow, or stop accepting new transactions for a while. This technique is not meant to allow nodes to lag behind by several thousand transactions - that should better be avoided. Rather, it's meant to decrease the commit delay necessary for synchronous replication.

> I can see your system working if it's just spike loads and the slow nodes
> can catch up during slow periods, but I'm wondering about the scenarios
> where an admin has underestimated the hardware requirements and one or
> more nodes is unable to keep up.

Please keep in mind that replication per se does not speed your database up; rather, it adds a layer of reliability, which *costs* some performance. To increase the transactional throughput you would need to add partitioning to the mix. Or you could try to make use of the gained reliability and abandon WAL - you won't need it as long as at least one replica is running - which should increase the single node's throughput and therefore the cluster's throughput, too.

When replication meets partitioning and load balancing, you get into a whole new world, where new trade-offs need to be considered. Some look similar to those with RAID storage - probably Sequoia's term RAIDb isn't bad at all.

Regards

Markus
Re: [GENERAL] Geographic High-Availability/Replication
Hi,

Gregory Stark wrote:
> Only if your application is single-threaded. By single-threaded I don't
> refer to operating system threads but to the architecture. If you're
> processing a large batch file handling records one by one and waiting for
> each commit before proceeding then it's single threaded. If you have a
> hundred independent clients on separate connections doing separate things
> then each one of them could get 6tps. Which you have will depend on your
> application and your needs, it may not be something you can change.

Correct. Plus, in the implementation of Postgres-R, performance is *not* bound to the slowest node. Instead, every node can process transactions at its own speed. Slower nodes might then have to queue transactions from the faster ones until they catch up again.

Regards

Markus
Re: [GENERAL] Geographic High-Availability/Replication
Hi,

Matthew wrote:
> Hey all, new postgres user here. We are trying to setup/research an
> HA/Replicated solution with PostgreSQL between a datacenter in LA and a
> d.c. in NY. We have a private LAN link between the two D.C.'s with a max
> round-trip of 150ms. We will have a web server at each d.c. (among other
> servers) that will write/read to/from the local LAN DB. On writes, that
> data should be xmited to the other data center so that if, for whatever
> reason, my website request was sent to LA instead of NY, all my session
> information etc will still exist.

This is commonly known as synchronous replication. As it involves *at least* one round-trip *before* committing, it's quite expensive. Can you live with a delay of ~150ms before COMMIT confirmation?

Another issue is the reliability of your failure detectors. How does server B know that server A is really down (and not just the link)? Normally, that's solved with a quorum device, so you need at least three servers - preferably in different locations.

Regards

Markus

Disclaimer: I'm the developer behind Postgres-R (www.postgres-r.org); have a look at it, it's designed to do what you are looking for.
Re: [GENERAL] invalid page header
Hi,

Markus Schiltknecht wrote:
> I've done that (zeroing out the pg_toast table page) and hope
> the running pg_dump goes through fine.

Unfortunately, pg_dump didn't go through. I already did some REINDEXing and VACUUMing. Vacuum fixed something (sorry, I don't recall the message), but SELECTing from the table still fails, as some fields of course reference now deleted toast data:

ERROR: missing chunk number 0 for toast value 479667

Is there any way to find this toast value 479667? Something like the following (hypothetical) query:

SELECT id FROM my_corrupt_table
 WHERE pg_is_toasted('column') AND pg_toasted_value('column') == 479667;

Again, I'm fine with losing the data in the toasted fields in question, but I'd rather like to save the remaining data. I've already tried UPDATing a field I found manually (by trial and error). That helped and I can now select that row again (having lost only the field's content).

Is there some sort of error recovery tool? Or should VACUUM fix such things on special request?

Regards

Markus
Re: [GENERAL] pgpool2 vs sequoia
Hi,

David Fetter wrote:
> Very few people actually need synchronous replication, and those who do
> buy Oracle's RAC (and curse it) or use DB2's offering (and also curse it
> ;). For most purposes, fast asynchronous replication is good enough.

While this is certainly true, please keep in mind that async replication always brings up the potential of conflicts, by definition - no matter how fast it is. IMO, it would often be a lot simpler and less expensive to use sync replication and bite the bullet of a small commit delay (depending on the interconnect) - but not have to deal with conflicts. OTOH, of course, there's no real (at least no OSS) solution for sync replication yet, so this is just theory. I'm trying to change that with Postgres-R [1].

As a second note, I might add that all of this really only applies to writing transactions. Read-only transactions are, of course, not affected by replication and can be balanced across multiple servers with both types of replication. Only sync replication guarantees consistent snapshots, though - and the lack of those is the reason for conflicts...

But again, this is just theory. Practically speaking, I'm giving you the same general advice: prefer async replication, because there are solutions which are mature and used in production.

Regards

Markus

[1]: For more information, see: www.postgres-r.org
Re: [GENERAL] invalid page header
Hi,

Tom Lane wrote:
> Hm, looks suspiciously ASCII-like. If you examine the page as text, is it
> recognizable?

Doh! Yup, it is recognizable. It looks like some PHP serialized output:

png%";i:84;s:24:"%InfoToolIconActive.png%";i:85;s:29:"%InfoToolIconHighlighted.png%";i:86;s:26:"%InfoToolIconInactive.png%";i:87;

We do store serialized PHP objects like the above one in the database, so it's probably not a mail spool.

> > What's the best cure? Can I just wipe out the block with something like:
> > # dd if=/dev/zero of=base/296788/302602 seek=58591 bs=8192 count=1
>
> Yeah, try that. I think the net effect will be that some wide (toasted)
> values will be truncated.

Thank you for your guidance. I've done that and hope the running pg_dump goes through fine.

Regards

Markus
[GENERAL] invalid page header
Hi,

I'm in the unfortunate position of having "invalid page header(s) in block 58591 of relation "pg_toast_302599"". I'm well aware that the hardware in question isn't the most reliable one. Nonetheless, I'd like to restore as much of the data as possible.

A pg_filedump analysis of the file in question spits out the following information (relation pg_toast_302599 has oid 302602):

# pg_filedump -R 58590 58592 base/296788/302602

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 8.1.1
*
* File: base/296788/302602
* Options used: -R 58590 58592
*
* Dump created on: Fri Aug 3 10:34:23 2007
*******************************************************************

Block 58590 ------------------------------------------------------
 Block Offset: 0x1c9bc000          Offsets: Lower      36 (0x0024)
 Block: Size 8192  Version 3                Upper      48 (0x0030)
 LSN:  logid 4 recoff 0x24b6ee88            Special  8192 (0x2000)
 Items: 4                       Free Space: 12
 Length (including item array): 40

 Item 1 -- Length: 2034  Offset: 6156 (0x180c)  Flags: USED
 Item 2 -- Length: 2034  Offset: 4120 (0x1018)  Flags: USED
 Item 3 -- Length: 2034  Offset: 2084 (0x0824)  Flags: USED
 Item 4 -- Length: 2034  Offset:   48 (0x0030)  Flags: USED

Block 58591 ------------------------------------------------------
 Block Offset: 0x1c9be000          Offsets: Lower   12858 (0x323a)
 Block: Size 28160  Version 73               Upper   14900 (0x3a34)
 LSN:  logid 627535472 recoff 0x3a693b22    Special  9506 (0x2522)
 Items: 3209                    Free Space: 2042
 Length (including item array): 8192

 Error: Invalid header information.
 Error: End of block encountered within the header. Bytes read: 8192.

Block 58592 ------------------------------------------------------
 Block Offset: 0x1c9c              Offsets: Lower      36 (0x0024)
 Block: Size 8192  Version 3                Upper      48 (0x0030)
 LSN:  logid 4 recoff 0x24b73110            Special  8192 (0x2000)
 Items: 4                       Free Space: 12
 Length (including item array): 40

 Item 1 -- Length: 2034  Offset: 6156 (0x180c)  Flags: USED
 Item 2 -- Length: 2034  Offset: 4120 (0x1018)  Flags: USED
 Item 3 -- Length: 2034  Offset: 2084 (0x0824)  Flags: USED
 Item 4 -- Length: 2034  Offset:   48 (0x0030)  Flags: USED

*** End of Requested Range Encountered. Last Block Read: 58592 ***

It seems obvious that the block in question is absolutely screwed, as the block size as well as the version don't match. What's the best cure? Can I just wipe out the block with something like:

# dd if=/dev/zero of=base/296788/302602 seek=58591 bs=8192 count=1

This is PostgreSQL 8.1.9 (from Debian 8.1.9-0etch1), but it has been running earlier 8.1.x versions in the past.

Regards

Markus
Re: [GENERAL] Pg schema diff tools Was: [ANNOUNCE] == PostgreSQL Weekly News - April 08 2007 ==
Hi,

thanks for the links. I've had a quick look at the first two; here are my findings:

Robert Treat wrote:
> Theres this one which uses tcl:
> https://sourceforge.net/projects/pgdiff

Seems outdated: 2002, PostgreSQL 7.2, ~1500 lines of code (which I don't really understand - I simply don't know TCL, sorry).

> And this one which is written in perl:
> http://gborg.postgresql.org/project/pgdiff/projdisplay.php

Perl suits me better ;-) Anyway, 2003 isn't exactly up to date either. ~770 lines of code, but that might very well be sufficient in many cases.

And as David Fetter pointed out, it's dangerous to rely on such tools, as they can never replace a change control process.

Regards

Markus
[GENERAL] Pg schema diff tools Was: [ANNOUNCE] == PostgreSQL Weekly News - April 08 2007 ==
Hi,

David Fetter wrote in the weekly news:
> Another PostgreSQL Diff Tool 1.0.0_beta20 released.
> http://pgfoundry.org/projects/apgdiff/

Why is it 'another' one? What others exist? (Specifically, are there any which don't depend on Java?)

Regards

Markus
Re: [GENERAL] SQL WHERE: many sql or large IN()
Hi,

tom wrote:
> Initially it seems that the WHERE IN (...) approach takes a turn for the
> worse when the list gets very large.

What version do you use? PostgreSQL 8.2 had great improvements for that specific issue. Did you try EXPLAIN?

Regards

Markus
Re: [GENERAL] real multi-master replication?
Hi,

Devrim GÜNDÜZ wrote:
> > Yes, AFAIK, their solution is two phase commit based, like Sequoia.
>
> I thought it was PGCluster. At least this is what I understood from the
> drawings.

Uhm, you're right, it looks very similar to PgCluster, not Sequoia. So it's not two phase commit based, right?

Regards

Markus
Re: [GENERAL] real multi-master replication?
Hi,

hubert depesz lubaczewski wrote:
> i contacted the company some time ago, and the information i got was that
> their product is based on query-replication.

Yes, AFAIK, their solution is two phase commit based, like Sequoia.

Regards

Markus
Re: [GENERAL] tsearch2: word position
Hi,

Mike Rylander wrote:
> No, the first X aren't more important, but being able to determine word
> proximity is very important for partial phrase matching and ranking. The
> closer the words, the "better" the match, all else being equal.

Ah, yeah, for word-pairs, that certainly helps. Thanks.

Regards

Markus
Re: [GENERAL] tsearch2: word position
Hello Teodor,

Teodor Sigaev wrote:
> byte offset of word is useless for ranking purpose

Why is a word number more meaningful for ranking? Are the first 100 words more important than the rest? That seems as ambiguous as saying the first 1000 bytes are more important, no? Or does the ranking work with the word numbers internally to do something more clever?

Do you understand why I find the word number inconvenient?

Regards

Markus
Re: [GENERAL] tsearch2: word position
Hi,

Teodor Sigaev wrote:
> Word number is used only in ranking functions. If you don't need ranking
> then you could safely strip positional information.

Huh? I explicitly *want* positional information. But I find the word number to be less useful than a character number or a simple (byte) pointer to the position of the word in the string. Given only the word number, I have to go and parse the string again.

Regards

Markus
Re: [GENERAL] tsearch2: word position
Hi,

Teodor Sigaev wrote:
> > I'm fiddling with to_tsvector() and parse() from tsearch2, trying to get
> > the word position from those functions. I'd like to use the tsearch2
> > parser and stemmer, but I need to know the exact position of the word as
> > well as the original, unstemmed word.
>
> It's not supposed usage... Why do you need that?

Counter question: what's the supposed usage of the word number? Why would anyone be interested in that? You always need to parse the text yourself, to be able to get any use from the word number. to_tsvector() could as well return the character number or a byte pointer; I could see advantages for both. But the word number makes little sense to me.

Regards

Markus
Re: [GENERAL] tsearch2: word position
Hello Teodor,

Teodor Sigaev wrote:
> > > It's not supposed usage... Why do you need that?
> >
> > Well, long story... I'm still using my own indexing on top of the
> > tsearch2 parsers and stemming. However, two obvious cases come to mind:
> > - autocompletion, where I want to give the user one of the possible
> >   known words. Currently, I'm returning the stemmed word, which is
> >   obviously not quite right.
> > - highlighting of matching words
>
> Have a look at the headline framework as an example or starting point.
> hlparsetext() returns parsed text with matched lexemes in tsquery. A small
> description of hlparsetext is placed at
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/HOWTO-parser-tsearch2.html
> near the end. Description of HLWORD struct is somewhat out of date, sorry.

Thanks. I probably need to dig into the sources, though.

Markus
[GENERAL] tsearch2: word position
Hi,

I'm fiddling with to_tsvector() and parse() from tsearch2, trying to get the word position from those functions. I'd like to use the tsearch2 parser and stemmer, but I need to know the exact position of the word as well as the original, unstemmed word.

What I came up with so far is pretty ugly:

SELECT (parse('my test text')).tokid,
       (parse('my test text')).token,
       strip(to_tsvector((parse('my test text')).token));

And this only tells me a word position, not a character or byte position within the string. Is there a way to get this information from tsearch2?

Regards

Markus
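As a side note: assuming tsearch2's parse() can be used as a set-returning function in the FROM clause (as contrib functions of this kind usually can on 8.x), the triple call above could be collapsed into a single one; a sketch:

SELECT p.tokid, p.token, strip(to_tsvector(p.token))
  FROM parse('my test text') AS p;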
Re: [GENERAL] Slow running to_tsvector (tsearch2 in PG 8.2.3)
Hi,

Henrik Zagerholm wrote:
> Which takes about 80 seconds to complete. The hardware is a Pentium 4
> 2.8GHz with 1GB HyperX memory. Is this normal? What can I tweak in
> postgresql.conf to speed up big to_tsvector()?

Hm.. seems not too unreasonable to me. Take a look at the stemmers or dictionaries involved. What do you use there?

Regards

Markus
Re: [GENERAL] sequence skips 30 values, how?
Hi,

Geoffrey wrote:
> We are trying to track down an issue with our PostgreSQL application. We
> are running PostgreSQL 7.4.13 on Red Hat Enterprise ES 3. We have a
> situation where the postgres backend process drops core and dies.

Are there any log messages from the dying process, especially just before it dies? Did you try to open the core file in a debugger? What does that backend do to make it crash?

Regards

Markus
Re: [GENERAL] how to read bytea field
Hi,

marcelo Cortez wrote:
> > Are you sure you tested with a real bytea field?
>
> Yeah, i store bytea using encode function, how you say.

I never said 'use the encode function to store bytea'. I tried to explain that encode returns TEXT.

> The field of my table is bytea type, and store real bytea data in this
> field.

That's contradictory to the above. Please show me exactly what you do (the INSERT as well as the SELECT you want to use).

> I think my mistake was use bytea field. I thinking in turn this field to
> text and use decode/encode for storage binary data.

That sounds like wasting disk space and I/O bandwidth.

Regards

Markus
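For reference, a minimal sketch of how a bytea column is typically written and read on 8.x (the table name is made up; in client code one would normally pass the binary value as a bound parameter instead of escaping it by hand):

CREATE TABLE files (id serial PRIMARY KEY, data bytea);

-- writing: non-printable octets are escaped in the literal
INSERT INTO files (data) VALUES (E'\\000\\001\\002'::bytea);

-- reading: encode() renders the bytea as text for display
SELECT id, encode(data, 'escape') FROM files;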
Re: [GENERAL] how to read bytea field
Hi,

marcelo Cortez wrote:
> Yes i know, but if your define bytea field and store bytea in this field,
> decode don't work,

Sure it does:

test=# select encode(E'\\000\\001', 'escape')::text;
  encode
----------
 \000\x01
(1 row)

If you inspect the function, you'll find that encode can *only* handle bytea, not text (as the first parameter):

test=# \df encode
                       List of functions
   Schema   |  Name  | Result data type | Argument data types
------------+--------+------------------+---------------------
 pg_catalog | encode | text             | bytea, text
(1 row)

And trying to feed it text gives:

test=# select encode('some text'::text, 'escape')::text;
ERROR:  function encode(text, "unknown") does not exist

Are you sure you tested with a real bytea field?

Regards

Markus
Re: [GENERAL] Is there an equivalent of the W3c HTML checker for
Hi,

I've just stumbled across the Mimer SQL Validator (a commercial product):

http://developer.mimer.com/validator/

Not that I know it, though... Anyway, there are lots of things (like PHP scripts or stored procedures and such) which do a whole lot of other logic and/or processing that influences the queries they finally issue. So I think such SQL checkers are not that useful. Might be just my humble opinion, though.

Regards

Markus
Re: [GENERAL] Geographical redundancy
Hello Dennis,

Dennis wrote:
> Is there any feasible way to achieve geographical redundancy of a
> postgresql database?

As nobody has mentioned it up until now: please check the very nice documentation about High Availability and Failover here:

http://www.postgresql.org/docs/8.2/static/high-availability.html

It explains most options and their pros and cons.

Regards

Markus
Re: [GENERAL] Large Object to Bytea Conversion
Hi,

I've sort of solved the problem for me. I'm now doing one single lo_read() to fetch the bytea field. These functions do not operate on the large object OID directly; one needs to open the object first with lo_open(). I'm doing another hack to get the size of the large object. All combined in an SQL function (plpgsql is not installed...) it looks like this:

-- a helper function to convert large objects to bytea more efficiently
CREATE FUNCTION lo_readall(oid) RETURNS bytea AS $_$
    SELECT loread(q3.fd, q3.filesize + q3.must_exec)
      FROM (SELECT q2.fd, q2.filesize, lo_lseek(q2.fd, 0, 0) AS must_exec
              FROM (SELECT q1.fd, lo_lseek(q1.fd, 0, 2) AS filesize
                      FROM (SELECT lo_open($1, 262144) AS fd) AS q1
                   ) AS q2
           ) AS q3
$_$ LANGUAGE sql STRICT;

Does anybody know a better way? Why isn't a simple function like that included? And is it just me, or is the documentation lacking a description of all the server-side lo_* functions? I've been studying the source to find them.

Regards

Markus
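Used for the original goal of this thread - migrating large objects into a bytea column - the helper above could be applied roughly like this (table and column names are made up for illustration):

-- 'documents' has a column 'content_oid' referencing large objects
ALTER TABLE documents ADD COLUMN content bytea;
UPDATE documents SET content = lo_readall(content_oid);

-- once the data has been verified, the large objects themselves could be
-- removed with lo_unlink(content_oid) and the old column dropped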
Re: [GENERAL] MySQL drops support for most distributions
Hi,

John D. Burger wrote:
> Sure, but they won't use PG either, for essentially the same reason,
> since =all= PG support is "third party".

Maybe. But at least these third parties can take the source and build their own product on top of it, without significant limitations. So one can debate whether e.g. EnterpriseDB is providing third party support for PostgreSQL or first-hand support for their own product :-)

Regards

Markus
Re: [GENERAL] Large Object to Bytea Conversion
Hi,

I'm using the function below to convert a large object to a bytea value. Unfortunately, it performs very poorly for *large* objects, i.e. I'm currently fiddling with a file of about 100 MB. I've increased work_mem to 256 MB, but that didn't help much.

I suspect the aggregate function is not as efficient as it looks. Is it perhaps not releasing memory? The postgres process consumes all the work_mem I've allowed. Theoretically, this could be a streaming operation and would not need much memory at all. As bytea is normally a varlena field, I suspect postgres needs to be able to hold the file at least once in memory.

Any idea on how to speed this up?

Regards

Markus

Dimitri Fontaine wrote:
> Hi,
>
> On Tuesday, 14 November 2006 at 14:36, Markus Schiltknecht wrote:
> > I want to convert some large objects to bytea fields on the server.
> > Searching through the documentation didn't reveal any hints. Am I
> > missing something or is there really no such thing as a
> > lo_convert_to_bytea function?
>
> You may want to try this code given on IRC by i-can't-remember-who:
>
> CREATE FUNCTION _phpads_agg_concat (bytea, bytea) RETURNS bytea
>     AS $_$SELECT $1 || $2$_$
>     LANGUAGE sql IMMUTABLE STRICT;
>
> CREATE AGGREGATE phpads_agg_concat (bytea)
>     (SFUNC = _phpads_agg_concat, STYPE = bytea, INITCOND = '');
>
> EXPLAIN ANALYZE SELECT phpads_agg_concat(data)
>   FROM (SELECT data FROM pg_largeobject
>          WHERE loid = 24513361 ORDER BY pageno) t;
>
>                             QUERY PLAN
> ---------------------------------------------------------------------------
>  Aggregate  (cost=10.62..10.63 rows=1 width=32) (actual time=4.682..4.685 rows=1 loops=1)
>    ->  Index Scan using pg_largeobject_loid_pn_index on pg_largeobject  (cost=0.00..10.57 rows=4 width=1863) (actual time=0.075..0.162 rows=4 loops=1)
>          Index Cond: (loid = 24513361::oid)
>  Total runtime: 5.146 ms
>
> CREATE FUNCTION phppgads_lo_readall(oid) RETURNS bytea
>     AS $_$SELECT phpads_agg_concat(data)
>             FROM (SELECT data FROM pg_largeobject
>                    WHERE loid = $1 ORDER BY pageno) t$_$
>     LANGUAGE sql STRICT;
>
> Hope this will help,
Re: [GENERAL] [PERFORM] Locking in PostgreSQL?
Hi,

Dave Cramer wrote:
> Apparently I've completely misunderstood MVCC then

Probably not. You are both somewhat right.

Jens Schipkowski wrote:
> > Thats not right. UPDATE will force a RowExclusiveLock to rows matching
> > the WHERE clause, or all if no one is specified.

That's almost right: RowExclusiveLock is a table-level lock, and an UPDATE does acquire it. Additionally there are row-level locks, which is what you're speaking about. An UPDATE takes an exclusive row-level lock on every row it updates. Please note, however, that these row-level locks only block concurrent writers, not readers (MVCC lets the readers see the old, unmodified row).

> My understanding is that unless you do a select ... for update then
> update the rows will not be locked.

Also almost right, depending on what you mean by 'locked'. A plain SELECT acquires an ACCESS SHARE lock on the table, but no row-level locks. Only a SELECT ... FOR UPDATE (or FOR SHARE) takes row-level locks - exclusive or shared ones, respectively.

The very fine documentation covers that in [1].

Regards

Markus

[1]: PostgreSQL Documentation, Explicit Locking:
http://www.postgresql.org/docs/8.2/interactive/explicit-locking.html
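A small illustration of the different lock levels (made-up table, meant as a sketch rather than a complete demo):

BEGIN;

-- table-level ACCESS SHARE only, no row-level locks:
SELECT * FROM accounts WHERE id = 1;

-- exclusive row-level lock on the matched row; concurrent writers block,
-- concurrent readers do not:
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

-- table-level ROW EXCLUSIVE lock plus an exclusive row-level lock on the
-- updated row:
UPDATE accounts SET balance = balance - 10 WHERE id = 1;

COMMIT;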
Re: [GENERAL] Locking in PostgreSQL?
Hi,

Joost Kraaijeveld wrote:
> Does PostgreSQL lock the entire row in a table if I update only 1 column?

Yes. In PostgreSQL, an update is much like a delete + insert. A concurrent transaction will still see the old row. Thus the lock only prevents other writing transactions, not readers.

Regards

Markus

P.S.: please do not cross post such questions.
Re: [GENERAL] Speed of postgres compared to ms sql, is this
Hi,

Tomi N/A wrote:
> > When the subselect returns a lot of results, pgsql really takes its
> > time.
>
> 8.1.something

PostgreSQL 8.2 improved a lot for IN clauses with lots of values. I think it now performs as well as an equivalent join query.

Regards

Markus
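On releases that still struggle with an IN (subselect), it can be worth comparing the plan with an explicit join rewrite under EXPLAIN ANALYZE; a sketch with made-up table names (equivalent as long as customers.id is unique):

-- IN form
SELECT *
  FROM orders
 WHERE customer_id IN (SELECT id FROM customers WHERE region = 'EU');

-- join form
SELECT o.*
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
 WHERE c.region = 'EU';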
Re: [GENERAL] Large Object to Bytea Conversion
Hello Dimitri,

Cool, thank you. Even though.. it looks a little... ehm... well optimized? Or shall I say hard to read? However, it should work for what I want. Shouldn't such a thing be at least in contrib?

How about the reverse, filling a large object with a bytea value? Admittedly, this is less attractive ;-)

Regards

Markus

Dimitri Fontaine wrote:
> Hi,
>
> On Tuesday, 14 November 2006 at 14:36, Markus Schiltknecht wrote:
> > I want to convert some large objects to bytea fields on the server.
> > Searching through the documentation didn't reveal any hints. Am I
> > missing something or is there really no such thing as a
> > lo_convert_to_bytea function?
>
> You may want to try this code given on IRC by i-can't-remember-who:
>
> CREATE FUNCTION _phpads_agg_concat (bytea, bytea) RETURNS bytea
>     AS $_$SELECT $1 || $2$_$
>     LANGUAGE sql IMMUTABLE STRICT;
>
> CREATE AGGREGATE phpads_agg_concat (bytea)
>     (SFUNC = _phpads_agg_concat, STYPE = bytea, INITCOND = '');
>
> EXPLAIN ANALYZE SELECT phpads_agg_concat(data)
>   FROM (SELECT data FROM pg_largeobject
>          WHERE loid = 24513361 ORDER BY pageno) t;
>
>                             QUERY PLAN
> ---------------------------------------------------------------------------
>  Aggregate  (cost=10.62..10.63 rows=1 width=32) (actual time=4.682..4.685 rows=1 loops=1)
>    ->  Index Scan using pg_largeobject_loid_pn_index on pg_largeobject  (cost=0.00..10.57 rows=4 width=1863) (actual time=0.075..0.162 rows=4 loops=1)
>          Index Cond: (loid = 24513361::oid)
>  Total runtime: 5.146 ms
>
> CREATE FUNCTION phppgads_lo_readall(oid) RETURNS bytea
>     AS $_$SELECT phpads_agg_concat(data)
>             FROM (SELECT data FROM pg_largeobject
>                    WHERE loid = $1 ORDER BY pageno) t$_$
>     LANGUAGE sql STRICT;
>
> Hope this will help,
[GENERAL] Large Object to Bytea Conversion
Hi,

I want to convert some large objects to bytea fields on the server. Searching through the documentation didn't reveal any hints. Am I missing something or is there really no such thing as a lo_convert_to_bytea function?

Regards

Markus
[GENERAL] SSL Certificate Check
Hi,

I'm trying to install an SSL certificate. psql correctly shows:

SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)

How can I check what certificate it has sent? Or what key it uses?

Thank you

Markus
Re: [GENERAL] lots of values for IN() clause
Hi,

Richard Huxton wrote:
> If you can reliably reproduce it (I can't here - Debian on x86) - a
> bug-report on the bugs mailing list or the website would probably be
> appreciated by the developers. PG version, OS version, method of install
> etc.

I've thought about that, but I somehow just *knew* it was my fault and not a bug in PostgreSQL ;-)

OTOH, having to configure such things is not exactly user friendly. I guess it's difficult to determine the stack limit in a cross-platform way. Or are there other reasons for that configuration option's existence?

Regards

Markus
Re: [GENERAL] lots of values for IN() clause
Hello Alvaro,

yeah, thanks, that's it. postgresql.conf had:

max_stack_depth = 8192  # min 100, size in KB

I don't know who put it at 8192. According to the fine manual at [1], it should be set to something below 'ulimit -s', which gives 8192 on the machine in question. I've now set it to 7000 and I now get a warning instead of a SEGFAULT.

Thank you!

Markus

[1]: http://www.postgresql.org/docs/8.1/interactive/runtime-config-resource.html

Alvaro Herrera wrote:
> Markus Schiltknecht wrote:
> > Hi,
> > One of our PostgreSQL 8.1.5 databases constantly crashed on a certain
> > query (backend SEGFAULTs). I've figured the crashes were caused by a
> > very long IN() clause. You can easily reproduce the crash by feeding the
> > output of the python script below to your database.
>
> I'd argue that you have max_stack_depth set to an invalid value (higher
> than your true stack limit). I tried your example here on 8.1.5 and got
> this:
>
> alvherre=# \i foo
> CREATE TABLE
> psql:foo:2: ERROR:  stack depth limit exceeded
> HINT:  Increase the configuration parameter "max_stack_depth".
> DROP TABLE
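As a quick sanity check (a sketch; on 8.1 the value is a plain integer in kB and changing it at runtime requires superuser privileges), the setting can be inspected and adjusted without a restart:

SHOW max_stack_depth;

-- keep this safely below the operating system's stack limit (ulimit -s)
SET max_stack_depth = 7000;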
Re: [GENERAL] lots of values for IN() clause
Hi,

thanks for testing; unfortunately I don't have an 8.0 around. And as 8.2 works and is probably coming very soon...

Regards

Markus

Shelby Cain wrote:
> I don't get a segfault on 8.0.8 under linux or 8.1.4 under win32. The
> backend (correctly I assume) issues a hint to increase max_stack_depth in
> both cases.
>
> $ psql -h localhost -p 5432 -d test -U readwrite < script.sql
> Password:
> CREATE TABLE
> ERROR:  stack depth limit exceeded
> HINT:  Increase the configuration parameter "max_stack_depth".
> DROP TABLE
>
> Regards,
>
> Shelby Cain
[GENERAL] lots of values for IN() clause
Hi,

One of our PostgreSQL 8.1.5 databases constantly crashed on a certain query (backend SEGFAULTs). I've figured the crashes were caused by a very long IN() clause. You can easily reproduce the crash by feeding the output of the python script below to your database.

Fortunately, 8.2 (as of 09/19/2006) no longer crashes. Anyway, I think it would be helpful to have at least a better error message instead of a SEGFAULT for 8.1. I didn't test earlier versions.

Regards

Markus

python script mentioned:

#!/usr/bin/python

values = 10

print "CREATE TABLE test (a INT, b INT);"
print "SELECT a, b FROM test WHERE b IN (" + \
      ",".join([str(x) for x in range(values)]) + \
      ");"
print "DROP TABLE test;"
Re: [GENERAL] ERRORDATA_STACK_SIZE exceeded
Hello Stefan,

Stefan Sassenberg wrote:
> Hello,
> I've got a failing sql-Script that I execute with the psql command. The
> script contains:

I've been unable to reproduce the error with just that snippet (on Debian with PostgreSQL 8.1.4). Can you provide a stripped down test case?

8<---------------------------------------------------------------
PANIK: ERRORDATA_STACK_SIZE exceeded
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
8<---------------------------------------------------------------

'Exceeding stack size' sounds somewhat like an infinite loop of a recursive function... but that's just a very general guess. I think the root cause of the problem is somewhere else.

Regards

Markus
Re: [GENERAL] Intentionally produce Errors
Hello Matthias,

[EMAIL PROTECTED] wrote:
> In PL/pgSQL you could use the RAISE command:
> http://www.postgresql.org/docs/8.1/interactive/plpgsql-errors-and-messages.h
> tml

Thank you, good to know. Unfortunately I'm not in a PL/pgSQL function, just a plain query. Are there some standard functions which invoke RAISE?

Regards

Markus

BTW: why did you add '.h' to the link? Did you mess around with C headers a little too much recently? :-)
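For plain queries, a few statements that reliably provoke specific error conditions (just examples, not an exhaustive list):

SELECT 1/0;                                -- division_by_zero
SELECT 'not a number'::int;                -- invalid_text_representation
SELECT * FROM this_table_does_not_exist;   -- undefined_table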
[GENERAL] Intentionally produce Errors
Hi,

this is sort of a silly question, but: what's the proper way to intentionally generate an error? I'm writing tests for pyPgSQL and want to check its error handling. Currently, I'm using:

SELECT "THIS PRODUCES AN SQL ERROR";

Is there any better way to generate errors? Ideally even a way to generate specific, given errors?

Regards

Markus
Re: [GENERAL] UNIQUE constraints on function results
Emanuele Rocca wrote:
> you'll get a duplicate key error.

Thank you, that solves my problem. Although it makes me wonder even more why I'm not allowed to define such a constraint. Looks like all the necessary backend code is there.

Regards

Markus
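The workaround being referred to is a unique expression index; against the table from the original post it would look roughly like this:

CREATE UNIQUE INDEX test_t1_lower_t2_key ON test (t1, lower(t2));

-- inserting a second row with the same t1 and a case-variant of t2 now
-- fails with a duplicate key error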
[GENERAL] UNIQUE constraints on function results
Hi,

I've been trying to add a unique constraint on a column and on a function result of a column, i.e.:

CREATE TABLE test (
    id SERIAL PRIMARY KEY,
    t1 TEXT NOT NULL,
    t2 TEXT NOT NULL,
    UNIQUE (t1, lower(t2))
);

That fails with a syntax error (on 8.2beta1). While UNIQUE (t1, t2) works like a charm, it's not exactly what I want. I can easily create an index for my needs [1], so why can I not add such a unique constraint?

Thanks for clarification.

Regards

Markus

[1]: CREATE INDEX test_idx ON test(t1, lower(t2));
Re: [GENERAL] Postgresql mentioned on Newsforge MySQL article
Scott Marlowe wrote:
> Was this all the same basic task implemented by different teams then?

Yep.

> Can we see the code? hack it? I'm sure someone here could help out.

Sure.

> I don't care about the contest, but it would be nice to be able to put
> out a version that could compete with MySQL's.

Sure. The main db-contest site of the magazine is [1], but it's all German. You'll find an English translation of the original article, describing the problem to solve, in [2]. The contest entries are downloadable at [3]; mine is the 'bluegap' one. If you want to hack together a high-performance solution based on PostgreSQL, I'd start with Alvar C.H. Freude's disqualified entry.

Happy hacking

Markus

[1]: http://www.heise.de/ct/dbcontest/
[2]: http://firebird.sourceforge.net/connect/ct-dbContest.html
[3]: http://www.heise.de/ct/dbcontest/teilnehmer.shtml
Re: [GENERAL] Postgresql mentioned on Newsforge MySQL article
Tony Caduto wrote:
> http://newsvac.newsforge.com/newsvac/06/08/28/1738259.shtml
> Don't know the validity of this dvd order test they did, but the article
> claims Postgresql only did 120 OPM. Seems a little fishy to me.

Now, this article really s**ks!

First of all, the original contest was specifically not only about performance. And the MySQL team did a whole lot of dirty tricks (e.g. using memcached) to push their solution.

I am the one who wrote the only PostgreSQL entry, for which I'm still sorry and ashamed, because it performs so poorly. I just didn't have much spare time to spend, but thought I'd send it in anyway. One of the reasons it did not perform well was that I simply forgot to enable connection pooling.

Another entry using Perl and PostgreSQL, from somebody who put a lot more time and effort into it, was disqualified because the benchmarker claimed the files came in too late, while the author of that PostgreSQL solution claims the wrong ZIP archive was used.

Besides, it was quite a simple database test: not too complicated queries, very few stored procedures. No views, no triggers, no 2PC, no nothing. So IMHO it was not even a good test of database extensibility.

So this comparison is really just... ah! forget about it! (Just please, next time, let's put a little more effort into such a thing.)

Regards

Markus
Re: [GENERAL] Database Oid from SPI
Hi,

thank you both. I first tried that, but the segfault really irritated me. It's now working fine with miscadmin.h.

Sorry for the noise.

Regards

Markus

Tom Lane wrote:
> Actually I'd recommend you use the global MyDatabaseId from "miscadmin.h".
> It'll be the same value, but it's always best to avoid unnecessary
> accesses to shared memory.
>
> regards, tom lane

Alvaro Herrera wrote:
> I'd use MyDatabaseId ...
Re: [GENERAL] Database Oid from SPI
Whoops, sorry, there was another reason for the segfault. Using MyProc->databaseId works. Is it the right way to do it, though?

Markus

Markus Schiltknecht wrote:
> Hi,
> how can I get the database name or OID of the current backend in an SPI
> function (in plain C)? I tried including storage/proc.h and accessing
> MyProc->databaseId, but that leads to a segfault :-( (and seems like the
> wrong way to do it.) The SPI documentation didn't help.
>
> Thank you
>
> Markus
[GENERAL] Database Oid from SPI
Hi,

how can I get the database name or OID of the current backend in an SPI function (in plain C)? I tried including storage/proc.h and accessing MyProc->databaseId, but that leads to a segfault :-( (and seems like the wrong way to do it.) The SPI documentation didn't help.

Thank you

Markus
Re: [GENERAL] unknown error message
On Mon, 2006-07-24 at 14:54 -0400, Tom Lane wrote:
> Right offhand the only way that I could see for the tuple to disappear
> before the trigger fires is if a concurrent VACUUM removed it, which
> should not happen for a tuple inserted by a still-active transaction.
> If you've got autovacuum running in the background, the unpredictability
> of vacuum timing might be contributing to difficulty of reproducing.

Sorry, I was unclear: luckily the error is easily reproducible. The problem is just cutting away the unnecessary parts to come up with a small test case. I'll give it a try tomorrow.

As far as VACUUM is concerned: I run this data-loading script just after CREATE DATABASE. In my tests, not more than some 100 rows (rough guess) are inserted until the error occurs. It would surprise me if that had triggered an autovacuum every time I tried. Plus, as you say, it all happens in the same transaction. (Oh, and I even tried with a sleep(5) before the commit and the error still occurred. I guess an autovacuum most probably won't take that long with so few rows.)

Regards

Markus
Re: [GENERAL] unknown error message
Hello Tom,

in the meantime I've traced the problem down to a CONSTRAINT declared as DEFERRABLE INITIALLY DEFERRED. As soon as I remove that constraint, the transaction commits nicely. BTW: I'm running in ISOLATION LEVEL SERIALIZABLE, if that matters.

Has something changed so that it's worth trying current CVS? I'll try to come up with a test case; the problem is not easy to isolate, though.

Regards

Markus

Tom Lane wrote:
> Markus Schiltknecht <[EMAIL PROTECTED]> writes:
> > I'm getting the following error from my python script, which tries to
> > insert lots of data in one transaction:
> > libpq.OperationalError: ERROR: failed to fetch new tuple for AFTER trigger
>
> That's not supposed to happen. Can you provide a test case?
>
> regards, tom lane
[GENERAL] unknown error message
Hi,

I'm getting the following error from my python script, which tries to insert lots of data in one transaction:

libpq.OperationalError: ERROR: failed to fetch new tuple for AFTER trigger

I have several AFTER triggers in place; which one raises this error? I'm sure I only INSERT data, no UPDATE nor DELETE, so for what reason could a trigger fail to fetch a new tuple?

Can deferred constraints raise such errors? Because AFAICT no triggers should be invoked by the very SQL command which causes the error... strange enough!

Any hints on where to look for the bug? I'm on a PostgreSQL 8.2devel snapshot from 06/06/2006 (silly date :-), using PL/Python for triggers. I'm unable to paste the schema - plus it's quite large and - as said - I have no clue what part of it raises the error.

Thanks for your help.

Regards

Markus
Re: [GENERAL] Problem with 'postgres' db with 8.0 on a MacBook
Hi Jonathon, Jonathon McKitrick wrote: : # psql template1 -c "CREATE DATABASE test;" CREATE DATABASE is the result. Looks good. Can you connect to that database then? : What does psql -l say? FATAL: database 'postgres' does not exist As Tom said: check whether you are really calling your self-compiled binaries: # which psql Markus ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Problem with 'postgres' db with 8.0 on a MacBook
Hi Jonathon, does the following command succeed? # psql template1 -c "CREATE DATABASE test;" The database 'postgres' is a built-in system database that was added in 8.1. It should exist if you used the correct initdb and postmaster. What does psql -l say? And did you recheck the permissions in pg_hba.conf? BTW: nice laptop, isn't it? Linux still needs some work to run it, though. But I'm getting there ;-) Regards Markus Jonathon McKitrick wrote: Hi all, this is my first time posting here because I've been running pgsql without problems on a couple of Linux boxes for a while now. I'm migrating to a new MacBook, and while I finally got it to compile, initdb, and start a postmaster, any time I try createdb or createuser, I get a message that the database 'postgres' cannot be found. I installed after a manual build, chown'ed /usr/local/pgsql to postgres, added a 'data' directory, and called initdb on it. But all calls to createdb or createuser gave the message above. Can anyone help? Jonathon McKitrick -- My other computer is your Windows box. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Best high availability solution ?
Hi Arnaud, perhaps you can still use Slony-I for replication and have another tool automatically handle connections (check out PgPool [1] or SQLRelay [2]). Or go for a middleware replication solution. Check C-JDBC [3]; perhaps there is something similar for ODBC? LifeKeeper seems to handle replication at a lower level (filesystem, distributed memory, or such) and does not seem to be very well suited for database replication. However, I've only had a quick glance at the website. Hope that helps Markus [1]: http://pgfoundry.org/projects/pgpool/ [2]: http://sqlrelay.sourceforge.net/ [3]: http://c-jdbc.objectweb.org/ On Wed, 2006-05-31 at 09:36 +0200, Arnaud Lesauvage wrote: > Hi list ! > > I have a small enterprise network (~15 workstations, 1 server), > all running windows OSes. Most of our work is done on a PostgreSQL > DB (on the windows server). > I am the only IT here, and my boss asked me to find a way to have > the database always online, without my intervention. > Last time I went on vacation, the server crashed and no one was > able to repair it. > > Our application connects to PostgreSQL through ODBC, with a simple > TCP/IP connection. > I thought that I would first install a Slony-I cluster. That would > be fine for data replication, but still if the main server > crashes, the database connections will not work anymore because > the name of the backup-server will be different than the name of > the master, so all ODBC connections would have to be changed to use the > new machine name. > Since I cannot ask anyone to do some DNS changes or things like > that, I am looking for a simple way to have my database always > online (note that I already have a UPS and RAID1 on the server to > prevent most failures). > > After some searches, I found LifeKeeper, which looks very good but > is quite expensive ! > Are there easier and/or better solutions than that ? > > Thanks for your advice on this matter ! > > -- > Arnaud > > > ---(end of broadcast)--- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] rules: evaluate inputs in advance
Hi Martijn, On Fri, 2006-05-12 at 18:05 +0200, Martijn van Oosterhout wrote: > But it can't really. In the example that started this thread, there are > two separate rules and after rewriting the executor will be presented > two separate queries. Ah, thank you, that explains the difficulties with rules. > What you probably want is a function that is given the row and then > executes the two statements on a per row basis. This has the effect you > want but gives up the major benefit of rules, wholesale query > restructuring like views which allows the executor to find better > plans. The executor can't see inside a trigger so it can't optimise. Isn't that an argument for keeping rewrite rules instead of using something trigger-like for updatable views? Wouldn't it be feasible to teach the executor how to handle multiple queries with some pre-evaluated input? Regards Markus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] rules: evaluate inputs in advance
On Fri, 2006-05-12 at 15:57 +0200, Martijn van Oosterhout wrote: > It's a known problem. It's also one of the reasons why triggers are > recommended over rules. And it's not desirable behaviour. Well, triggers cannot be used to create writable views, can they? > There have been discussions about the problems Do you have some pointers here? I did not find the relevant discussions in the archives (on gmane.org). > Yet no-one has come up with an > alternative that covers the current uses... As far as I know, the rewriter simply copies query (parse) tree nodes. I thought about inserting an 'evaluate-only-once' node which is linked from all rules and points to the expression to evaluate (instead of copying the complete expression and evaluating it several times in the executor). Regards Markus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] rules: evaluate inputs in advance
Hi, I was trying to create an updatable view. Suddenly I got foreign key violations when using nextval('myseq'). As I understand it, the rewriter does something similar to a simple text replacement (I guess copying the query tree nodes?), so that nextval gets evaluated again for every rule that applies. Is this desirable? Are there applications which need every rule to reevaluate the input expressions? Or could that behaviour be changed so that the input values for the rules get evaluated only once? A simple example:

test=# CREATE TABLE test (id SERIAL PRIMARY KEY);
test=# CREATE VIEW view AS SELECT id FROM test;
test=# CREATE RULE rule1 AS ON INSERT TO view DO INSTEAD
           INSERT INTO test (id) VALUES (NEW.id);
test=# CREATE RULE rule2 AS ON INSERT TO view DO ALSO
           DELETE FROM test WHERE id = NEW.id;
test=# INSERT INTO view (id) VALUES (1);
test=# SELECT id FROM test;
 id
----
(0 rows)
-- that's what I was expecting...

test=# INSERT INTO view (id) VALUES (nextval('test_id_seq'));
test=# SELECT id FROM test;
 id
----
  1
(1 row)
-- this happens because rule2 evaluated nextval(..) again

test=# SELECT currval('test_id_seq');
 currval
---------
       2
(1 row)
-- confirming my observation.

Regards Markus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
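For what it's worth, one workaround sketch (an assumption, not a tested recipe: it presumes the value can be computed before hitting the view, e.g. in a PL/pgSQL wrapper) is to evaluate the volatile expression once and hand the resulting plain value to the view, so that every rule action sees the same constant:

    -- Sketch only: pre-evaluate nextval() so both rule actions see one value.
    CREATE FUNCTION insert_into_view() RETURNS integer AS $$
    DECLARE
        new_id integer;
    BEGIN
        new_id := nextval('test_id_seq');       -- evaluated exactly once
        INSERT INTO view (id) VALUES (new_id);  -- rules now see a plain parameter
        RETURN new_id;
    END;
    $$ LANGUAGE plpgsql;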
[GENERAL] convert LO to BYTEA
Hi, is there an easy way to convert a large object to a bytea field? Thanks Markus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
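One commonly cited approach (hedged: this is only a sketch relying on the server-side large object functions lo_open() and loread(); the helper, table and column names below are made up for illustration) is a small SQL helper that opens the large object read-only (262144 is the INV_READ flag) and reads it into a bytea in one go. Note that the second loread() argument caps how many bytes are read, so it must be at least the size of the largest object.

    -- Sketch: convert a large object (given its OID) into a bytea value.
    CREATE OR REPLACE FUNCTION lo_to_bytea(oid) RETURNS bytea AS $$
        SELECT loread(lo_open($1, 262144), 100000000);
    $$ LANGUAGE sql STRICT;

    -- Hypothetical usage: copy an oid column into a new bytea column.
    -- ALTER TABLE images ADD COLUMN img_bytea bytea;
    -- UPDATE images SET img_bytea = lo_to_bytea(img_lo);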
Re: [GENERAL] Unnecessary function calls
On Tue, 2006-05-02 at 14:02 +0200, Martijn van Oosterhout wrote: > How about: > > SELECT id, get_category_text_path(id) > FROM (SELECT id FROM category > ORDER BY rank > LIMIT 5) as x; Oh, that works? Great! Let me see, with 'rank' from a joined table that looks like: SELECT id, get_category_text_path(id), rank FROM ( SELECT c.id, r.rank FROM category AS c JOIN rank_lookup AS r ON r.cat_id = c.id ORDER BY r.rank LIMIT 5 ) AS x; That works perfectly. It prevents a second join and thus solves the question in my previous mail. > Evidently you don't have an index on rank, otherwise it would've used > the index to cut down on the number of rows that needed to be examined. No, there is no index. I need to think about creating one... Thank you very much. Markus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
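Something along these lines should do it (assuming the rank_lookup table and columns used above; the index names are just illustrative):

    -- An index on the sort key lets the planner find the top rows cheaply.
    CREATE INDEX rank_lookup_rank_idx ON rank_lookup (rank);
    -- And one on the join column, if it is not indexed already.
    CREATE INDEX rank_lookup_cat_id_idx ON rank_lookup (cat_id);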
Re: [GENERAL] Unnecessary function calls
Hello Terry, Thanks a lot. That's so simple I didn't see it. (The original query is much more complex.) The only problem is that rank is not a column of category itself but comes from a joined table. With this solution, the join will have to be performed twice. But since this doesn't cost much, and because the second join is only done for at most 5 rows, it doesn't hurt. The more complete query now looks a little ugly: SELECT id, get_category_text_path(id), r.rank FROM category JOIN rank_lookup AS r ON cat_id = id WHERE id IN ( SELECT c.id FROM category AS c JOIN rank_lookup AS rr ON rr.cat_id = c.id ORDER BY rr.rank LIMIT 5 ) It's not possible to optimize out that second join, is it? Regards Markus On Tue, 2006-05-02 at 07:39 -0400, Terry Fielder wrote: > SELECT id, get_category_text_path(id) > FROM category > WHERE id IN ( > SELECT c.id > FROM category AS c > ORDER BY c.rank > LIMIT 5 > ) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[GENERAL] Unnecessary function calls
Hi, when using LIMIT, how do I tell the planner to call a function only for the rows it returns? An example: I want to fetch the top five categories. A function get_category_text_path(cat_id int) returns the textual representation of the category. For that I do something like: SELECT id, get_category_text_path(id) FROM category ORDER BY rank LIMIT 5 Unfortunately, this takes a very long time because it calls get_category_text_path() for all of the 450'000 categories in the table. But I only need the full text path of the top five rows. It does not matter if I declare the function to be IMMUTABLE, STABLE or VOLATILE - it gets called for every row in category (which normally is what you want, I guess). How can I rewrite the query to call get_category_text_path() only for the top five rows? Thanks for any hints. Markus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match