Re: [HACKERS] bugzilla (Was: What do you want me to do?)
> Seriously, I have wondered if it might be a good idea to assemble a
> small hit team that would take some high-profile open source projects
> and make sure they worked with Postgres.  Bugzilla would be the most
> obvious candidate, but there are certainly others.  I suspect that
> could be quite productive, though.  Thoughts?

Count me out - I spend way too much of my time working on phpPgAdmin as it is :)

Chris

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] bugzilla (Was: What do you want me to do?)
On Sun, Nov 09, 2003 at 04:12:50PM +0800, Christopher Kings-Lynne wrote:
> > Seriously, I have wondered if it might be a good idea to assemble a
> > small hit team that would take some high-profile open source projects
> > and make sure they worked with Postgres.  Bugzilla would be the most
> > obvious candidate, but there are certainly others.  I suspect that
> > could be quite productive, though.
>
> Count me out - I spend way too much of my time working on phpPgAdmin
> as it is :)

Count me out too.  I already ported MagicPoint to work well with Postgres, and I'm a little overwhelmed already to do anything else.

err... so, how does MagicPoint use Postgres?  I suppose it's only because some of us use it to give Postgres talks and such ;-)

--
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"On the other flipper, one wrong move and we're Fatal Exceptions"
(T.U.X.: Term Unit X - http://www.thelinuxreview.com/TUX/)
Re: [HACKERS] Coding help
> The only idea I have is to call oidin() to do the conversion from
> string to oid.  I see this in copy.c:
>
>     loaded_oid = DatumGetObjectId(DirectFunctionCall1(oidin,
>                                       CStringGetDatum(string)));
>     if (loaded_oid == InvalidOid)
>         ereport(ERROR,
>                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
>                  errmsg("invalid OID in COPY data")));
>
> I would give that a try.

Yes, but in this case the Node is parsed like this:

    | COMMENT ON LARGE OBJECT NumericOnly IS comment_text
        {
            ...
            n->objname = makeList1(makeAConst($5));
            ...
            $$ = (Node *) n;
        }

So there is no CString to convert to a Datum...

Chris
Re: [HACKERS] Coding help
Christopher Kings-Lynne wrote:
> > The only idea I have is to call oidin() to do the conversion from
> > string to oid.  I see this in copy.c:
> >
> >     loaded_oid = DatumGetObjectId(DirectFunctionCall1(oidin,
> >                                       CStringGetDatum(string)));
> >     if (loaded_oid == InvalidOid)
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> >                  errmsg("invalid OID in COPY data")));
> >
> > I would give that a try.
>
> Yes, but in this case the Node is parsed like this:
>
>     | COMMENT ON LARGE OBJECT NumericOnly IS comment_text
>         {
>             ...
>             n->objname = makeList1(makeAConst($5));
>             ...
>             $$ = (Node *) n;
>         }
>
> So there is no CString to convert to a Datum...

But I see makeAConst as:

    static Node *
    makeAConst(Value *v)
    {
        Node *n;

        switch (v->type)
        {
            case T_Float:
                n = makeFloatConst(v->val.str);
                break;
            case T_Integer:
                n = makeIntConst(v->val.ival);
                break;
            case T_String:
            default:
                n = makeStringConst(v->val.str, NULL);
                break;
        }
        return n;
    }

which uses makeStringConst().

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] Coding help
Christopher Kings-Lynne [EMAIL PROTECTED] writes:
> > | COMMENT ON LARGE OBJECT NumericOnly IS comment_text
>
> >     n->objname = makeList1(makeAConst($5));

Forget the makeAConst step --- it's just wasted cycles.  In the execution routine, you can use code comparable to define.c's defGetInt64() to convert the Value node into a numeric OID, ie, either do intVal() or a call of oidin().

regards, tom lane
[HACKERS] New database access library, libpqxx-object
Hello,

I've been using libpqxx to access a Postgres database for the last few months, and I wrote a set of classes to wrap around libpqxx, which gradually evolved into a small set of classes and class templates.  These allow database tables and rows of tables to be represented as real C++ objects, which you can interact with using normal class methods.  Basically, it's an abstraction layer that separates the SQL SELECT, INSERT, UPDATE and DELETE commands from the user of the classes, so the classes become "database aware": the user of a class doesn't need to be aware of the database interaction going on under the hood.

Database relationships such as joins using foreign keys and table inheritance may be expressed using C++ inheritance.  The table classes take a pqxx::Connection in their constructors and an optional pqxx::Transaction (they can run in either transaction or auto-commit mode).

The current version of the library is available from:

    http://www.whinlatter.uklinux.net/libpqxx-object-0.1.0.tar.bz2

This includes an API reference and a short tutorial explaining how it is intended to be used.  It's not yet perfect, but it does its job.

I'm now looking at additional ways to wrap the functionality of libpqxx.  Are there any similar libraries in existence (either for Postgres or any other DBMS)?  (I'm looking to pinch some ideas ;-)  If it would be of interest, you are welcome to include this in your libpqxx library, or as a separate project on GBorg (I'll relicense it under the standard Postgres licence).

Regards, Roger

--
Roger Leigh
Printing on GNU/Linux?  http://gimp-print.sourceforge.net/
GPG Public Key: 0x25BFB848.  Please sign and encrypt your mail.
Re: [pgsql-www] [HACKERS] Changes to Contributor List
Hi all,

well, most of the programming itself is done.  What's currently missing is:

1) Some fine tuning on the layout
2) Adding the static pages
3) Translating the pages, news, events etc. into German, and maybe some
   other languages if we find someone to translate them
4) Creating something that handles old links, as the new ones are not
   compatible
5) Reformatting the PHP code, as some parts currently look horrible ;)
6) All the things I have forgotten...

It's currently not easy to work together on the code, as it's on my Subversion server and nobody else uses Subversion ;)  If someone wants to help me on this it would be good if we could get some CVS repository on gborg for it...

Well, before doing that we maybe should decide whether we combine www and advocacy (and maybe even developer?) into one big site...  I like that idea; I think I will create a little sample page later to see how it could look :)

Best regards,
Andreas Grabmüller

----- Original Message -----
From: Dave Page [EMAIL PROTECTED]
To: Peter Eisentraut [EMAIL PROTECTED], Robert Treat [EMAIL PROTECTED]
CC: Alvaro Herrera [EMAIL PROTECTED], Josh Berkus [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Date: Friday, November 07, 2003 09:34 AM
Subject: [pgsql-www] [HACKERS] Changes to Contributor List

[...]

> How can we help?  Andreas?
>
> Regards, Dave.

--
LetzPlay.de
| Freemail: http://www.letzplay.de/mail
| Forenhosting: http://www.letzplay.de/foren
[HACKERS] possible replace() bug - postgres 7.3.1
I am running PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC 2.96.

The following query fails:

    SELECT 'UPDATE pa_sales_lead SET details = COALESCE(details, \'\') || \'' ||
           replace(data, '\'', '\'\'') ||
           '\' WHERE foreign_sales_lead_id = \'' || id || '\';'
    FROM temp_sales_lead ORDER BY id, row LIMIT 10;

with the error:

    ERROR:  repalloc: invalid request size 1073741824

but the following query (without replace) works just fine:

    SELECT 'UPDATE pa_sales_lead SET details = COALESCE(details, \'\') || \'' ||
           data ||
           '\' WHERE foreign_sales_lead_id = \'' || id || '\';'
    FROM temp_sales_lead ORDER BY id, row LIMIT 10;

producing output like:

    UPDATE pa_sales_lead SET details = COALESCE(details, '') || 'EMPLOY: 50'
    WHERE foreign_sales_lead_id = '1004';

any ideas?

-r

On Thu, 2003-11-06 at 12:52, Carlos wrote:
> Hello Forum, Could someone please send me the link and the names of the
> files to install the version 7.4 in Windows 2000 server and XP Pro?  I
> cannot discern the files that one is supposed to download.  Thanks in
> advance for your response.
>
> Carlos Oliva
> Senior Programmer/Analyst
> Positive Business Solutions, Inc.
> Cincinnati, OH 45240-1640
> (513) 772 - 2255 ext 129
> [EMAIL PROTECTED]
> Yahoo ID: ramboid_1997

--
Ryan Mahoney [EMAIL PROTECTED]
Re: [HACKERS] [pgsql-advocacy] Changes to Contributor List
I think it is part of the incentive for corporations to contribute - not just an impressive list for PHBs.  It's nice to get recognition for their time/money contributions, and a good way for the PGDG to show its appreciation.

-r

On Thu, 2003-11-06 at 14:34, Josh Berkus wrote:
> Peter,
>
> > I was discussing specifically the "Recognized Corporate Contributors"
> > which is, AFAIK, strictly a PHB thing, no?
>
> No.  Please explain.
Re: [HACKERS] Erroneous PPC spinlock code
On Thu, Nov 06, 2003 at 12:08:56AM +0100, Reinhard Max wrote:
> On Wed, 5 Nov 2003 at 13:28, Tom Lane wrote:
> > Peter Eisentraut [EMAIL PROTECTED] writes:
> > > The SuSE PPC guru said that the PPC spinlock code we currently use
> > > may behave erroneously on multiprocessor systems.
> >
> > What's his evidence for that claim?
>
> Let's ask himself.
>
> > The code we have is based directly on the recommendations in the PPC
> > manuals, and has been tested on multi-CPU systems.
>
> Marcus, can you explain the details, please?

I reviewed the documentation again (at http://www-1.ibm.com/servers/esdd/articles/powerpc.html) and it seems to agree with your opinion.  I retract my comment; leave your code as-is.

Ciao, Marcus
[HACKERS] Setting client encoding with jdbc
I'm interested in changing the client encoding from within a Java program.  I'm using connection pooling and need to change it depending on the client's language preference.  Is there a function I can call which changes the client encoding?  Something like:

    pStmt = con.prepareStatement("select pg_set_client_encoding('LATIN5');");
    rs = pStmt.executeQuery();

I can set the client encoding from within psql with

    \encoding LATIN5

but I'm not sure how to execute something like this from Java.  Any help would be greatly appreciated.

Thanks, Brad
Re: [HACKERS] Very poor estimates from planner
On Thu, 2003-11-06 at 10:35, Tom Lane wrote:
> Rod Taylor [EMAIL PROTECTED] writes:
> > -> Hash Join (cost=1230.79..60581.82 rows=158 width=54)
> >      (actual time=1262.35..151200.29 rows=1121988 loops=1)
> >    Hash Cond: ("outer".account_id = "inner".account_id)
> >    -> Hash Join (cost=1226.78..52863.43 rows=1542558 width=50)
> >         (actual time=1261.63..100418.30 rows=1573190 loops=1)
> >       (join of bsod, tsb, tss)
>
> (btw, would you mind turning off MIME encoding in your mails to the PG
> lists?  It's a real PITA to quote.)

I can, though I would ask which email client you use that doesn't pull content out of MIME-encoded emails.

> > So yes, since this is a full table scan all values will be joined
> > since the foreign key enforces them all to exist.
>
> Well, no, because only 1121988 rows come out of the join when 1573190
> went in.  So the actual selectivity of the join is about 70%.  The
> question is why the planner is estimating the selectivity at 0.01%
> (158/1542558).  Could we see the pg_stats rows for service.account_id
> and account.account_id?

The pg_stats rows (reformatted for readability):

    service.account_id:
        stanullfrac = 0, stawidth = 4, stadistinct = 10
        stakind1 = 1, stakind2 = 3, stakind3 = 0, stakind4 = 0
        staop1 = 96, staop2 = 97, staop3 = 0, staop4 = 0
        stanumbers1 = {0.388393,0.0825893,0.078125,0.0758929,0.0703125,
                       0.0647321,0.0647321,0.0636161,0.0625,0.0491071}
        stanumbers2 = {0.591672}
        stavalues1  = {1,8221,8223,8226,8222,8218,8220,8219,8224,8225}

    account.account_id:
        stanullfrac = 0, stawidth = 4, stadistinct = -1
        stakind1 = 2, stakind2 = 3, stakind3 = 0, stakind4 = 0
        staop1 = 97, staop2 = 97, staop3 = 0, staop4 = 0
        stanumbers2 = {0.97034}
        stavalues1  = {1,10178,12136,14099,16054,18011,19966,21924,23881,26018,27995}

(2 rows)
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Manfred Spraul [EMAIL PROTECTED] writes:
> Greg Stark wrote:
> > > I'm assuming fsync syncs writes issued by other processes on the
> > > same file, which isn't necessarily true though.
> > >
> > > It was already pointed out that we can't rely on that assumption.
> >
> > So the NetBSD and Sun developers I checked with both asserted fsync
> > does in fact guarantee this.  And SUSv2 seems to back them up:
>
> At least Linux had one problem: fsync() syncs the inode to disk, but
> not the directory entry: if you rename a file, open it, write to it,
> fsync, and the computer crashes, then it's not guaranteed that the
> file rename is on the disk.  I think only the old ext2 is affected,
> not the journaling filesystems.

That's true.  But why would postgres ever have to worry about files being renamed being synced?  Tables and indexes don't get their files renamed typically.  WAL logs?

--
greg
Re: [pgsql-www] [HACKERS] Changes to Contributor List
"emeritus" is a perfectly good Latin word.  No need to dumb things down.

--elein

On Wed, Nov 05, 2003 at 05:26:29PM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Josh Berkus [EMAIL PROTECTED] writes:
> > > B) What contributors are listed under "Major Developers" who haven't
> > > contributed any code since 7.1.0?
> >
> > I think we had agreed that formerly-listed contributors would not be
> > deleted, but would be moved to a new section titled "Contributors
> > Emeritus" or some such.  Please make sure that Tom Lockhart and Vadim
> > get listed that way, at least.
>
> I think the "Emeritus" word might be too hard for non-native English
> speakers, and even for less educated English speakers.
Re: [HACKERS] Performance features the 4th
Yes, I would like to see the vacuum delay patch go into 7.4.1 if possible.  It's really useful.  I don't think there is any major risk in adding the delay patch to a minor revision, given the small amount of code change.

Stephen

"Matthew T. O'Connor" [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> Tom Lane wrote:
> > Jan Wieck [EMAIL PROTECTED] writes:
> > > As a matter of fact, people who have performance problems are likely
> > > to be the same who have upgrade problems.  And as Gaetano pointed
> > > out correctly, we will see wildforms with one or the other feature
> > > applied.
> >
> > I'd believe that for patches of the size of my original VACUUM-delay
> > hack (or even a production-grade version of same, which'd probably be
> > 10x larger).  The kind of wholesale rewrite you are currently
> > proposing is much too large to consider folding back into 7.4.*, IMHO.
>
> Do people think that the VACUUM-delay patch by itself would be useful
> enough on its own to consider working it into 7.4.1 or something?  From
> the little feedback I have read on the VACUUM-delay patch used in
> isolation, it certainly does help.  I would love to see it put into 7.4
> somehow.  The far more rigorous changes that Jan is working on will be
> welcome improvements for 7.5.
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes (Lauri's text quoted with ">"):

> Well, if it is normalised, how easy is it for you to change the
> customer_id of an order?  Anyway,

Incredibly easy.  Just update the customer_id field of the invoice record.  A single change to a single row.

> And I presume the system will automatically move all related stuff
> (order details etc.) into the same block as the new customer?  How long
> will that take?  What if there is no room for it there?

Well, I'd view an order as an entity.  As such, I would give it its own FILE, and your question doesn't make sense.

> But then your formula for disk head movements does not make sense either!

Why not?  The order is a real-world thing, and as such I would have an ORDERS file, in which each order is a single entry, with customer_id as one of its attributes.  "order detail" is an attribute of "order", so if I change customer_id it's the relational equivalent of just changing one cell in one row.  The chances of me having to move the record are pretty near nil, and if I do it won't change bucket, so at most it involves two frames (or disk blocks, if that's what you want to call them).  But if the system did move the stuff, it would be four disk accesses - read/write to delete the old entry, read/write to save the new.  As for enough room - well - it'll fall over if we have a disk full (or it might not).

> "Not enough room" here means not enough room in the block of the
> customer (from which you were supposed to get all data in one read, or
> disk head movement).  That would mean that your order information would
> be moved perhaps to another block and result in an extra head movement,
> or am I right?

Which I've taken into account - if there isn't enough room in the original bucket, I need to either overflow into the next bucket, which might exist, or to create it if it doesn't.  Ie two head movements to delete from the first bucket, and two head movements to add to the second.  And it will only fall over if I need to create a new bucket and there's no space left on the disk (or if (and this is very unlikely in this scenario) it triggers a split which again needs space and there's none left on disk).  Or have you not sussed that we view "order detail" as an attribute of "order" (which is therefore stored as part of the same thing), but "customer" is separate from "order", is stored separately, and is linked by a relationship.  (Whereas "order detail" is NOT related to "order", because they are part of the same thing :-)

> Well, it does result in data being stored multiple times ;-)  What on
> earth is wrong with that?  Do you know how much 160GB of disk costs
> today?

I could ask: does your system work in, say, 4KB?  That's how much memory the first computer I used (a Wang 2000) had.

> Probably it would not work at all.  In the 50's they did amazing things
> with hardly any compilers and very little memory.  I am referring to
> Whirlwind.  See http://www.cedmagic.com/history/whirlwind-computer.html.
> Could you have done that with MV?  My point?  Why are we discussing
> restrictions to memory and CPU speed of the 70's and 80's?  If an SQL
> DBMS uses more memory and disk, and it is available, why complain about
> *that*?  I'm not implying that you cannot complain about other matters,
> e.g. ease of development etc., and you might even be right.  Be that as
> it may, I am not trying to make you abandon your MV database.

As always, you're relying on hardware to help :-)  You know what I think of that :-)  And 160GB of disk is only cheap if you're using IDE on a desktop PC - it costs a hell of a lot more for a laptop or SCSI for a server.  And if it's embedded it may be that the *room* is expensive, not the capacity ...

> And: what if I was just reading customer-data.  Would the same formula
> apply (= (2+N)*ST*1.05)?

Nope.  If I understand you correctly, you want attributes that belong to the entity "customer", not the entity "invoice".  T = ST * 1.05.  (By the way, billing and/or invoice address (for example) are invoice attributes, not company attributes.)

> No, I want you to give me a list of all your customers.  How many disk
> reads?

T = N * 1.05 where N is the number of customers.  What do you want to know about those customers?  Address?  Phone number*s*?  Anything else?  That's *all* at no extra cost.

> Well, no thanks.  I just wanted their names this time.  The relational
> alternative, with an index on customer_name, would be again an order of
> magnitude fewer disk reads.

Well, if you let me use an index here, I'm sorry, GAME OVER!  The best you can do would be a photo finish.  Assuming an overhead of, say, 4 bytes per index entry, the entire index would be

    Size = 4 * N + sigma(name_length) + sigma(key_length)

Okay, I've probably got some padding there as well, but so will you.  And note I
Re: [HACKERS] Erroneous PPC spinlock code
On Wed, 5 Nov 2003 at 13:28, Tom Lane wrote:
> Peter Eisentraut [EMAIL PROTECTED] writes:
> > The SuSE PPC guru said that the PPC spinlock code we currently use may
> > behave erroneously on multiprocessor systems.
>
> What's his evidence for that claim?

Let's ask himself.

> The code we have is based directly on the recommendations in the PPC
> manuals, and has been tested on multi-CPU systems.

Marcus, can you explain the details, please?

cu
Reinhard
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Greg Stark [EMAIL PROTECTED] writes:
> Tom Lane [EMAIL PROTECTED] writes:
> > You want to find, open, and fsync() every file in the database
> > cluster for every checkpoint?  Sounds like a non-starter to me.
>
> Except a) this is outside any critical path, and b) only done every few
> minutes and c) the fsync calls on files with no dirty buffers ought to
> be cheap, at least as far as i/o.

The directory search and opening of the files is in itself nontrivial overhead ... particularly on systems where open(2) isn't speedy, such as Solaris.  I also disbelieve your assumption that fsync'ing a file that doesn't need it will be free.  That depends entirely on what sort of indexes the OS keeps on its buffer cache.  There are Unixen where fsync requires a scan through the entire buffer cache because there is no data structure that permits finding associated buffers any more efficiently than that.  (IIRC, the HPUX system I'm typing this on is like that.)  On those sorts of systems, we'd be way better off to use O_SYNC or O_DSYNC on all our writes than to invoke multiple fsyncs.  Check the archives --- this was all gone into in great detail when we were testing alternative methods for fsyncing the WAL files.

> So the NetBSD and Sun developers I checked with both asserted fsync
> does in fact guarantee this.  And SUSv2 seems to back them up:
>
>     The fsync() function can be used by an application to indicate that
>     all data for the open file description named by fildes is to be
>     transferred to the storage device associated with the file
>     described by fildes in an implementation-dependent manner.

The question here is what is meant by "data for the open file description".  If it said "all data for the file referenced by the open FD" then I would agree that the spec says what you claim.  As is, I think it would be entirely within the spec for the OS to dump only buffers that had been dirtied through that particular FD.  Notice that the last part of the sentence is careful to respect the distinction between the FD and the file; why isn't the first part?

regards, tom lane
[HACKERS] Darwin Startup Script Patch
Clearly I can't spell.

Regards, David

--- postgresql-7.4RC1/contrib/start-scripts/PostgreSQL.darwin  Mon Dec  9 13:26:07 2002
+++ /Users/david/Desktop/PostgreSQL.darwin  Wed Nov  5 17:53:23 2003
@@ -18,9 +18,9 @@
 #
 # POSTGRESQLSERVER=-YES-
 #
-# The startup bundle will now be read to go. To prevent this script from
+# The startup bundle will now be ready to go. To prevent this script from
 # starting PostgreSQL at system startup, simply change that line in
-# /etc/hostconfig to:
+# /etc/hostconfig back to:
 #
 # POSTGRESQLSERVER=-NO-
 #

--
David Wheeler            AIM: dwTheory
[EMAIL PROTECTED]        ICQ: 15726394
http://www.kineticode.com/  Yahoo!: dew7e
                            Jabber: [EMAIL PROTECTED]
Kineticode. Setting knowledge in motion.[sm]
[HACKERS] BTree index
I am trying to find information regarding the creation of B-tree indexes in Postgres for variable-length character data (char/varchar types).  Specifically, what pagination policy is used?  Does it use a prefix B-tree, or any other form of compression (encoding)?

Regards, VJ Anand
Re: [HACKERS] UPPER()/LOWER() and UTF-8
TL == Tom Lane [EMAIL PROTECTED] writes:

TL> upper/lower aren't going to work desirably in any multi-byte
TL> character set encoding.

Can you please point me at their implementation?  I do not understand why that's impossible.

TL> Because they use ctype.h's toupper() and tolower() functions, which
TL> only work on single-byte characters.

Aha, that's in src/backend/utils/adt/formatting.c, right?  Yes, I see, it goes byte by byte and uses toupper().  I believe we could look at the locale, and if it is UTF-8, then use (or copy) e.g. g_utf8_strup/strdown, right?

http://developer.gnome.org/doc/API/2.0/glib/glib-Unicode-Manipulation.html#g-utf8-strup

I believe that patch could be written in a matter of hours.

TL> There has been some discussion of using wctype.h where available,
TL> but this has a number of issues, notably figuring out the correct
TL> mapping from the server string encoding (eg UTF-8) to unpacked wide
TL> characters.  At minimum we'd need to know which charset the locale
TL> setting is expecting, and there doesn't seem to be a portable way to
TL> find that out.
TL>
TL> IIRC, Peter thinks we must abandon use of libc's locale
TL> functionality altogether and write our own locale layer before we
TL> can really have all the locale-specific functionality we want.

I believe that native Unicode strings (together with human language handling) should be introduced as an (almost) separate data type (which has nothing to do with locale), but that's bluesky maybe.

--alexm
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Greg Stark wrote:
> > I'm assuming fsync syncs writes issued by other processes on the same
> > file, which isn't necessarily true though.
> >
> > It was already pointed out that we can't rely on that assumption.
>
> So the NetBSD and Sun developers I checked with both asserted fsync
> does in fact guarantee this.  And SUSv2 seems to back them up:

At least Linux had one problem: fsync() syncs the inode to disk, but not the directory entry: if you rename a file, open it, write to it, fsync, and the computer crashes, then it's not guaranteed that the file rename is on the disk.  I think only the old ext2 is affected, not the journaling filesystems.

--
Manfred
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
The delay patch worked so well, I couldn't resist asking whether a similar patch could be added for the COPY command (pg_dump).  It's just an extension of the same idea.  On a large DB, backups can take very long while consuming a lot of I/O, slowing down other select and write operations.  We operate on a backup window during the low-traffic period at night.  It would be nice to be able to run pg_dump *anytime* and no longer need to worry about the backup window.  Backups will take longer to run, but as with VACUUM, it's a win for many people to be able to let it run in the background through the whole day.  The delay should be optional and default to zero, so those who wish to back up immediately can still do it.  The way I see it, routine backups and vacuums should be ubiquitous once properly configured.

Regards, Stephen

Tom Lane [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> Jan Wieck [EMAIL PROTECTED] writes:
> > I am currently looking at implementing ARC as a replacement strategy.
> > I don't have anything that works yet, so I can't really tell what the
> > result would be and it might turn out that we want both features.
>
> It's likely that we would.  As someone (you?) already pointed out,
> VACUUM has bad side-effects both in terms of cache flushing and in
> terms of sheer I/O load.  Those effects require different fixes AFAICS.
>
> One thing that bothers me here is that I don't see how adjusting our
> own buffer replacement strategy is going to do much of anything when we
> cannot control the kernel's buffer replacement strategy.  To get any
> real traction we'd have to go back to the "take over most of RAM for
> shared buffers" approach, which we already know to have a bunch of
> severe disadvantages.
>
> regards, tom lane
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Tom Lane wrote:
> Jan Wieck [EMAIL PROTECTED] writes:
> > How I can see the background writer operating is that he's keeping
> > the buffers in the order of the LRU chain(s) clean, because those are
> > the buffers that most likely get replaced soon.  In my experimental
> > ARC code it would traverse the T1 and T2 queues from LRU to MRU,
> > write out n1 and n2 dirty buffers (n1+n2 configurable), then fsync
> > all files that have been involved in that, nap depending on where he
> > got down the queues (to increase the write rate when running low on
> > clean buffers), and do it all over again.
>
> You probably need one more knob here: how often to issue the fsyncs.
> I'm not convinced once per outer loop is a sufficient answer.
> Otherwise this is sounding pretty good.

This is definitely heading in the right direction.

I currently have a crude and ugly hacked system that does checkpoints every minute but stretches them out over the whole time.  It writes out the dirty buffers in T1+T2 LRU order intermixed, stretches the flush over the whole checkpoint interval, and does sync()+usleep() every 32 blocks (if it has time to do this).  This is clearly the wrong way to implement it, but ...

The same system has ARC and delayed vacuum.  With normal, unmodified checkpoints every 300 seconds, the transaction response time for new_order still peaks at over 30 seconds (5 is already too much), so the system basically comes to a freeze during a checkpoint.  Now with this high-frequency sync()ing and checkpointing by the minute, the entire system load levels out really nicely.  Basically it's constantly checkpointing.  So maybe the thing we're looking for is to make the checkpoint process the background buffer writer process and let it checkpoint 'round the clock.  Of course, with a bit more selectivity on what to fsync, and not doing a system-wide sync() every 10-500 milliseconds :-)

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#== [EMAIL PROTECTED] #
Re: [HACKERS] UPPER()/LOWER() and UTF-8
TL == Tom Lane [EMAIL PROTECTED] writes: TL Alexey Mahotkin [EMAIL PROTECTED] writes: I'm running Postgresql 7.3.4 with ru_RU.UTF-8 locale (with UNICODE database encoding), and all is almost well, except that UPPER() and LOWER() seem to ignore locale. TL upper/lower aren't going to work desirably in any multi-byte TL character set encoding. Can you please point me at their implementation? I do not understand why that's impossible. TL I think Peter E. is looking into what TL it would take to fix this for 7.5, but at present you are TL going to need to use a single-byte encoding within the server. TL (Nothing to stop you from using UTF-8 on the client side TL though.) Thanks, --alexm ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[HACKERS] NetBSD/acorn32
I may have missed 7.4: PostgreSQL 7.5devel on arm-unknown-netbsdelf1.6ZE, compiled by GCC gcc (GCC) 3.3.2-nb1 but all tests work there. I didn't manage to compile bison 1.875 under NetBSD-1.6P/acorn32, as gcc 2.95.3 really didn't like it. Upgrading to NetBSD-1.6ZE/acorn32 brought gcc 3.3.2, which let me build bison. (bison 1.75 was fine for both.) This doesn't say anything about postgresql though, and I'm guessing that postgresql itself would be fine with the older NetBSD/acorn32... Cheers, Patrick ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] NetBSD/acorn32
Patrick Welche writes: PostgreSQL 7.5devel on arm-unknown-netbsdelf1.6ZE, compiled by GCC gcc (GCC) 3.3.2-nb1 Can you test 7.4? -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] possible replace() bug - postgres 7.3.1
Ryan Mahoney kirjutas N, 06.11.2003 kell 23:03: I am running PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC 2.96 the following query fails: SELECT 'UPDATE pa_sales_lead SET details = COALESCE(details, \'\') || \'' || replace(data, '\'', '\'\'') || '\' WHERE foreign_sales_lead_id = \'' || id || '\';' FROM temp_sales_lead ORDER BY id, row LIMIT 10; with the error: ERROR: repalloc: invalid request size 1073741824 but, the following query (without replace): SELECT 'UPDATE pa_sales_lead SET details = COALESCE(details, \'\') || \'' || data || '\' WHERE foreign_sales_lead_id = \'' || id || '\';' FROM temp_sales_lead ORDER BY id, row LIMIT 10; works just fine, producing output like: UPDATE pa_sales_lead SET details = COALESCE(details, '') || 'EMPLOY: 50' WHERE foreign_sales_lead_id = '1004'; any ideas? what does just SELECT replace(data, '\'', '\'\'') FROM temp_sales_lead ORDER BY id, row LIMIT 10; produce ? Hannu ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] UPPER()/LOWER() and UTF-8
Alexey Mahotkin kirjutas K, 05.11.2003 kell 17:11: Aha, that's in src/backend/utils/adt/formatting.c, right? Yes, I see, it goes byte by byte and uses toupper(). I believe we could look at the locale, and if it is UTF-8, then use (or copy) e.g. g_utf8_strup/strdown, right? http://developer.gnome.org/doc/API/2.0/glib/glib-Unicode-Manipulation.html#g-utf8-strup I believe that patch could be written in a matter of hours. TL There has been some discussion of using wctype.h where TL available, but this has a number of issues, notably figuring TL out the correct mapping from the server string encoding (eg TL UTF-8) to unpacked wide characters. At minimum we'd need to TL know which charset the locale setting is expecting, and there TL doesn't seem to be a portable way to find that out. TL IIRC, Peter thinks we must abandon use of libc's locale TL functionality altogether and write our own locale layer before TL we can really have all the locale-specific functionality we TL want. I believe that native Unicode strings (together with human language handling) should be introduced as an (almost) separate data type (which has nothing to do with locale), but that's blue-sky maybe. They should have nothing to do with the _system_ locale, but you can neither UPPER()/LOWER() nor ORDER BY unless you know the locale. It is just that the locale should either be a property of the column or given in the SQL statement. I guess one could write UCHAR, UVARCHAR, UTEXT types based on ICU. - Hannu ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] BTree index
On Wed, Nov 05, 2003 at 09:08:31AM -0600, [EMAIL PROTECTED] wrote: I am trying to find information regarding creation of B-tree index in postgres for variable length character data (Char/varchar type). Specifically, what pagination policy is used, does it use prefix BTree, or any other form of compression (encoding)? I was very surprised while writing this answer: The whole key is stored (no prefix, no pagination). If the key is too big, it won't fit into the index and the insertion will be rejected: regression=# create table test (a text); CREATE TABLE regression=# create index test_idx on test(a); CREATE INDEX regression=# insert into test values ('hello world'); INSERT 17115 1 regression=# insert into test select repeat(a,10) from test; INSERT 17116 1 regression=# insert into test select repeat(a,10) from test; INSERT 0 2 regression=# insert into test select repeat(a,10) from test; INSERT 0 4 regression=# insert into test select repeat(a,10) from test; INSERT 0 8 regression=# insert into test select repeat(a,10) from test; ERROR: fila de índice requiere 12624 bytes, tamaño máximo es 8191 -- oops regression=# set lc_messages to 'C'; SET regression=# insert into test select repeat(a,10) from test; ERROR: index row requires 12624 bytes, maximum size is 8191 So, what size were the tuples inserted: regression=# select max(length(a)) from test; max 11 (1 fila) What!? 11 bytes? I have always had the idea that the tuples were uncompressed, so how can 11 bytes be stored in 8191 bytes? After tracking into the sourcecode, I found that in src/backend/access/common/indextuple.c the index_formtuple routine seems to compress the key before insertion. In src/include/tuptoaster.h there is a symbol for activation of this feature that is set at least on my sources (TOAST_INDEX_HACK). So, there you are: the compression used is the same lousy fast LZ algorithm used elsewhere in the TOAST code (toast_compress_datum()). 
-- Alvaro Herrera (alvherre[a]dcc.uchile.cl) FOO MANE PADME HUM ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] OSR5: Passes without the setsockopt() calls...
Can I get a comment from someone on this please... LER --On Saturday, November 08, 2003 18:03:45 -0600 Larry Rosenman [EMAIL PROTECTED] wrote: From my SCO contacts re: the failure on OSR5: I haven't had time and won't have time to fully diagnose this until after I get back from vacation. In the meantime, I am attaching a patch which gets OpenServer 5.0.7 to pass 92 of 93 tests with one test ignored. Basically, this patch just says if you're on OpenServer then don't do the setsockopt calls. Not very satisfactory as setsockopt should work on OSR5. I know this isn't an acceptable patch, and I'm not sure when my contact is back from vacation. Do we want to hold the release? LER -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED] US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED] US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 pgp0.pgp Description: PGP signature
Re: [HACKERS] possible replace() bug - postgres 7.3.1
Hannu Krosing wrote: Ryan Mahoney kirjutas N, 06.11.2003 kell 23:03: I am running PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC 2.96 what does just SELECT replace(data, '\'', '\'\'') FROM temp_sales_lead ORDER BY id, row LIMIT 10; produce ? There was a bug in replace(), fixed by Tom in 7.3.3 I think (for sure by 7.3.4). If you have any embedded % characters in data, it would trigger the bug. Try upgrading to 7.3.4. HTH, Joe ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] OSR5: Passes without the setsockopt() calls...
On Sun, 9 Nov 2003, Larry Rosenman wrote: Can I get a comment from someone on this please... No, the release will not be held up. LER --On Saturday, November 08, 2003 18:03:45 -0600 Larry Rosenman [EMAIL PROTECTED] wrote: From my SCO contacts re: the failure on OSR5: I haven't had time and won't have time to fully diagnose this until after I get back from vacation. In the meantime, I am attaching a patch which gets OpenServer 5.0.7 to pass 92 of 93 tests with one test ignored. Basically, this patch just says if you're on OpenServer then don't do the setsockopt calls. Not very satisfactory as setsockopt should work on OSR5. I know this isn't an acceptable patch, and I'm not sure when my contact is back from vacation. Do we want to hold the release? LER -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED] US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED] US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Performance features the 4th
Tom Lane wrote: Jan Wieck [EMAIL PROTECTED] writes: However, I have not seen much evidence yet that the vacuum delay alone does that much. Gaetano and a couple of other people did experiments that seemed to show it was useful. I think we'd want to change the shape of the knob per later suggestions (sleep 10 ms every N blocks, instead of N ms every block) but it did seem that there was useful bang for little buck there. Right, I'd like to try the patch: sleep N ms every M blocks. Can you please post this patch? BTW, I'll see if I'm able to apply it also to a 7.3.X (our production DB). Regards Gaetano Mendola ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Performance features the 4th
scott.marlowe wrote: On Fri, 7 Nov 2003, Matthew T. O'Connor wrote: - Original Message - From: Jan Wieck [EMAIL PROTECTED] Tom Lane wrote: Gaetano and a couple of other people did experiments that seemed to show it was useful. I think we'd want to change the shape of the knob per later suggestions (sleep 10 ms every N blocks, instead of N ms every block) but it did seem that there was useful bang for little buck there. I thought it was sleep N ms every M blocks. Have we seen any numbers? Anything at all? Something that gives us a clue by what factor one has to multiply the total time a VACUUM ANALYZE takes, to get what effect in return? I have some time on sunday to do some testing. Is there a patch that I can apply that implements either of the two options? (sleep 10ms every M blocks or sleep N ms every M blocks). I know Tom posted the original patch that slept N ms every 1 block (where N is 10 due to OS limitations). Jan can you post a patch that has just the sleep code in it? Or should it be easy enough for me to cull out of the larger patch you posted? The reason for the change is that the minimum sleep period on many systems is 10 ms, which meant that vacuum was running 20X slower than normal. While it might be necessary in certain very I/O starved situations to make it this slow, it would probably be better to be able to get a vacuum that ran at about 1/2 to 1/5 speed for most folks. So, since the delta can't be less than 10 ms on most systems, it's better to just leave it at a fixed amount and change the number of pages vacuumed per sleep. I disagree with that. If you limit yourself to the number of pages being the only knob you have and set the napping time fixed, you can only lower the number of sequentially read pages to slow it down. Making read ahead absurd in an IO starved situation ... I'll post a patch later that naps for M milliseconds every N pages, using two GUC variables and a select(2) call. 
Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] # ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Performance features the 4th
Matthew T. O'Connor wrote: - Original Message - From: Jan Wieck [EMAIL PROTECTED] Tom Lane wrote: Gaetano and a couple of other people did experiments that seemed to show it was useful. I think we'd want to change the shape of the knob per later suggestions (sleep 10 ms every N blocks, instead of N ms every block) but it did seem that there was useful bang for little buck there. I thought it was sleep N ms every M blocks. Have we seen any numbers? Anything at all? Something that gives us a clue by what factor one has to multiply the total time a VACUUM ANALYZE takes, to get what effect in return? I have some time on sunday to do some testing. Is there a patch that I can apply that implements either of the two options? (sleep 10ms every M blocks or sleep N ms every M blocks). I know Tom posted the original patch that slept N ms every 1 block (where N is 10 due to OS limitations). Jan can you post a patch that has just the sleep code in it? Or should it be easy enough for me to cull out of the larger patch you posted? Sorry for the delay, had to finish some other concept yesterday (will be published soon). The attached patch adds vacuum_group_delay_size = 10 (range 1-1000) and vacuum_group_delay_msec = 0 (range 0-1000) and does the sleeping via select(2). It does it only at the same places where Tom had done the usleep() in his hack, so I guess there is still some more to do besides the documentation, before it can be added to 7.4.1. But it should be enough to get some testing done. Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. 
# #== [EMAIL PROTECTED] #

Index: src/backend/access/nbtree/nbtree.c
===
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/access/nbtree/nbtree.c,v
retrieving revision 1.106
diff -c -b -r1.106 nbtree.c
*** src/backend/access/nbtree/nbtree.c  2003/09/29 23:40:26  1.106
--- src/backend/access/nbtree/nbtree.c  2003/11/09 23:39:36
***************
*** 27,32 ****
--- 27,40 ----
  #include "storage/smgr.h"

+ /*
+  * Variables for vacuum_group_delay option (in commands/vacuumlazy.c)
+  */
+ extern int  vacuum_group_delay_size;    /* vacuum N pages */
+ extern int  vacuum_group_delay_msec;    /* then sleep M msec */
+ extern int  vacuum_group_delay_count;
+
  /* Working state for btbuild and its callback */
  typedef struct
  {
*** 610,615 ****
--- 618,632 ----
      CHECK_FOR_INTERRUPTS();

+     if (vacuum_group_delay_msec > 0)
+     {
+         if (++vacuum_group_delay_count >= vacuum_group_delay_size)
+         {
+             PG_DELAY(vacuum_group_delay_msec);
+             vacuum_group_delay_count = 0;
+         }
+     }
+
      ndeletable = 0;
      page = BufferGetPage(buf);
      opaque = (BTPageOpaque) PageGetSpecialPointer(page);
*** 736,741 ****
--- 753,769 ----
      Buffer       buf;
      Page         page;
      BTPageOpaque opaque;
+
+     CHECK_FOR_INTERRUPTS();
+
+     if (vacuum_group_delay_msec > 0)
+     {
+         if (++vacuum_group_delay_count >= vacuum_group_delay_size)
+         {
+             PG_DELAY(vacuum_group_delay_msec);
+             vacuum_group_delay_count = 0;
+         }
+     }
      buf = _bt_getbuf(rel, blkno, BT_READ);
      page = BufferGetPage(buf);

Index: src/backend/commands/vacuumlazy.c
===
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.32
diff -c -b -r1.32 vacuumlazy.c
*** src/backend/commands/vacuumlazy.c  2003/09/25 06:57:59  1.32
--- src/backend/commands/vacuumlazy.c  2003/11/09 23:40:13
*** 88,93 ****
--- 88,100 ----
  static TransactionId OldestXmin;
  static TransactionId FreezeLimit;

+ /*
+  * Variables for vacuum_group_delay option (in commands/vacuumlazy.c)
+  */
+ int  vacuum_group_delay_size = 10;    /* vacuum N pages */
+ int  vacuum_group_delay_msec = 0;     /* then sleep M msec */
+ int  vacuum_group_delay_count = 0;

  /* non-export function prototypes */

  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
*** 228,233 ****
--- 235,249 ----
      CHECK_FOR_INTERRUPTS();

+     if (vacuum_group_delay_msec > 0)
+     {
+
Re: [HACKERS] Coding help
| COMMENT ON LARGE OBJECT NumericOnly IS comment_text n->objname = makeList1(makeAConst($5)); Forget the makeAConst step --- it's just wasted cycles. In the execution routine, you can use code comparable to define.c's defGetInt64() to convert the Value node into a numeric OID, ie, either do intVal() or a call of oidin(). I thought the whole problem here is that OIDs are unsigned ints, hence intVal() won't allow the highest OIDs? If that's not the case, then the easy solution is to just use an IConst or whatever it's called. (Which I had working before). Chris ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Coding help
I thought the whole problem here is that OIDs are unsigned ints, hence intVal() won't allow the highest OIDs? Exactly. That's why you need to handle T_Float too. See the int8 example, which has just the same problem. It occurs to me that I could just use FloatOnly in the grammar and then attempt to coerce that to an Oid using oidin? Floats are stored as strings, so that'd be OK, and the oidin function would barf if it was an illegal uint? Chris ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Coding help
Christopher Kings-Lynne [EMAIL PROTECTED] writes: It occurs to me then that I could just then use FloatOnly in the grammar I don't think so, unless you are proposing something very odd in the lexer to force it to treat an integer literal as a float in this one context. What is the problem with following the lead of the existing code? regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Setting client encoding with jdbc
On 7 Nov 2003, Brad wrote: I'm interested in changing the client encoding from within a java program. I'm using connection pooling and need to change it depending on the client's language preference. Is there a function I can call which changes the client encoding? Something like pStmt = con.prepareStatement("select pg_set_client_encoding('LATIN5');"); rs = pStmt.executeQuery(); I can set the client encoding from within psql by \encoding LATIN5 but not sure how to execute something like this from java. Any help would be greatly appreciated. Please direct JDBC questions to the pgsql-jdbc list. You may not set the client encoding from JDBC. The JDBC driver requires a unicode client encoding to work properly. You may then reencode the data in your charset of preference using the standard java APIs. Kris Jurka ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: You want to find, open, and fsync() every file in the database cluster for every checkpoint? Sounds like a non-starter to me. Except a) this is outside any critical path, b) only done every few minutes, and c) the fsync calls on files with no dirty buffers ought to be cheap, at least as far as i/o. The directory search and opening of the files is in itself nontrivial overhead ... particularly on systems where open(2) isn't speedy, such as Solaris. I also disbelieve your assumption that fsync'ing a file that doesn't need it will be free. That depends entirely on what sort of indexes the OS keeps on its buffer cache. There are Unixen where fsync requires a scan through the entire buffer cache because there is no data structure that permits finding associated buffers any more efficiently than that. (IIRC, the HPUX system I'm typing this on is like that.) On those sorts of systems, we'd be way better off to use O_SYNC or O_DSYNC on all our writes than to invoke multiple fsyncs. Check the archives --- this was all gone into in great detail when we were testing alternative methods for fsyncing the WAL files. Not sure on this one --- let's look at our options: O_SYNC, fsync(), and sync(). Now, O_SYNC is going to force every write to the disk. If we have a transaction that has to write lots of buffers (has to write them to reuse the shared buffer), it will have to wait for every buffer to hit disk before the write returns --- this seems terrible to me and gives the drive no way to group adjacent writes. Even on HPUX, which has poor fsync dirty buffer detection, if the fsync is outside the main processing loop (checkpoint process), isn't fsync better than O_SYNC? Now, if we are sure that writes will happen only in the checkpoint process, O_SYNC would be OK, I guess, but will we ever be sure of that? 
I can't imagine a checkpoint process keeping up with lots of active backends, especially if the writes use O_SYNC. The problem is that instead of having backends write everything to kernel buffers, we are all of a sudden forcing all writes of dirty buffers to disk. sync() starts to look very attractive compared to that option. fsync is better in that we can force it after a number of writes, and can delay it, so we can write a buffer and reuse it, then later issue the fsync. That is a win, though it doesn't allow the drive to group adjacent writes in different files. Sync of course allows grouping of all writes by the drive, but writes all non-PostgreSQL dirty buffers too. Ideally, we would have an fsync() where we could pass it a list of our files and it would do all of them optimally. From what I have heard so far, sync() still seems like the most efficient method. I know it only schedules writes, but with a sleep after it, it seems like maybe the best bet. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Tom Lane wrote: Jan Wieck [EMAIL PROTECTED] writes: What still needs to be addressed is the IO storm caused by checkpoints. I see it much relaxed when stretching out the BufferSync() over most of the time until the next one should occur. But the kernel sync at its end still pushes the system hard against the wall. I have never been happy with the fact that we use sync(2) at all. Quite aside from the I/O storm issue, sync() is really an unsafe way to do a checkpoint, because there is no way to be certain when it is done. And on top of that, it does too much, because it forces syncing of files unrelated to Postgres. I would like to see us go over to fsync, or some other technique that gives more certainty about when the write has occurred. There might be some scope that way to allow stretching out the I/O, too. The main problem with this is knowing which files need to be fsync'd. The only idea I have come up with is to move all buffer write operations into a background writer process, which could easily keep track of every file it's written into since the last checkpoint. This could cause problems though if a backend wants to acquire a free buffer and there's none to be had --- do we want it to wait for the background process to do something? We could possibly say that backends may write dirty buffers for themselves, but only if they fsync them immediately. As long as this path is seldom taken, the extra fsyncs shouldn't be a big performance problem. Actually, once you build it this way, you could make all writes synchronous (open the files O_SYNC) so that there is never any need for explicit fsync at checkpoint time. The background writer process would be the one incurring the wait in most cases, and that's just fine. In this way you could directly control the rate at which writes are issued, and there's no I/O storm at all. (fsync could still cause an I/O storm if there's lots of pending writes in a single file.) 
This outlines the same issue --- a very active backend might dirty 5k buffers --- if those 5k buffers have to be written using O_SYNC, it will take much longer than doing 5k buffer writes and doing an fsync() or sync() at the end. Having another process do the writing does allow some parallelism, but people don't seem to care about buffers having to be read in from the kernel buffer cache, so what big benefit do we get by having someone else write into the kernel buffer cache, except allowing a central place to fsync, and is it worth it considering that it might be impossible to configure a system where the writer process can keep up with all the backends? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
scott.marlowe wrote: On Tue, 4 Nov 2003, Tom Lane wrote: Jan Wieck [EMAIL PROTECTED] writes: What still needs to be addressed is the IO storm caused by checkpoints. I see it much relaxed when stretching out the BufferSync() over most of the time until the next one should occur. But the kernel sync at its end still pushes the system hard against the wall. I have never been happy with the fact that we use sync(2) at all. Quite aside from the I/O storm issue, sync() is really an unsafe way to do a checkpoint, because there is no way to be certain when it is done. And on top of that, it does too much, because it forces syncing of files unrelated to Postgres. I would like to see us go over to fsync, or some other technique that gives more certainty about when the write has occurred. There might be some scope that way to allow stretching out the I/O, too. The main problem with this is knowing which files need to be fsync'd. Wasn't this a problem that the win32 port had to solve by keeping a list of all files that need fsyncing since Windows doesn't do sync() in the classical sense? If so, then could we use that code to keep track of the files that need fsyncing? Yes, I have that code from SRA. They used threading, so they recorded all the open files in local memory and opened/fsync/closed them for checkpoints. We have to store the file names in a shared area, perhaps an area of shared memory with an overflow to a disk file. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
I would be interested to know whether the sync() load is diminished if you have the background writer process writing old dirty buffers to kernel buffers continually. What this does is to push more dirty buffers into the kernel cache in hopes the OS will write those buffers on its own before the checkpoint does its write/sync work. This might allow us to reduce sync() load while preventing the need for O_SYNC/fsync(). Perhaps sync() is bad partly because the checkpoint runs through all the dirty shared buffers and writes them all to the kernel and then issues sync(), almost guaranteeing a flood of writes to the disk. This method would find fewer dirty buffers in the shared buffer cache, and therefore fewer kernel writes needed by sync(). --- Jan Wieck wrote: Tom Lane wrote: Jan Wieck [EMAIL PROTECTED] writes: How I can see the background writer operating is that he's keeping the buffers in the order of the LRU chain(s) clean, because those are the buffers that most likely get replaced soon. In my experimental ARC code it would traverse the T1 and T2 queues from LRU to MRU, write out n1 and n2 dirty buffers (n1+n2 configurable), then fsync all files that have been involved in that, nap depending on where he got down the queues (to increase the write rate when running low on clean buffers), and do it all over again. You probably need one more knob here: how often to issue the fsyncs. I'm not convinced once per outer loop is a sufficient answer. Otherwise this is sounding pretty good. This is definitely heading in the right direction. I currently have a crude and ugly hacked system that does checkpoints every minute but stretches them out over the whole time. It writes out the dirty buffers in T1+T2 LRU order intermixed, stretches out the flush over the whole checkpoint interval and does sync()+usleep() every 32 blocks (if it has time to do this). This is clearly the wrong way to implement it, but ... The same system has ARC and delayed vacuum. 
With normal, unmodified checkpoints every 300 seconds, the transaction response time for new_order still peaks at over 30 seconds (5 is already too much) so the system basically comes to a freeze during a checkpoint. Now with this high-frequency sync()ing and checkpointing by the minute, the entire system load levels out really nicely. Basically it's constantly checkpointing. So maybe the thing we're looking for is to make the checkpoint process the background buffer writer process and let it checkpoint 'round the clock. Of course, with a bit more selectivity on what to fsync and not doing system wide sync() every 10-500 milliseconds :-) Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] # ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Tom Lane wrote: Jan Wieck [EMAIL PROTECTED] writes: That is part of the idea. The whole idea is to issue physical writes at a fairly steady rate without increasing the number of them substantially or interfering with the drives' opinion about their order too much. I think O_SYNC for random access can be in conflict with write reordering. Good point. But if we issue lots of writes without fsync then we still have the problem of a write storm when the fsync finally occurs, while if we fsync too often then we constrain the write order too much. There will need to be some tuning here. I know the BSDs have trickle sync --- if we write the dirty buffers to kernel buffers many seconds before our checkpoint, the kernel might write them to disk for us and sync() will not need to do it. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Tom Lane wrote: Andrew Sullivan [EMAIL PROTECTED] writes: On Sun, Nov 02, 2003 at 01:00:35PM -0500, Tom Lane wrote: real traction we'd have to go back to the take over most of RAM for shared buffers approach, which we already know to have a bunch of severe disadvantages. I know there are severe disadvantages in the current implementation, but are there in-principle severe disadvantages? Yes. For one, since we cannot change the size of shared memory on-the-fly (at least not portably), there is no opportunity to trade off memory usage dynamically between processes and disk buffers. For another, on many systems shared memory is subject to being swapped out. Swapping out dirty buffers is a performance killer, because they must be swapped back in again before they can be written to where they should have gone. The only way to avoid this is to keep the number of shared buffers small enough that they all remain fairly hot (recently used) and so the kernel won't be tempted to swap out any part of the region. Agreed, we can't resize shared memory, but I don't think most OS's swap out shared memory, and even if they do, they usually have a kernel configuration parameter to lock it into kernel memory. All the old unixes locked the shared memory into kernel address space and in fact this is why many of them required a kernel recompile to increase shared memory. I hope the ones that have pageable shared memory have a way to prevent it --- at least FreeBSD does, not sure about Linux. Now, the disadvantages of a large kernel cache and a small PostgreSQL buffer cache are that data has to be transferred to/from the kernel buffers, and second, we can't control the kernel's cache replacement strategy, and will probably not be able to in the near future, while we do control our own buffer cache replacement strategy. Looking at the advantages/disadvantages, a large shared buffer cache looks pretty good to me. 
-- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Bruce Momjian wrote: Having another process do the writing does allow some parallelism, but people don't seem to care about buffers having to be read in from the kernel buffer cache, so what big benefit do we get by having someone else write into the kernel buffer cache, except allowing a central place to fsync? And is it worth it, considering that it might be impossible to configure a system where the writer process can keep up with all the backends? This might be far-fetched, but I wonder if having a writer process opens up the possibility of running PostgreSQL in a cluster? I'm thinking of two servers, mounted to the same data volume, with some kind of coordination between the writer processes. Anyone know if this is similar to how Oracle handles RAC? Joe ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Experimental patch for inter-page delay in VACUUM
Bruce Momjian wrote: Agreed, we can't resize shared memory, but I don't think most OSes swap out shared memory, and even if they do, they usually have a kernel configuration parameter to lock it into kernel memory. All the old Unixes locked the shared memory into kernel address space; in fact, this is why many of them required a kernel recompile to increase shared memory. I hope the ones that have pageable shared memory have a way to prevent it --- at least FreeBSD does, not sure about Linux. I'm pretty sure at least Linux, Solaris, and HP-UX all work this way -- otherwise Oracle would have the same problem with its SGA, which is kept in shared memory. Joe ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] [ADMIN] [HACKERS] retrieve statement from catalogs
Can you just use the SQL logging and read that file afterwards? /djb - Original Message - From: Jaime Casanova [EMAIL PROTECTED] To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Friday, November 07, 2003 14:59 Subject: Re: [GENERAL] [ADMIN] [HACKERS] retrieve statement from catalogs Hi everybody, can anyone tell me if there's a way to retrieve the SELECT statement being executed from the catalogs, or maybe via some structure in a trigger? The reason is that I have some SELECTs constructed on-the-fly (just part of them), and I want to save them in a table in order to know which are used the most, so I can optimize them. Another reason for doing this is that I don't know all the SELECTs in the program, nor the page, and this is an easier solution than examining all the code. Thanks in advance, el_vigia ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
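For the statement-logging approach /djb suggests, the server can be told to log every statement via postgresql.conf and the log file mined afterwards. A sketch only; the exact parameter form varies by release (older versions take a boolean, later ones an enum of 'none', 'ddl', 'mod', 'all').

```
# postgresql.conf (sketch; parameter values vary by PostgreSQL release)
log_statement = 'all'      # log every SQL statement the server executes
```

A SIGHUP (pg_ctl reload) is enough to pick up the change; no restart is needed for logging parameters.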