Re: [HACKERS] stats_command_string default?
I wrote:
> > Would it make more sense to enable stats_command_string by default?
> > It could be a problem if doing so would have a significant impact on
> > performance, but that's the only reason I can think of for not doing
> > it.  Are there others?

and Christopher Kings-Lynne responded:
> You can just turn it on in one second in your postgresql.conf
> file...

Sure.  But my initial perusal of the documentation didn't reveal that
option (once I knew the name of the option to enable, I knew better
where to look in the documentation, but by that time it was too late).
I ended up figuring it out by (a) hearing that it was possible, (b)
looking at the definition of pg_stat_activity, and (c) looking through
the source code to find out how the pg_stat_get_backend_activity()
function worked, and from there how to make it display something.

Since we've been talking about fixing the defaults, this seems like a
good opportunity to address little things like this as well.  It costs
us nothing and it'll make someone's life easier.

The defaults should be whatever is most useful to the most people.  I
can see lots of use in enabling stats_command_string by default and
little use in disabling it by default; most DBAs will want it turned
on unless there's a big performance loss as a result.  That's why I
asked the question: is there a really good reason that most DBAs would
want it disabled?  My sense is that there isn't, but I don't know,
which is why I'm asking.

--
Kevin Brown [EMAIL PROTECTED]
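For anyone else hunting for this, a minimal sketch of the knob and the
view under discussion (the parameter and column names here are the
7.x-era ones, worth double-checking against your release):

    # in postgresql.conf (both are needed for command display):
    stats_start_collector = true
    stats_command_string = true

    -- then, from any session:
    SELECT procpid, usename, current_query FROM pg_stat_activity;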
Re: [HACKERS] stats_command_string default?
Tom Lane wrote:
> Kevin Brown <[EMAIL PROTECTED]> writes:
> > Would it make more sense to enable stats_command_string by default?
>
> I'd vote against it.  If we turn it on by default, people are paying
> for a feature they may not even know exists.  Once they find out about
> it and decide they want it, they can turn it on easily enough.
>
> If you can show that the overhead is unmeasurable, that'd indicate that
> this argument is bogus; but I suspect it's not negligible, at least on
> simple queries.

It's not unmeasurable, but it is reasonably low (guess it depends on
your definition of "reasonable" :-).  I wrote a small perl script which
would do a "SELECT 1" in a loop as many times as I specified on the
command line (autocommit was turned off).  I measured the amount of
wall clock time it took to do 10 passes on an unloaded system with
stats_command_string enabled, and then the same thing with it disabled.

The difference in time over 10 passes was 20 seconds (44 seconds with
stats_command_string turned on, 24 with it turned off), for an impact
of 0.2 milliseconds per command executed.

This was on a 1.5GHz P4 with 1G of RAM running Linux 2.4.20 on
ReiserFS.  The data is stored on a software RAID-5 across 3 Seagate
ST-380021A IDE drives, each connected to a separate channel on a
Promise ATA100 card.

I have no idea if that's small enough to be considered negligible or
not, considering the hardware it was running on.

--
Kevin Brown [EMAIL PROTECTED]
[HACKERS] online reindex
It sounds to me like it should be fairly straightforward to implement
online reindex.  That is, reindex without locking the table.  This is
an important feature for 24x7 operation.

Since postgres doesn't modify data in place, the only thing required
for online reindex is to ensure that the reindex operation sees all of
the latest data.  If reindex sets a flag that causes all new inserts
and updates to allocate new space at the end of the heap without
checking for free space, then a simple linear scan should be guaranteed
to catch all the data.  (I'm not sure how the indexing operation works;
if it reads in the whole table and then sorts it, there would have to
be an extra step where any new tuples are read in and inserted.)

There would only have to be a small window with the table locked while
the indexes are swapped at the end.

--
greg
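The end-of-scan "swap" might look something like this sketch (the table
and index names are hypothetical, and whether ALTER TABLE ... RENAME
can be pointed at an index this way is an assumption worth checking):

    BEGIN;
    -- the small window: block everyone while the indexes trade places
    LOCK TABLE big_table IN ACCESS EXCLUSIVE MODE;
    DROP INDEX big_table_col_idx;
    ALTER TABLE big_table_col_idx_new RENAME TO big_table_col_idx;
    COMMIT;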
Re: [HACKERS] Linux.conf.au 2003 Report
On Sat, 15 Feb 2003, Christopher Kings-Lynne wrote:
> I think so - Gavin?  As far as I'm aware there's not really anything
> else on the open source circuit.  There is often a MySQL rep there as
> well apparently.

Chris is right.  David Axmark (MySQL AB) usually turns up, but he
didn't this year.

The conference has an attendance of 400 people.  The audience is fairly
technical, though getting less so each year :-(.

If I go next year, I think I will give a tutorial focusing on migrating
MySQL applications to PostgreSQL.  Many attendees of my talk this year
were looking for information like that, something I didn't cover :-\.

Gavin
Re: [HACKERS] client_encoding directive is ignored in
> > Actually the problem can be divided into two parts:
> > 1) backend does not process GUC client_encoding.
> > 2) libpq does not ask for the backend's client_encoding; instead it
> >    asks for the database encoding when it starts up the connection.
> >    This is just a mistake.
> >
> > I think we could fix 1) without any backward compatibility problem,
> > and the fix should be applied to both 7.3-STABLE and current.
>
> If we change the backend behavior without changing libpq, aren't we
> breaking things even worse?  As long as libpq behaves as in (2), hadn't
> the backend better init its idea of client_encoding to match
> database_encoding?

Why?  No matter how the backend's behavior regarding client_encoding
changes, libpq won't be affected by it, since the 7.2 and 7.3 libpq do
not use client_encoding anyway.

--
Tatsuo Ishii
Re: [HACKERS] location of the configuration files
Tom Lane writes:
> AFAICS, you can either set -C to /etc if you want your PG config files
> loose in /etc, or you can set it to /etc/postgresql/ if you want them
> in a privately-owned directory.  Which other arrangements are needed?

People might want to share them between servers, or allow a user to
select from a few pre-configured ones that reside in the same
directory.

--
Peter Eisentraut [EMAIL PROTECTED]
Re: [HACKERS] psql and readline
On Sat, Feb 15, 2003 at 03:10:19PM -0600, Ross J. Reedstrom wrote:
> On Fri, Feb 14, 2003 at 11:32:02AM -0500, Tom Lane wrote:
> > Patrick Welche <[EMAIL PROTECTED]> writes:
> > > On Thu, Feb 13, 2003 at 10:25:52AM -0500, Tom Lane wrote:
> > >> Well, is that a bug in your wrapper?  Or must we add a configure
> > >> test for the presence of replace_history_entry()?
> > >
> > > Good question.  Easiest for now for me would be to add a configure
> > > test.
> >
> > Okay with me --- Ross, can you handle that?
>
> I'll take a crack at it.  Testing will be a problem, since all my
> machines have libreadline installed, but I've got one I can probably
> pull it off of.  Patrick, is 'your libedit wrapper' your personal
> code, or something standard on some set of systems (e.g. *BSD)?

It's the standard NetBSD libedit, which comes with built-in readline
emulation; it's just that replace_history_entry() isn't one of the
emulated functions.  ({read,write,add}_history() are included.)

> Ross "one line patches always grow" Reedstrom :-)

I'm of course very happy to test the growing patch!

Cheers,

Patrick
Re: [HACKERS] client_encoding directive is ignored in
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> Why?  No matter how the backend's behavior regarding client_encoding
> changes, libpq won't be affected by it since 7.2 and 7.3 libpq does
> not use client_encoding anyway.

Doesn't client_encoding determine what encoding the backend sends to
the client?  It's probable that libpq itself is not affected by the
selected client_encoding, since it only passes data through.  But I
think that applications are quite likely to be broken if we alter the
backend's behavior in a way that changes the default client encoding.
That strikes me as a change that should be made at a major release, not
a minor one --- people don't expect to get hit by compatibility
problems when they do a minor upgrade.

But this argument is mostly irrelevant if the proposed change will not
affect behavior in a default installation.  I guess I'm not entirely
clear on exactly which cases it will affect.  What will your proposed
change do in each possible combination (database encoding is SQL_ASCII
or not, client_encoding is defined in postgresql.conf or not,
PGCLIENTENCODING is set in the postmaster's environment or not, etc)?

        regards, tom lane
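For concreteness, the setting being argued over is one any client can
already exercise per session; a minimal illustration (standard
commands, though the encoding names available depend on your build):

    SHOW client_encoding;
    SET client_encoding TO 'LATIN1';
    RESET client_encoding;

The open question in the thread is only where the *default* comes from:
postgresql.conf, PGCLIENTENCODING, or the database encoding.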
Re: [HACKERS] location of the configuration files
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> Tom Lane writes:
>> AFAICS, you can either set -C to /etc if you want your PG config files
>> loose in /etc, or you can set it to /etc/postgresql/ if you want them
>> in a privately-owned directory.  Which other arrangements are needed?

> People might want to share them between servers, or allow a user to
> select from a few pre-configured ones that reside in the same
> directory.

You can accomplish that without the need to customize the .conf file
names; you just make, e.g.,

    /etc/postgres/myconfig/postgresql.conf
    /etc/postgres/yourconfig/postgresql.conf
    /etc/postgres/herconfig/postgresql.conf

(plus additional config files as needed in each of these directories)
and then the postmaster start command is

    postmaster -C /etc/postgres/myconfig

I see no real gain in flexibility in allowing people to choose random
names for the individual config files.  Also, it'd defeat the
ultimate-fallback approach of doing "find / -name postgresql.conf" to
figure out where the config files are hiding in an unfamiliar
installation.

        regards, tom lane
Re: [HACKERS] location of the configuration files
Lamar Owen <[EMAIL PROTECTED]> writes:
> The script's been out there for awhile.  It does some things well, and
> some things not so well.  The config files are still coresident with
> the database, and backup is more difficult than it could be.  Meeting
> all these needs (with configure switches, configuration file
> directives, etc) would be a good thing.

Sure.  I'm happy to change the software in a way that *allows* moving
the config files elsewhere.  But it's not apparent to me why you insist
on forcing people who are perfectly happy with their existing
configuration arrangements to change them.  I have not seen any reason
in this discussion why we can't support both a separate-config-location
approach and the traditional single-location one.

Please remember that the existing approach has evolved over quite a few
releases.  It may not satisfy the dictates of the FHS religion, but it
does meet some people's needs perfectly well.  Let's look for a
solution that permits coexistence, rather than one that forces change
on people who don't need or want change.

        regards, tom lane
Re: [HACKERS] location of the configuration files
Kevin Brown <[EMAIL PROTECTED]> writes:
> So in your case, what's the advantage of having initdb write anything
> to a config file, when you're probably also relying on PGDATA or -D to
> start the database (if you're not, then fair enough.  But see below)?

Keep in mind that initdb doesn't currently *need* to put the datadir
location into the config file.  It *will* need to do so if we separate
config and data dirs.  Or at least, *somebody* will need to do so.
It's not apparent to me how it simplifies life not to have initdb do
it.  Especially when there are other configuration items that initdb
should or already does record: locale settings, database encoding.

And we have already been talking about improving PG's self-tuning
capability.  initdb would be the natural place to look around for
information like available RAM and adjust the config-file settings
like sort_mem accordingly.

Basically, the notion that initdb shouldn't write a config file seems
like a complete dead end to me.  It cannot possibly be more convenient
than the alternatives.  We'd be giving up a lot of current and future
functionality --- and for what?

> I'd expect initdb to initialize a database.  If I were running initdb
> without a lot of foreknowledge of its side effects, I think I'd
> probably be a bit surprised to find that it had touched my config
> file.

If we do it the way I suggested (dump into the datadir, which is
initially empty, same as always) then it cannot overwrite your existing
config files.  Think of it as providing a suggested config file to
compare against what you have.

        regards, tom lane
Re: [HACKERS] stats_command_string default?
Kevin Brown <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Kevin Brown <[EMAIL PROTECTED]> writes:
>>> Would it make more sense to enable stats_command_string by default?
>>
>> I'd vote against it.  If we turn it on by default, people are paying
>> for a feature they may not even know exists.  Once they find out
>> about it and decide they want it, they can turn it on easily enough.

> The difference in time over 10 passes was 20 seconds (44 seconds
> with stats_command_string turned on, 24 with it turned off), for an
> impact of 0.2 milliseconds per command executed.

In other words, more than an eighty percent penalty on simple commands.
Not negligible in my book.

> I have no idea if that's small enough to be considered negligible or
> not, considering the hardware it was running on.

I would imagine that the CPU-time ratio would not depend all that much
on the particular hardware.

        regards, tom lane
Re: [HACKERS] online reindex
Greg Stark <[EMAIL PROTECTED]> writes:
> If reindex sets a flag that causes all new inserts and updates to
> allocate new space at the end of the heap without checking for free
> space, then a simple linear scan should be guaranteed to catch all
> the data.

Oh?  If people are inserting tuples at the same time you're reading,
I'm not sure this holds good at all.

> There would only have to be a small window with the table locked
> while the indexes are swapped at the end.

Can you say "deadlock"?  Upgrading a lock from shared to exclusive
won't work in general, because there may be other transactions trying
to do the same thing.

        regards, tom lane
Re: [HACKERS] stats_command_string default?
Kevin Brown <[EMAIL PROTECTED]> writes:
> It's not unmeasurable, but it is reasonably low (guess it depends on
> your definition of "reasonable" :-).  I wrote a small perl script
> which would do a "SELECT 1" in a loop as many times as I specified on
> the command line (autocommit was turned off).  I measured the amount
> of wall clock time it took to do 10 passes on an unloaded system
> with stats_command_string enabled, and then the same thing with it
> disabled.

FWIW, I did a comparable test using a slightly more reasonable example
(ie, a query that does useful work): in the regression database,

    PREPARE q(int) AS SELECT * FROM tenk1 WHERE unique1 = $1

followed by 1 executions of

    EXECUTE q(42)

This was with autocommit on (ie, each EXECUTE is its own transaction)
and using a C-coded client (a small tweak of
src/test/examples/testlibpq.c).

Averaging over three trials on an unloaded system, I got 21.0 seconds
with stats_command_string off, 27.7 with it on, or about 32% overhead.

My conclusion is that stats_command_string overhead is non-negligible
for simple commands.  So I stand by my previous opinion that it should
not be turned on without the DBA taking explicit action to turn it on.
Do you want it on in every future benchmark, for example?

        regards, tom lane
Re: [HACKERS] psql and readline
Ross J. Reedstrom writes:
> O.K., I found the 'editline' wrapper around 'libedit' that provides
> a subset of readline functionality, and used that for testing.  On my
> Debian Linux systems, editline installs readline compatibility headers
> (readline.h, history.h) into /usr/include/editline/, so I added tests
> for those into configure.in, and src/include/pg_config.h.in, and usage
> in src/bin/psql/input.h

I don't think this is what we were after.  We've certainly been running
with libedit for a long time without anyone ever mentioning
/usr/include/editline.  I suggest this part be taken out.

--
Peter Eisentraut [EMAIL PROTECTED]
Re: [HACKERS] online reindex
Tom Lane <[EMAIL PROTECTED]> writes:
> Greg Stark <[EMAIL PROTECTED]> writes:
>> If reindex sets a flag that causes all new inserts and updates to
>> allocate new space at the end of the heap without checking for free
>> space, then a simple linear scan should be guaranteed to catch all
>> the data.
>
> Oh?  If people are inserting tuples at the same time you're reading,
> I'm not sure this holds good at all.

The premise is that reusing space within the heap is disabled during
this scan, so any new tuples are at the end.  When you've reached the
end you've read all the ones inserted during your scan as well.

There is a window between the end of the scan and locking the table
when more tuples could be added, but presumably that window would be
small and the extra tuples would be limited.  They would have to be
processed while the table is locked.

I don't know much about postgres's locking yet, so I can't comment on
how to arrange to get a lock on the table.  At first blush it sounds
like there needs to be an "upgradable lock" that only one process can
hold, but that allows other processes to read while it's held, until
it's upgraded.  But as I said I don't know many details about the
locking policies currently in use.

--
greg
Re: [HACKERS] stats_command_string default?
> Averaging over three trials on an unloaded system, I got 21.0 seconds
> with stats_command_string off, 27.7 with it on, or about 32% overhead.
>
> My conclusion is that stats_command_string overhead is non-negligible
> for simple commands.  So I stand by my previous opinion that it should
> not be turned on without the DBA taking explicit action to turn it on.
> Do you want it on in every future benchmark, for example?

How about with the stats_collector on?  I.e. recording block- and
row-level stats?

Chris
[HACKERS] Hard problem with concurrency
OK, this is the problem:

I want to write a bit of SQL so that if a row exists in a table, it
updates it, otherwise it inserts it.  Problem is, there is a very high
chance of simultaneous execution of this query on the same row (the
rows have a unique index).

So, strategy one:

    begin;
    update row;
    if (no rows affected)
        insert row;
    commit;

Problem - race condition!  If the two transactions run at the same
time, the second will end up doing an insert that violates the unique
index, which will cause query failure.

Strategy two:

    begin;
    select row for update;
    if (row returned)
        update;
    else
        insert;
    commit;

Problem - race condition.  The row-level locking doesn't allow me to
lock 'potential rows', so if the row does not yet exist and two
transactions run simultaneously, then the second will die with a
unique violation.

Strategy three:

    begin;
    lock table in exclusive mode;
    update row;
    if (no rows affected)
        insert row;
    commit;

Problem - works, but this table needs high concurrency.  Every time a
member hits a page of the site that needs authentication, this function
is called.  In particular, the login transaction can take a little time
sometimes, and we can't halt everyone else's activities for that
duration...

So what is the solution???  I'm not sure if acquiring a ROW EXCLUSIVE
MODE lock will help at all.  Also, I can't try the insert and then the
update, because a failed INSERT, in Postgres, will cause an outright
transaction failure.

What the heck is the solution??

Chris
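Strategy two spelled out, for concreteness (the sessions table and its
columns are hypothetical; the client-side branching is shown as
comments):

    BEGIN;
    SELECT * FROM sessions WHERE user_id = 42 FOR UPDATE;
    -- if a row came back, it is now locked against concurrent updates:
    UPDATE sessions SET last_seen = now() WHERE user_id = 42;
    -- if no row came back:
    INSERT INTO sessions (user_id, last_seen) VALUES (42, now());
    COMMIT;
    -- the race Chris describes: two transactions can both see "no row"
    -- and both attempt the INSERT; the second dies with a unique
    -- violation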
Re: [HACKERS] stats_command_string default?
"Christopher Kings-Lynne" <[EMAIL PROTECTED]> writes: >> My conclusion is that stats_command_string overhead is non-negligible >> for simple commands. So I stand by my previous opinion that it should >> not be turned on without the DBA taking explicit action to turn it on. > How about with the stats_collector on? ie. Recording block and row level > stats? Didn't measure that, but I believe the block/row stats are dumped to the collector once per transaction, so the overhead ought to be roughly comparable to this test. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Hard problem with concurrency
Solution one: use sequences for the unique value.

Solution two: use another table to effect the exclusive locking, and
use it to store the "unique" values:

    begin;
    update row;
    if (no rows affected) {
        lock table foo in exclusive mode;
        find a unique value that is not already in foo;
        store this value inside of foo;
        insert row;
    }
    commit;

Solution three: use your strategy two, but throw a loop around it and
have it try again (with a new value) if it gets a unique violation.

--
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8
Re: [HACKERS] Hard problem with concurrency
"Christopher Kings-Lynne" <[EMAIL PROTECTED]> writes: > I can't try the insert and then the update because the INSERT, in Postgres, > will cause an outright transaction failure. Do the update, then try to insert if the update found nothing, and put a retry loop around the whole transaction in case you fail because of concurrent inserts. Realistically you will need a retry loop in all but the most trivial cases anyway --- certainly so if you want to use serializable transaction mode. So I don't think this solution is unworkably complex. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] location of the configuration files
On Sunday 16 February 2003 13:15, Tom Lane wrote:
> Sure.  I'm happy to change the software in a way that *allows* moving
> the config files elsewhere.

So we agree.  Perfect.

> But it's not apparent to me why you insist on forcing people who are
> perfectly happy with their existing configuration arrangements to
> change them.

Me?  Trying to force things to change?  You misunderstand me.  No, I'm
trying to understand the rationale for a configuration process that is
different and non-standard relative to the way other daemons designed
for multiple instances do things.  I understand better now; the
exercise was a success.  Many thanks.

--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Re: [HACKERS] location of the configuration files
Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
>> Tom Lane writes:
>>> I would favor a setup that allows a -C *directory* (not file) to be
>>> specified as a postmaster parameter separately from the -D
>>> directory;
>>
>> A directory is not going to satisfy people.
>
> Why not?  Who won't it satisfy, and what's their objection?
>
> AFAICS, you can either set -C to /etc if you want your PG config files
> loose in /etc, or you can set it to /etc/postgresql/ if you want them
> in a privately-owned directory.  Which other arrangements are needed?

The idea of using a "directory" puts us back to using symlinks to share
files.  While I know the core development team thinks that symlinks are
a viable configuration option, most admins, myself included, do not
like to use symlinks, because they do not have the ability to carry
documentation, i.e. comments in a configuration file, and are DANGEROUS
in a production environment.

Any configuration strategy that depends on symlinks is inadequate and
poorly designed.
Re: [HACKERS] location of the configuration files
mlw <[EMAIL PROTECTED]> writes:
> The idea of using a "directory" puts us back to using symlinks to
> share files.

So?  If you want to share files, you're probably sharing all three
config files and don't need a separate directory at all.

This is not a sufficient argument to make me buy into the mess of
letting people choose nonstandard configuration file names ---
especially when most of the opposite camp seems to be more interested
in choosing *standard* names for things.  Why does that policy stop
short at the directory name?

        regards, tom lane
Re: [HACKERS] Hard problem with concurrency
> Do the update, then try to insert if the update found nothing, and
> put a retry loop around the whole transaction in case you fail
> because of concurrent inserts.
>
> Realistically you will need a retry loop in all but the most trivial
> cases anyway --- certainly so if you want to use serializable
> transaction mode.  So I don't think this solution is unworkably
> complex.

I guess that will work, but it will not prevent our log from being
spammed up with error messages.

Also, this is a somewhat simplified case.  Some transactions, such as
our login transaction, have rather large numbers of operations in them,
and we don't want to have to roll back the whole thing.

I guess we'll have to live with it.  REPLACE INTO anyone? ;)

Chris
Re: [HACKERS] location of the configuration files
Tom Lane wrote:
> Keep in mind that initdb doesn't currently *need* to put the datadir
> location into the config file.  It *will* need to do so if we
> separate config and data dirs.  Or at least, *somebody* will need to
> do so.  It's not apparent to me how it simplifies life not to have
> initdb do it.  Especially when there are other configuration items
> that initdb should or already does record: locale settings, database
> encoding.

Is it possible for the database engine to properly deal with a database
when it is told to use a different database encoding than the one the
database was initdb'd with?  If it's not, then that suggests to me that
the database encoding is something that doesn't belong in the
configuration file, but rather in some other place that is intimately
tied to the database itself and which is difficult/impossible to
change, like perhaps a read-only system table that gets created at
initdb time.

> And we have already been talking about improving PG's self-tuning
> capability.  initdb would be the natural place to look around for
> information like available RAM and adjust the config-file settings
> like sort_mem accordingly.

I agree here, and since you're thinking of just putting the resulting
config file in the database data directory, then as a DBA I wouldn't be
terribly surprised by it ... especially if it came back with a message
that told me what it had done.

> If we do it the way I suggested (dump into the datadir, which is
> initially empty, same as always) then it cannot overwrite your
> existing config files.  Think of it as providing a suggested config
> file to compare against what you have.

There is one minor complication: what if there's an existing config
file in the target directory?  One use for initdb would be as a quick
way to completely wipe the database and start over (e.g., if the
encoding were found to be incorrect), but the config file that's
already there could easily contain a lot of customization that the
administrator would want to retain.  Which suggests that we should
consider writing to a file using a slightly different name (e.g.,
postgresql.conf.initdb), at least in the event that a config file
already exists in the target directory.

Not sure what the overall right thing to do here is...

--
Kevin Brown [EMAIL PROTECTED]
Re: [HACKERS] stats_command_string default?
Tom Lane wrote:
> Kevin Brown <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Kevin Brown <[EMAIL PROTECTED]> writes:
>>>> Would it make more sense to enable stats_command_string by default?
>>>
>>> I'd vote against it.  If we turn it on by default, people are
>>> paying for a feature they may not even know exists.  Once they find
>>> out about it and decide they want it, they can turn it on easily
>>> enough.
>>
>> The difference in time over 10 passes was 20 seconds (44 seconds
>> with stats_command_string turned on, 24 with it turned off), for an
>> impact of 0.2 milliseconds per command executed.
>
> In other words, more than an eighty percent penalty on simple
> commands.  Not negligible in my book.

Guess that's true when looked at that way. :-)

Then again, that's the worst possible case: a simple command that only
invokes the parser and executor, doesn't reference any tables, doesn't
call any functions, and doesn't even write to anything.  As a
percentage of the work actually done on real systems, how often are
such commands executed?

In any case, it *does* show that there is a very high penalty for the
option relative to operations that should be much more complex, like
parsing the command.  Why in the world is the penalty so high?  I
thought it would be a simple matter of copying the command to an
element of a structure that's overlaid onto a bit of shared memory
allocated to the backend process for its statistics.  In short, a
simple memory-to-memory copy, with perhaps the acquisition of a write
lock on the structure.  I'd expect such an operation to take a few
microseconds at most (especially on the kind of hardware I was testing
on), but it's orders of magnitude worse.

I have trouble believing that the locking protocols required for this
operation are that inefficient unless we're doing something drastically
wrong on that front, and it's almost impossible for me to believe that
the simple operation of copying data to a shared memory segment would
be that inefficient.

--
Kevin Brown [EMAIL PROTECTED]
Re: [HACKERS] location of the configuration files
mlw wrote:
> symlinks suck.  Sorry Tom, but they are *BAD* in a production server.

Well, at least they're better than hard links. ;-)

--
Kevin Brown [EMAIL PROTECTED]
Re: [HACKERS] Hard problem with concurrency
Hm, odd, nobody mentioned this solution:

If you don't have a primary key already, create a unique index on the
combination you want to be unique.  Then:

    . Try to insert the record
    . If you get a duplicate key error, then do the update instead

No possibilities of duplicate records due to race conditions.  If two
people try to insert/update at the same time you'll only get one of the
two results, but that's the downside of the general approach you've
taken.  It's a tad inefficient if the usual case is updates, but
certainly not less efficient than doing table locks.

I'm not sure what you're implementing here.  Depending on what it is,
you might consider having a table of raw data that you _only_ insert
into, and then process those results into a table with the consolidated
data you're trying to gather.  I've usually found that's more flexible
later, because then you have all the raw data in the database even if
you only present a limited view.

--
greg
Re: [HACKERS] Hard problem with concurrency
> If you don't have a primary key already, create a unique index on the
> combination you want to be unique.  Then:
>
>     . Try to insert the record
>     . If you get a duplicate key error, then do the update instead
>
> No possibilities of duplicate records due to race conditions.  If two
> people try to insert/update at the same time you'll only get one of
> the two results, but that's the downside of the general approach
> you've taken.  It's a tad inefficient if the usual case is updates,
> but certainly not less efficient than doing table locks.

The idea was to stop our postgres logs being spammed up with unique
constraint violation warnings, in which case your solution above is
identical to our current one: update and if it fails, insert.  Except
that, since the row is likely to already be there, our current way will
be a bit more efficient.

Chris
[HACKERS] Questions about indexes?
Hello postgres hackers,

Been a while since I have participated on this list ... but I have a
new itch to scratch.

Although the table schema is immaterial, I will provide it so we have a
common framework for this discussion:

    host_id    integer   (not null)
    timestamp  datetime  (not null)
    category   text      (not null)  [<= 5 chars]
    anomaly    text      (not null)  [<= 1024 chars]

This table is used to store archived data, so each row in the table
must be unique.  Currently I am using a primary key across each column
to enforce this uniqueness.  This table currently has ~86 million rows
and is 16+ GB in size.  This primary key index is also 16+ GB in size,
because it appears all the data is duplicated in the index.  (I have
only done some preliminary looking at the database file with strings,
etc., so this assumption is purely based on those observations.)

I am not sure why all the data is duplicated in the index ... but I bet
it has to do with performance, since it would save a lookup in the main
table.  Are there any benchmarks or papers related to this topic I
should locate and read?  I am curious about this because it seems the
only advantage gained is searching the index for the specified values.
Once the entry is found, the full entry needs to be pulled from the
main table anyhow, since the index does not contain all the data.  Also
with the increased size, it seems additional pressure would be put on
the shared memory caches (no idea how this really works, just
guessing! :))

Since my only requirement is that the rows be unique, I have developed
a custom MD5 function in C, and created an index on the MD5 hash of the
concatenation of all the fields.  This has reduced the disk space usage
considerably, as shown below against my test database (~6 million rows
at 1+ GB).  All this data is based off the test database running 7.3.2:

    Type               Size
    --------------------------------
    Database table     1188642816
    All-columns pkey   1510252544
    MD5-columns pkey    370999296

Just using the MD5 hash data instead of all the columns is a
considerable disk-space win, going from 1.5 GB to 370 MB.

Has anyone else solved this problem?  Has anyone else looked into
something like this, and would you mind sharing, so I do not have to
re-invent the wheel? :)

Also (assuming there are no papers / benchmarks proving data in the
index is a good idea), how difficult would it be to implement an index
type that extracts the data from the main table?

Thanks for reading.  I will be happy to field any questions that I can,
or read any papers, research, etc. that relate to this topic.

- Ryan

P.S. the production database is running 7.2.4 if that makes a
difference.

--
Ryan Bradetich <[EMAIL PROTECTED]>
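A minimal sketch of the hash-key approach (two loud assumptions: the
built-in md5() function and free-form expression indexes both arrived
in releases later than the 7.2/7.3 discussed here, so on those versions
the concatenation would be wrapped in a single custom function, as Ryan
did; the table name is hypothetical):

    CREATE UNIQUE INDEX anomaly_md5_key ON anomaly_archive
        (md5(host_id::text || "timestamp"::text || category || anomaly));

    -- duplicate rows now fail at insert time with a unique violation,
    -- while the index stores 32-character hashes instead of full rows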
Re: [HACKERS] Questions about indexes?
Ryan Bradetich <[EMAIL PROTECTED]> writes:
> Although the table schema is immaterial, I will provide it so we have
> a common framework for this discussion:
>
>     host_id    integer   (not null)
>     timestamp  datetime  (not null)
>     category   text      (not null)  [<= 5 chars]
>     anomaly    text      (not null)  [<= 1024 chars]
>
> This table is used to store archived data, so each row in the table
> must be unique.  Currently I am using a primary key across each
> column to enforce this uniqueness.

It's not real clear to me why you bother enforcing a constraint that
the complete row be unique.  Wouldn't a useful constraint be that the
first three columns be unique?  Even if that's not correct, what's
wrong with tolerating a few duplicates?  You can't tell me it's to save
on storage ;-)

> I am not sure why all the data is duplicated in the index ... but I
> bet it has to do with performance since it would save a lookup in the
> main table.

An index that can't prevent looking into the main table wouldn't be
worth anything AFAICS ...

        regards, tom lane
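In SQL terms, the constraint Tom suggests would be something like this
(the table name is hypothetical, since the original post never names
it):

    ALTER TABLE anomaly_archive
        ADD CONSTRAINT anomaly_archive_key
        UNIQUE (host_id, "timestamp", category);

Ryan's follow-up explains why this particular key does not fit his
data: several distinct anomalies can legitimately share all three
values.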
Re: [HACKERS] location of the configuration files
Tom Lane wrote:
> mlw <[EMAIL PROTECTED]> writes:
>> The idea of using a "directory" puts us back to using symlinks to
>> share files.
>
> So?  If you want to share files, you're probably sharing all three
> config files and don't need a separate directory at all.
>
> This is not a sufficient argument to make me buy into the mess of
> letting people choose nonstandard configuration file names ---
> especially when most of the opposite camp seems to be more interested
> in choosing *standard* names for things.  Why does that policy stop
> short at the directory name?

Symlinks suck.  Sorry Tom, but they are *BAD* in a production server.
You cannot add comments to symlinks.  Most of the admins I know, myself
included, HATE symlinks and use them as a last resort.  Requiring
symlinks is just pointless; we are talking about a few lines of code
that have nothing to do with performance.

The patch that I submitted allows PostgreSQL to work as it always has,
but adds the ability for a configuration file to do what is normally
done with fixed names in $PGDATA.  I have said before, I do not like
policy, I like flexibility; forcing a directory is as restrictive as
requiring the files in $PGDATA.

Why is this such a problem?  MANY people want to configure PostgreSQL
this way, and the patch I submitted allows it but does not force
anything.  Any configuration solution that requires symlinks is flawed.
Re: [HACKERS] Questions about indexes?
On Sun, 2003-02-16 at 23:34, Tom Lane wrote:
> Ryan Bradetich <[EMAIL PROTECTED]> writes:
>> This table is used to store archived data, so each row in the table
>> must be unique.  Currently I am using a primary key across each
>> column to enforce this uniqueness.
>
> It's not real clear to me why you bother enforcing a constraint that
> the complete row be unique.  Wouldn't a useful constraint be that the
> first three columns be unique?  Even if that's not correct, what's
> wrong with tolerating a few duplicates?  You can't tell me it's to
> save on storage ;-)

The table holds system policy compliance data.  The category is
basically the policy, and the anomaly is the detailed text explaining
why the system is out of compliance.  So the anomaly data is important
(and often the reason why the key is unique).

The reason we are archiving the data is to generate reports and graphs
showing policy compliance over time.  Duplicated rows will artificially
inflate the numbers in the reports and graphs.  The other option we had
was to perform a DISTINCT select at report / graph time; we chose not
to go this route because of the sort added to the query.  (Also it just
seemed tidier to only store good data :))

The disk storage is a minor concern :), but I was actually looking at
it as a possible performance enhancement.  I am curious how it affects
the shared buffer cache, and also there should be fewer pages to read
on average since the index size is smaller.  Does this make sense?  Or
am I out in left field again? :)

>> I am not sure why all the data is duplicated in the index ... but I
>> bet it has to do with performance since it would save a lookup in
>> the main table.
>
> An index that can't prevent looking into the main table wouldn't be
> worth anything AFAICS ...

Ok, scratch that idea then :)  I will continue looking at other ideas
like the MD5 data hashing, etc.

Thanks for your input Tom!

- Ryan

--
Ryan Bradetich <[EMAIL PROTECTED]>
Re: [HACKERS] Questions about indexes?
Ryan Bradetich <[EMAIL PROTECTED]> writes:
> On Sun, 2003-02-16 at 23:34, Tom Lane wrote:
>> It's not real clear to me why you bother enforcing a constraint that
>> the complete row be unique.  Wouldn't a useful constraint be that
>> the first three columns be unique?

> The table holds system policy compliance data.  The category is
> basically the policy, and the anomaly is the detailed text explaining
> why the system is out of compliance.  So the anomaly data is
> important (and often the reason why the key is unique).

Well, sure the anomaly is important: it's the payload, the reason why
you bother to have the table in the first place.  But that doesn't mean
it's part of the key.  Generally the key would be the info you use to
look up a particular anomaly text.  In this example, it's not clear to
me why you'd need/want two different anomaly texts entered for the same
host_id and the same category at the same instant of time.  ISTM
there's something inadequate about your category column if you need
that.

        regards, tom lane
Re: [HACKERS] Questions about indexes?
On Mon, 2003-02-17 at 00:15, Tom Lane wrote:
> Well, sure the anomaly is important: it's the payload, the reason why
> you bother to have the table in the first place.  But that doesn't
> mean it's part of the key.  Generally the key would be the info you
> use to look up a particular anomaly text.  In this example, it's not
> clear to me why you'd need/want two different anomaly texts entered
> for the same host_id and the same category at the same instant of
> time.  ISTM there's something inadequate about your category column
> if you need that.

Ok, I understand what you are asking now :)  Let me make up a contrived
example to show how the table is used.

    host_id 1 = hosta.somewhere.com
    host_id 2 = hostb.somewhere.com

The categories are coded like so (made-up examples):

    cat p101 = /etc/passwd check
    cat f101 = filesystem check

The table would look like:

    1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
    1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has an invalid shell.
    1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has expired password.
    2 | Mon Feb 17 00:34:24 MST 2003 | f101 | file /foo has improper owner.

etc...

So I do not need the anomaly to be part of the index; I only need it to
guarantee the rows are unique.  I agree with you that I would not
normally add the anomaly to the index, except for the unique-row
requirement.  Thinking about it now, maybe I should guarantee unique
rows via a check constraint...

Thanks for making me think about this in a different way!

- Ryan

--
Ryan Bradetich <[EMAIL PROTECTED]>