Re: [HACKERS] There's random access and then there's random access
Gregory Stark wrote: > I could swear this has been discussed in the past too. I seem to recall Luke > disparaging Postgres on the same basis but proposing an immensely complicated > solution. posix_fadvise or using libaio in a simplistic fashion as a kind of > fadvise would be fairly lightweight way to get most of the benefit of the more > complex solutions. It has been on the TODO list for a long time: * Do async I/O for faster random read-ahead of data Async I/O allows multiple I/O requests to be sent to the disk with results coming back asynchronously. http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php I have added your thread URL to this. -- Bruce Momjian <[EMAIL PROTECTED]>http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
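To make the posix_fadvise() idea concrete, here is a minimal sketch of the kind of prefetch hint being discussed. This is illustration only, not PostgreSQL code; the file descriptor, the 8K block size, and the caller-supplied block list are assumptions for the example.

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#define EXAMPLE_BLCKSZ 8192     /* assumed block size for the example */

/* Tell the kernel which blocks we are about to read, so it can start
 * (and reorder) the physical I/O before the actual reads arrive. */
static void
prefetch_blocks(int fd, const off_t *blocknums, int nblocks)
{
    int     i;

    for (i = 0; i < nblocks; i++)
        (void) posix_fadvise(fd, blocknums[i] * EXAMPLE_BLCKSZ,
                             EXAMPLE_BLCKSZ, POSIX_FADV_WILLNEED);
}

/* The ordinary read that follows should then find the page in the OS cache. */
static ssize_t
read_block(int fd, off_t blockno, char *buf)
{
    return pread(fd, buf, EXAMPLE_BLCKSZ, blockno * EXAMPLE_BLCKSZ);
}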
Re: [HACKERS] [EMAIL PROTECTED]: Re: [pgsql-es-ayuda] SLL error 100% cpu]
"Trevor Talbot" <[EMAIL PROTECTED]> writes: > On 12/11/07, Tom Lane <[EMAIL PROTECTED]> wrote: >> I dunno anything about how to fix the real problem (what's winsock error >> 10004?), > WSAEINTR, "A blocking operation was interrupted by a call to > WSACancelBlockingCall." Oh, then it's exactly the same thing as our bug #2829. I opined in that thread that OpenSSL was broken because it failed to treat this as a retryable case like EINTR. But not being much of a Windows person, that might be mere hot air. Someone with a Windows build environment should try patching OpenSSL to treat WSAEINTR the same as Unix EINTR and see what happens ... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [EMAIL PROTECTED]: Re: [pgsql-es-ayuda] SLL error 100% cpu]
On 12/11/07, Tom Lane <[EMAIL PROTECTED]> wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > I dunno anything about how to fix the real problem (what's winsock error > 10004?), but I don't think he'd be seeing full speed log filling in > 8.2.5. WSAEINTR, "A blocking operation was interrupted by a call to WSACancelBlockingCall." Offhand I'd take it as either not entirely sane usage of a network API, or one of the so very many broken software firewalls / network security products. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
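If someone does try the experiment suggested above -- treating winsock's WSAEINTR the same as Unix EINTR -- the shape of the check is roughly as follows. This is a hedged sketch, not actual OpenSSL or PostgreSQL code; a caller would pass WSAGetLastError() on Windows and errno elsewhere.

#ifdef WIN32
#include <winsock2.h>
#else
#include <errno.h>
#endif

/* Should a failed socket call simply be retried?  On Unix the answer is
 * "yes" for EINTR; the suggestion is to treat WSAEINTR ("a blocking
 * operation was interrupted") exactly the same way. */
static int
sock_error_is_retryable(int err)
{
#ifdef WIN32
    return (err == WSAEINTR);
#else
    return (err == EINTR);
#endif
}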
Re: [HACKERS] VLDB Features
Greg, > I'm curious what you feel is missing that pgloader doesn't fill that > requirement: http://pgfoundry.org/projects/pgloader/ Because pgloader is implemented in middleware, it carries a very high overhead if you have bad rows. As little as 1% bad rows will slow down loading by 20% due to retries. -- Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Why my C function is called twice?
"Billow Gao" <[EMAIL PROTECTED]> writes: > A very simple C function which I copied from the manual. > And I found that it's called twice. You do realize that it's *supposed* to be called twice? Once to return the first row, and again to say it's done returning rows. But the info messages you show are strange anyway. I tried your example here and got the results I'd expect:

regression=# select * from retcomposite(1,48);
INFO:  1
INFO:  2
INFO:  3
INFO:  4
INFO:  2
INFO:  5
 f1 | f2 | f3
----+----+-----
 48 | 96 | 144
(1 row)

I think the code you showed must not quite match what you're actually executing. Maybe you recompiled the code and forgot to do a LOAD or start a fresh session to bring in the new .so file? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [EMAIL PROTECTED]: Re: [pgsql-es-ayuda] SLL error 100% cpu]
Alvaro Herrera <[EMAIL PROTECTED]> writes: > This guy is using 8.2.5. SSL seems to be able to fill his log files at > full speed. Are you *sure* the server is 8.2.5? 8.2.5 shouldn't emit duplicate messages, but 8.2.4 and before would: 2007-05-17 21:20 tgl * src/backend/libpq/: be-secure.c (REL7_4_STABLE), be-secure.c (REL8_1_STABLE), be-secure.c (REL8_0_STABLE), be-secure.c (REL8_2_STABLE), be-secure.c: Remove redundant logging of send failures when SSL is in use. While pqcomm.c had been taught not to do that ages ago, the SSL code was helpfully bleating anyway. Resolves some recent reports such as bug #3266; however the underlying cause of the related bug #2829 is still unclear. Furthermore, it looks to me like "SSL SYSCALL error: %m" doesn't exist anymore since that patch, so my bogometer is buzzing loudly. I dunno anything about how to fix the real problem (what's winsock error 10004?), but I don't think he'd be seeing full speed log filling in 8.2.5. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Problem of a server gettext message.
Hi. Yes, the part where the problem happens is the one you suggest. This is only a check: http://winpg.jp/~saito/pg83/message_check/gtext2.c With it, the needed messages can be obtained as follows: gtext2 C UTF-8 http://winpg.jp/~saito/pg83/message_check/codeset_utf8_msg.txt gtext2 C EUC_JP http://winpg.jp/~saito/pg83/message_check/codeset_eucjp_msg.txt However, the accuracy check is not finished yet. If all server encodings can be handled, I would like to work on that. Still, it is not desirable for several encodings to be intermingled in the log messages, and there is no good method for that yet; a better solution is still needed. Thanks! Regards, Hiroshi Saito - Original Message - From: "Zeugswetter Andreas ADI SD" <[EMAIL PROTECTED]> > GetText is conversion po(EUC_JP) to SJIS. Yes. Are you sure about that? Why would gettext be converting to SJIS, when SJIS is nowhere in the environment it can see? gettext is using GetACP() on Windows, wherever that gets its info from ... "chcp" did change the GetACP codepage in Hiroshi's example, but chcp does not reflect in LC_*. Seems we may want to use bind_textdomain_codeset. Andreas ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
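For reference, bind_textdomain_codeset() is a standard libintl call; a minimal sketch of how it could be used is below. The domain name and the codeset argument are assumptions for the illustration, not a worked-out fix.

#include <libintl.h>

/* Ask gettext to return messages from the "postgres" catalog in an
 * explicitly chosen encoding, instead of whatever the process codepage
 * (GetACP() on Windows) happens to imply. */
static void
set_message_codeset(const char *codeset)
{
    bind_textdomain_codeset("postgres", codeset);
}

For example, set_message_codeset("UTF-8") when the server encoding is UTF8.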
[HACKERS] [EMAIL PROTECTED]: Re: [pgsql-es-ayuda] SLL error 100% cpu]
Hi, Here's another problem report on Windows. This time it is usage of SSL connections and NOTIFY. I talked to Magnus on IRC and he directed me to bug #2829: http://archives.postgresql.org/pgsql-bugs/2006-12/msg00122.php This report seems to be a little different, if only because the reported error string from SSL mentions an "Unknown winsock error 10004". This guy is using 8.2.5. SSL seems to be able to fill his log files at full speed. Is this an issue we can do something about? - Forwarded message from Henry <[EMAIL PROTECTED]> - From: Henry <[EMAIL PROTECTED]> To: Alvaro Herrera <[EMAIL PROTECTED]> Cc: Postgres <[EMAIL PROTECTED]> Date: Wed, 12 Dec 2007 03:34:04 +0100 (CET) Subject: Re: [pgsql-es-ayuda] SLL error 100% cpu Message-ID: <[EMAIL PROTECTED]> --- Alvaro Herrera <[EMAIL PROTECTED]> wrote: > Henry wrote: > > Hello to everyone on the list. > > > > I have now put SSL with PostgreSQL into production, and performance keeps degrading the longer it runs; CPU usage reaches 100%, and when I take the service down some postgres.exe processes are left hanging. I disabled log writing because too many log files were being created containing the SYSCALL ERROR text... Oddly, a 14MB file was even created (strange, since it is configured for at most 10MB). - > Can you send an extract of that huge file? > A few lines of that SYSCALL ERROR. -- here it is: LOG: SSL SYSCALL error: Unknown winsock error 10004 regards - End forwarded message - -- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC It does it in a really, really complicated way why does it need to be complicated? Because it's MakeMaker. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] archive_command failures report confusing exit status
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Simon Riggs wrote: >> I think you should lose the "The", if nothing else. Most Postgres >> messages I recall say "return code = xx" not "The return code ..." > Right, that's because they are in errmessage() and not errdetail(). The > guidelines are different. Right --- errdetail is supposed to be a complete sentence. (The rationale for that is explained in TFM.) I do find the current phrasing a bit awkward, though; what's bugging me is putting quotes around the command string. It seems highly likely that the command string will itself contain quotes and thus that the added quotes will be confusing. I wonder if it'd be OK to do The failed archive command was: %s which is stretching the style guidelines by omitting a trailing period, but I think that might be justifiable in this context. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
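Concretely, the failure branch might then read something like the following; this is only a sketch based on the patch elsewhere in this thread (rc and xlogarchcmd as used there), not a finished change.

ereport(LOG,
        (errmsg("archive command failed with exit code %d",
                WEXITSTATUS(rc)),
         errdetail("The failed archive command was: %s",
                   xlogarchcmd)));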
Re: [HACKERS] archive_command failures report confusing exit status
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Yes, I had missed that difference. Next try ... Looks sane to me. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[HACKERS] 8.3beta4 space reduction
I'm still working on getting through our test cases and benchmarks, but one thing I can say for sure -- a freshly loaded database under 8.3beta4 is over 13% smaller than the exact same database freshly loaded into 8.2.5. It went from 216GB to 187GB. Of course, the down side of this is that it'll be that much longer before we can brag about having a TB database. -Kevin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] VLDB Features
On Tue, 11 Dec 2007, Josh Berkus wrote: Just so you don't lose sight of it, one of the biggest VLDB features we're missing is fault-tolerant bulk load. Unfortunately, I don't know anyone who's working on it. I'm curious what you feel is missing that pgloader doesn't fill that requirement: http://pgfoundry.org/projects/pgloader/ -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] whats the deal with -u ?
>>> On Sun, Dec 9, 2007 at 6:37 PM, in message <[EMAIL PROTECTED]>, Alvaro Herrera <[EMAIL PROTECTED]> wrote: > I have never understood what's the point of having an option to force a > password prompt. I wonder why don't we deprecate -W? I occasionally find it useful for situations where I have a .pgpass entry which would normally cover a database, but I have temporarily changed the password to prevent conflicting usage during maintenance or testing. (For example, while "borrowing" a machine which is normally part of the production load for a series of benchmarks under the beta release.) There would be other ways to deal with it if this were gone, but it is convenient. -Kevin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] VLDB Features
On Tue, 2007-12-11 at 15:31 -0800, Josh Berkus wrote: > Here's the other VLDB features we're missing: > > Parallel Query > Windowing Functions > Parallel Index Build (not sure how this works exactly, but it speeds Oracle > up considerably) > On-disk Bitmap Index (anyone game to finish GP patch?) I would call those VLDB Data Warehousing features to differentiate between that and the use of VLDBs for other purposes. I'd add Materialized View support in the planner, as well as saying it's more important than parallel query, IMHO. MVs are to DW what indexes are to OLTP. It's the same as indexes vs. seqscan; you can speed up the seq scan or you can avoid it. Brute force is cool, but being smarter is even better. The reason they don't normally show up high on anybody's feature list is that the TPC benchmarks specifically disallow them, which as I once observed is very good support for them being a useful feature in practice. (Oracle originally brought out MV support as a way of improving their TPC scores at a time when Teradata was wiping the floor with parallel query implementation). -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] VLDB Features
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote: > Just so you don't lose sight of it, one of the biggest VLDB features we're > missing is fault-tolerant bulk load. I actually had to cook up a version of this for Truviso recently. I'll take a look at submitting a cleaned-up implementation for 8.4. -Neil ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] VLDB Features
Hannu, > COPY ... WITH ERRORS TO ... Yeah, that's a start. > or something more advanced, like bulkload which can be continued after > crash ? Well, we could also use a loader which automatically parallelized, but that functionality can be done at the middleware level. WITH ERRORS is the most critical part. Here's the other VLDB features we're missing: Parallel Query Windowing Functions Parallel Index Build (not sure how this works exactly, but it speeds Oracle up considerably) On-disk Bitmap Index (anyone game to finish GP patch?) Simon, we should start a VLDB-Postgres developer wiki page. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] archive_command failures report confusing exit status
Simon Riggs wrote: > On Tue, 2007-12-11 at 23:31 +0100, Peter Eisentraut wrote: > > Simon Riggs wrote: > > > I prefer archive_command = ... for the errdetail though. The commands > > > can be quite long so the extra words don't really add anything, plus > > > they require translation. > > > > I did like this suggestion, but then I noticed, we don't actually report > > the > > setting of the archive_command setting but the actual command that was > > executed, with the placeholders filled out. So I think the way I posted it > > is more correct. > > I think you should lose the "The", if nothing else. Most Postgres > messages I recall say "return code = xx" not "The return code ..." Right, that's because they are in errmessage() and not errdetail(). The guidelines are different. (The reason they are different really escapes me, but you have to note that (1) Peter invented them, and (2) this grammatical difference always make sense when you read the logs.) -- Alvaro Herrera http://www.PlanetPostgreSQL.org/ "There is evil in the world. There are dark, awful things. Occasionally, we get a glimpse of them. But there are dark corners; horrors almost impossible to imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] archive_command failures report confusing exit status
On Tue, 2007-12-11 at 23:31 +0100, Peter Eisentraut wrote: > Simon Riggs wrote: > > I prefer archive_command = ... for the errdetail though. The commands > > can be quite long so the extra words don't really add anything, plus > > they require translation. > > I did like this suggestion, but then I noticed, we don't actually report the > setting of the archive_command setting but the actual command that was > executed, with the placeholders filled out. So I think the way I posted it > is more correct. I think you should lose the "The", if nothing else. Most Postgres messages I recall say "return code = xx" not "The return code ..." -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, 2007-12-11 at 14:25 -0500, Andrew Sullivan wrote: > On Tue, Dec 11, 2007 at 11:12:46AM +, Simon Riggs wrote: > > > > Read-Only Tables > > > > In the past when this topic came up, there was some discussion of doing this > at a level somewhere below the table horizon. There are a number of nasty > limitations for partitions currently (not the least of which is that real > uniqueness guarantees are impractical), so allowing users to specify some > segment of the table to be "read only" without imposing it on the whole > table would be awful nice. I seem to recall Jan had an idea on how to do > it, but I could be wrong. > > Also, doing this at the tuple, rather than table-wide, level might lead to > additional capabilities in this area: Seems fair comment. I'll move forward my thoughts on partitioning, so we can decide whether to do things this way, or below the table horizon as you say. I've got some detailed notes on this already which showed it was roughly balanced between the two ways. I'll write it up so we can see; the answer might change during the writing. > > Attach > > -- > > > > Writing tables on one system and then moving that data to other systems > > is fairly common. If we supported read-only tables then you might > > consider how you would publish new versions to people. > > Some time ago I was speculating on pie-in-the-sky features I might like in > Postgres, and it was something like this attach. But the idea was somehow > related to the read-only tuples. > > In my specific case, I have piles and piles of mostly useless data. > Sometimes, however, some of that data is possibly useful in retrospect. So > the suggestion was to have tables that could be mostly offline -- archived > somewhere -- but for which we had enough metadata online to say, "You have > some data that might match in catalog C. Go mount it, and I'll check." I > think this is subtly different from the attach case you're outlining? Yes it is, but I had hoped that what you're asking for is catered for here. If you have a hierarchical storage manager, you can just access the file, whereupon the actual file will be de-archived to allow access. Or maybe you have it on MAID storage, so we spin up the disks to allow access to the files. So I guess I was expecting the de-archive to be automated at the file system level. http://en.wikipedia.org/wiki/Hierarchical_Storage_Management Regrettably, I'm not aware of an open source HSM, though XFS has some hooks in it that mention this. That's an old IBM term, so some people might refer to this concept as tiered storage. Since I was planning to modify smgr to allow different kinds of tablespaces it should be possible to make the file issue some kind of a call out to mount the appropriate files. What would that call out look like? Would that be a DMAPI/XDSM implementation, or something simpler as was used for PITR, or a roll-your-own plug-in hook? I can see I'll have to re-wire smgr_nblocks() for non-md smgrs to access pg_class.relpages rather than issue a seek, which will need to issue an open. That way we can do planning without actually accessing the table. (Maybe that's presuming we're dealing with tables, oh well). -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] archive_command failures report confusing exit status
Tom Lane wrote: > No, you are confusing the cases "called shell was killed by a signal" > and "called command was killed by a signal, which the shell then turned > around and reported to us as exit > 128". Yes, I had missed that difference. Next try ... -- Peter Eisentraut http://developer.postgresql.org/~petere/ diff -cr ../cvs-pgsql/src/backend/postmaster/pgarch.c ./src/backend/postmaster/pgarch.c *** ../cvs-pgsql/src/backend/postmaster/pgarch.c 2007-11-25 12:39:56.0 +0100 --- ./src/backend/postmaster/pgarch.c 2007-12-11 23:28:43.0 +0100 *** *** 484,494 * Per the Single Unix Spec, shells report exit status > 128 when a * called command died on a signal. */ ! bool signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128; ! ereport(signaled ? FATAL : LOG, ! (errmsg("archive command \"%s\" failed: return code %d", ! xlogarchcmd, rc))); return false; } --- 484,518 * Per the Single Unix Spec, shells report exit status > 128 when a * called command died on a signal. */ ! int lev = (WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128) ? FATAL : LOG; ! if (WIFEXITED(rc)) ! { ! ereport(lev, ! (errmsg("archive command failed with exit code %d", WEXITSTATUS(rc)), ! errdetail("The archive command was \"%s\".", xlogarchcmd))); ! } ! else if (WIFSIGNALED(rc)) ! { ! ereport(lev, ( ! #if defined(WIN32) ! errmsg("archive command was terminated by exception 0x%X", WTERMSIG(rc)), ! errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."), ! #elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST ! errmsg("archive command was terminated by signal %d: %s", ! WTERMSIG(rc), ! WTERMSIG(rc) < NSIG ? sys_siglist[WTERMSIG(rc)] : "(unknown)"), ! #else ! errmsg("archive command was terminated by signal %d", WTERMSIG(exitstatus)), ! #endif ! errdetail("The archive command was \"%s\".", xlogarchcmd))); ! } ! else ! { ! ereport(lev, ! (errmsg("archive command exited with unrecognized status %d", rc), ! errdetail("The archive command was \"%s\".", xlogarchcmd))); ! } return false; } ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
printQuery API change proposal (was Re: [HACKERS] psql \dFp's behavior)
I wrote: > describe.c's whole approach to this has always been pretty thoroughly > broken in my mind, because it makes untenable assumptions about the > client-side gettext() producing strings that are in the current > client_encoding. If they are not, the server will probably reject > the SQL query as failing encoding verification. > We should be fixing it so that the translated strings never go to the > server and back at all. This doesn't seem amazingly hard for column > headings --- it'd take some API additions in print.c, I think. > If we are actually embedding translated words in the data > then it'd be a bigger problem. I looked at the code a bit closer, and my vague memory was correct: describe.c mostly uses translated strings for column headers, eg printfPQExpBuffer(&buf, "SELECT spcname AS \"%s\",\n" " pg_catalog.pg_get_userbyid(spcowner) AS \"%s\",\n" " spclocation AS \"%s\"", _("Name"), _("Owner"), _("Location")); but there are also a few places where it wants a column to contain translated values, for example " CAST(\n" "CASE c.relkind WHEN 'r' THEN '%s' WHEN 'v' THEN '%s' WHEN 'i' THEN '%s' WHEN 'S' THEN '%s' END" " AS pg_catalog.text) as object\n" ... _("table"), _("view"), _("index"), _("sequence") It would be reasonably straightforward to get rid of sending the column headers to the server, since the underlying printTable function already accepts column headers as a separate array argument; we could ignore the column headers coming back from the server and just inject correctly translated strings instead. However the data values are a bit harder. What I'm tempted to do is add a couple of optional fields to struct printQueryOpt that specify translatable strings in column headers and column contents, respectively: booltranslate_headers; bool *translate_columns; /* translate_columns[i-1] applies to column i */ If these are set then printQuery would run through the headers and/or contents of specific columns and apply gettext() on the indicated strings, after it had finished disassembling the PGresult into arrays. (Since we don't want to be doing gettext() on random strings, we need to indicate exactly which columns should be processed.) To ensure that the strings are available for translation, all the _("x") instances in describe.c would change to gettext_noop("x"), but otherwise that code would only need to change to the extent of setting the new option fields in printQueryOpt. This means the server sees only untranslated plain-ASCII strings and shouldn't get upset about encoding issues. We'd still have a problem if we wanted to put a single-quote mark in an untranslated string (or a double-quote, in the case of a column header), but so far there's been no need for that. If it did come up, we could handle it in the style Guillaume suggested, that is appendStringLiteral(gettext_noop("foo's a problem")). So I think it's not necessary to contort the general solution to make that case easier. printQueryOpt isn't exported anywhere but bin/psql and bin/scripts, so changing it doesn't create an ABI break. Objections, better ideas? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
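A hedged sketch of what those additions might look like follows. The two field names are the ones proposed above; the surrounding variable names (headers, cells, ncolumns, nrows) are invented for the illustration and are not the real print.c code.

/* proposed new fields in struct printQueryOpt */
bool        translate_headers;      /* run column headers through gettext()? */
bool       *translate_columns;      /* translate_columns[i-1] applies to column i */

/* inside printQuery(), once the PGresult has been split into arrays: */
if (opt->translate_headers)
{
    int         i;

    for (i = 0; i < ncolumns; i++)
        headers[i] = gettext(headers[i]);
}

if (opt->translate_columns)
{
    int         i,
                r;

    for (r = 0; r < nrows; r++)
        for (i = 0; i < ncolumns; i++)
            if (opt->translate_columns[i])
                cells[r * ncolumns + i] = gettext(cells[r * ncolumns + i]);
}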
Re: [HACKERS] archive_command failures report confusing exit status
Simon Riggs wrote: > I prefer archive_command = ... for the errdetail though. The commands > can be quite long so the extra words don't really add anything, plus > they require translation. I did like this suggestion, but then I noticed, we don't actually report the setting of the archive_command setting but the actual command that was executed, with the placeholders filled out. So I think the way I posted it is more correct. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] psql \dFp's behavior
Guillaume Lelarge <[EMAIL PROTECTED]> writes: > Tom Lane a écrit : >> We should be fixing it so that the translated strings never go to the >> server and back at all. This doesn't seem amazingly hard for column >> headings --- it'd take some API additions in print.c, I think. > I'll take a look at this. I'm already looking ... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] psql \dFp's behavior
Tom Lane a écrit : > Guillaume Lelarge <[EMAIL PROTECTED]> writes: >> Tom Lane a écrit : >>> This seems mighty ugly, and it's not the way we handle any other \d >>> command. Why is it needed for \dFp (and only that)? > >> The problem here is that "Start parse" is translated with "Début de >> l'analyse" (which is a bad translation but that's not the point). > > Well, that particular issue could be fixed if the translated string > doubled the quote mark. Which I agree is a pretty fragile solution. > Which is why I choose to look at the code and write a patch. > describe.c's whole approach to this has always been pretty thoroughly > broken in my mind, because it makes untenable assumptions about the > client-side gettext() producing strings that are in the current > client_encoding. If they are not, the server will probably reject > the SQL query as failing encoding verification. > Oh, that's true. I didn't think about that but I understand your concerns. > We should be fixing it so that the translated strings never go to the > server and back at all. This doesn't seem amazingly hard for column > headings --- it'd take some API additions in print.c, I think. > If we are actually embedding translated words in the data > then it'd be a bigger problem. > I'll take a look at this. Thanks for your answer. Regards. -- Guillaume. http://www.postgresqlfr.org http://dalibo.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] VLDB Features
On Tue, 2007-12-11 at 10:53 -0800, Josh Berkus wrote: > Simon. > > > VLDB Features I'm expecting to work on are > > - Read Only Tables/WORM tables > > - Advanced Partitioning > > - Compression > > plus related performance features > > Just so you don't lose sight of it, one of the biggest VLDB features we're > missing is fault-tolerant bulk load. Unfortunately, I don't know anyone > who's working on it. Not lost sight of it; I have a design, but I have to prioritise also. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [GENERAL] Dumb question about binary cursors and #ifdef HAVE_INT64_TIMESTAMP
> -Original Message- > From: Alvaro Herrera [mailto:[EMAIL PROTECTED] > Sent: Tuesday, December 11, 2007 1:11 PM > To: Dann Corbit > Cc: pgsql-hackers@postgresql.org; [EMAIL PROTECTED] > Subject: Re: [GENERAL] Dumb question about binary cursors and > #ifdefHAVE_INT64_TIMESTAMP > > Dann Corbit wrote: > > If I create a binary cursor on a recent version of PostgreSQL, how can I > > tell if the timestamp data internally is an 8 byte double or an 8 byte > > integer? > > > > I see an #ifdef that changes the code path to compute timestamps as one > > type or the other, but I do not know how to recognize the internal > > format of the type that will be returned in a binary cursor. > > > > How can I do that? > > SHOW integer_timestamp; > > (actually, IIRC, this is one of the params that the server will send you > at session start). Tom's post clued me in. It's: show integer_datetimes; Or (in my case): PQparameterStatus(conn, "integer_datetimes") ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
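A small sketch of how a libpq client might use that parameter to decide how to decode 8-byte binary timestamps; the function name is just for the example.

#include <string.h>
#include "libpq-fe.h"

/* Returns 1 if the server sends binary timestamps as 64-bit integer
 * microseconds, 0 if it sends them as 8-byte doubles. */
static int
server_has_integer_datetimes(PGconn *conn)
{
    const char *val = PQparameterStatus(conn, "integer_datetimes");

    return (val != NULL && strcmp(val, "on") == 0);
}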
Re: [HACKERS] [GENERAL] Dumb question about binary cursors and #ifdef HAVE_INT64_TIMESTAMP
"Dann Corbit" <[EMAIL PROTECTED]> writes: > If I create a binary cursor on a recent version of PostgreSQL, how can I > tell if the timestamp data internally is an 8 byte double or an 8 byte > integer? PQparameterStatus(conn, "integer_datetimes") regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] Re: [GENERAL] Dumb question about binary cursors and #ifdef HAVE_INT64_TIMESTAMP
Dann Corbit wrote: > If I create a binary cursor on a recent version of PostgreSQL, how can I > tell if the timestamp data internally is an 8 byte double or an 8 byte > integer? > > I see an #ifdef that changes the code path to compute timestamps as one > type or the other, but I do not know how to recognize the internal > format of the type that will be returned in a binary cursor. > > How can I do that? SHOW integer_timestamp; (actually, IIRC, this is one of the params that the server will send you at session start). -- Alvaro Herrera http://www.flickr.com/photos/alvherre/ "Y eso te lo doy firmado con mis lágrimas" (Fiebre del Loco) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [GENERAL] Dumb question about binary cursors and #ifdef HAVE_INT64_TIMESTAMP
> -Original Message- > From: Alvaro Herrera [mailto:[EMAIL PROTECTED] > Sent: Tuesday, December 11, 2007 1:11 PM > To: Dann Corbit > Cc: pgsql-hackers@postgresql.org; [EMAIL PROTECTED] > Subject: Re: [GENERAL] Dumb question about binary cursors and > #ifdefHAVE_INT64_TIMESTAMP > > Dann Corbit wrote: > > If I create a binary cursor on a recent version of PostgreSQL, how can I > > tell if the timestamp data internally is an 8 byte double or an 8 byte > > integer? > > > > I see an #ifdef that changes the code path to compute timestamps as one > > type or the other, but I do not know how to recognize the internal > > format of the type that will be returned in a binary cursor. > > > > How can I do that? > > SHOW integer_timestamp; > > (actually, IIRC, this is one of the params that the server will send you > at session start). I guess that I am supposed to check for error on the statement? What does it look like when the query works? This is what I get against PostgreSQL 8.2.5 using PG Admin III query tool: ERROR: unrecognized configuration parameter "integer_timestamp" ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] VLDB Features
Ühel kenal päeval, T, 2007-12-11 kell 10:53, kirjutas Josh Berkus: > Simon. > > > VLDB Features I'm expecting to work on are > > - Read Only Tables/WORM tables > > - Advanced Partitioning > > - Compression > > plus related performance features > > Just so you don't lose sight of it, one of the biggest VLDB features we're > missing is fault-tolerant bulk load. What do you mean by fault-tolerant here ? Just COPY ... WITH ERRORS TO ... or something more advanced, like bulkload which can be continued after crash ? -- Hannu ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Dumb question about binary cursors and #ifdef HAVE_INT64_TIMESTAMP
If I create a binary cursor on a recent version of PostgreSQL, how can I tell if the timestamp data internally is an 8 byte double or an 8 byte integer? I see an #ifdef that changes the code path to compute timestamps as one type or the other, but I do not know how to recognize the internal format of the type that will be returned in a binary cursor. How can I do that?
Re: [HACKERS] psql \dFp's behavior
Guillaume Lelarge <[EMAIL PROTECTED]> writes: > Tom Lane a écrit : >> This seems mighty ugly, and it's not the way we handle any other \d >> command. Why is it needed for \dFp (and only that)? > The problem here is that "Start parse" is translated with "Début de > l'analyse" (which is a bad translation but that's not the point). Well, that particular issue could be fixed if the translated string doubled the quote mark. Which I agree is a pretty fragile solution. describe.c's whole approach to this has always been pretty thoroughly broken in my mind, because it makes untenable assumptions about the client-side gettext() producing strings that are in the current client_encoding. If they are not, the server will probably reject the SQL query as failing encoding verification. We should be fixing it so that the translated strings never go to the server and back at all. This doesn't seem amazingly hard for column headings --- it'd take some API additions in print.c, I think. If we are actually embedding translated words in the data then it'd be a bigger problem. > I found \dFp but we could have the same problems with \dp IIRC, *all* the \d commands do this. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] psql \dFp's behavior
Tom Lane a écrit : > Guillaume Lelarge <[EMAIL PROTECTED]> writes: >> I'm not sure psql handles \dFp the right way. The query allows >> translators to translate some columns' values but forgets to escape the >> strings. So, here is a patch that escapes these translated strings. > > This seems mighty ugly, and it's not the way we handle any other \d > command. Why is it needed for \dFp (and only that)? > Oh I didn't say only \dFp needs this kind of fix. I've found an issue with \dFp+ today, so I'm trying to fix it. Here is the issue I found : [EMAIL PROTECTED]:/opt/postgresql-head$ LANG=en psql postgres Welcome to psql 8.3beta4, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit Text search parser "pg_catalog.default" Method |Function| Description -++- Start parse | prsd_start | Get next token | prsd_nexttoken | End parse | prsd_end | Get headline| prsd_headline | Get token types | prsd_lextype | Token types for parser "pg_catalog.default" [...] OK, it works great. Now, in french : [EMAIL PROTECTED]:/opt/postgresql-head$ LANG=fr_FR.UTF-8 psql postgres Bienvenue dans psql 8.3beta4, l'interface interactive de PostgreSQL. Saisissez: \copyright pour les termes de distribution \h pour l'aide-mémoire des commandes SQL \? pour l'aide-mémoire des commandes psql \g ou point-virgule en fin d'instruction pour exécuter la requête \q pour quitter postgres=# set lc_messages to 'C'; SET postgres=# \dFp+ ERROR: syntax error at or near "analyse" LIGNE 1 : SELECT 'Début de l'analyse' AS "Méthode", ^ The problem here is that "Start parse" is translated with "Début de l'analyse" (which is a bad translation but that's not the point). The point is that the query is not protected against quotes and backslashes and this is bad. My code is probably ugly, I trust you on this, but I think we need to fix this. I think we have two ways to do it : - escaping the translated words ; - removing the call to gettext (so no translations for these strings). I found \dFp but we could have the same problems with \dp because it puts directly in the query the translations of some words (table, view, sequence, aggregate, function, operator, datatype, rule, trigger) which is not a problem in French but can be one in another language. \du, \dg, \d seem to have problems too. Regards. -- Guillaume. http://www.postgresqlfr.org http://dalibo.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] psql \dFp's behavior
Guillaume Lelarge <[EMAIL PROTECTED]> writes: > I'm not sure psql handles \dFp the right way. The query allows > translators to translate some columns' values but forgets to escape the > strings. So, here is a patch that escapes these translated strings. This seems mighty ugly, and it's not the way we handle any other \d command. Why is it needed for \dFp (and only that)? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] archive_command failures report confusing exit status
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Am Dienstag, 11. Dezember 2007 schrieb Tom Lane: >> Doesn't this patch break the behavior that is documented in the comment? >> Specifically, the case where the restore_command dies on a signal and >> this is reported to us by the controlling shell as exitcode > 128. >> We want the archiver to die, but this patch makes it not do so. > AFAICT, the coding > WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128 > is simply redundant, because a signal happened exactly when WIFSIGNALED(rc) > is > true. No, you are confusing the cases "called shell was killed by a signal" and "called command was killed by a signal, which the shell then turned around and reported to us as exit > 128". > I have tested this: A single test case proves little. You need to consider race conditions and cases where the shell is ignoring the particular signal. I'm fairly certain that both of these exit statuses can happen in practice, at least with some shells/platforms. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
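To spell out the two cases, here is a small illustrative sketch of how the two kinds of signal report differ when decoding system()'s return value; it is not the pgarch.c code itself.

#include <stdlib.h>
#include <sys/wait.h>

static void
classify_archive_status(int rc)
{
    if (WIFSIGNALED(rc))
    {
        /* The shell itself was killed by a signal. */
    }
    else if (WIFEXITED(rc) && WEXITSTATUS(rc) > 128)
    {
        /*
         * The shell survived, but the command it ran was killed by a
         * signal; per the Single Unix Spec the shell reports that as
         * exit status 128 + the signal number.
         */
    }
    else if (WIFEXITED(rc) && WEXITSTATUS(rc) != 0)
    {
        /* The called command failed with an ordinary nonzero exit code. */
    }
}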
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, Dec 11, 2007 at 11:12:46AM +, Simon Riggs wrote: > > Read-Only Tables > In the past when this topic came up, there was some discussion of doing this at a level somewhere below the table horizon. There are a number of nasty limitations for partitions currently (not the least of which is that real uniqueness guarantees are impractical), so allowing users to specify some segment of the table to be "read only" without imposing it on the whole table would be awful nice. I seem to recall Jan had an idea on how to do it, but I could be wrong. Also, doing this at the tuple, rather than table-wide, level might lead to additional capabilities in this area: > Attach > -- > > Writing tables on one system and then moving that data to other systems > is fairly common. If we supported read-only tables then you might > consider how you would publish new versions to people. Some time ago I was speculating on pie-in-the-sky features I might like in Postgres, and it was something like this attach. But the idea was somehow related to the read-only tuples. In my specific case, I have piles and piles of mostly useless data. Sometimes, however, some of that data is possibly useful in retrospect. So the suggestion was to have tables that could be mostly offline -- archived somewhere -- but for which we had enough metadata online to say, "You have some data that might match in catalog C. Go mount it, and I'll check." I think this is subtly different from the attach case you're outlining? A ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] PGparam proposal
> your proposal completely glossed over the > issue of exactly what data structure would be exposed to clients for > anything more complex than an integer Yeah. Forgot to include the below in the proposal and the last email. Here is the lastest list for complex types. Most have been around since the last 0.5 patch. Still need numeric, date, time and a couple others. typedef struct { double x; double y; } PGpoint; typedef struct { PGpoint pts[2]; } PGlseg; typedef struct { PGpoint high; PGpoint low; } PGbox; typedef struct { PGpoint center; double radius; } PGcircle; /* When used with PQgetf, 'pts' must be freed with PQfreemem(). */ typedef struct { int npts; int closed; PGpoint *pts; } PGpath; /* When used with PQgetf, 'pts' must be freed with PQfreemem(). */ typedef struct { int npts; PGpoint *pts; } PGpolygon; /* This struct works with CIDR as well. */ typedef struct { /* here for convenience, value the same as first 2 bytes of sa_buf */ unsigned short sa_family; /* mask, ie. /24 */ int mask; int is_cidr; /* Cast to: sockaddr, sockaddr_in, sockaddr_in6, sockaddr_stroage */ int sa_len; char sa_buf[128]; } PGinet; typedef struct { int a; int b; int c; int d; int e; int f; } PGmacaddr; /* main problem with this type is that it can be a double * or int64 and that is a compile-time decision. This means * the client has a 50% chance of getting it wrong. It would * be nice if the server included an indicater or converted * the external format to one or the other. OR, make client * aware of server's timestamp encoding when it connects. */ typedef struct { /* When non-zero, this is a TIMESTAMP WITH TIME ZONE. When zero, * this is a TIMESTAMP WITHOUT TIME ZONE. When WITHOUT, gmtoff * will always be 0 and tzn will be "GMT". */ int withtz; /* binary timestamp from server (in host order). If haveint64 is * non-zero, use the 'ts.asint64' value otherwise use 'ts.asdouble'. */ int haveint64; union { struct { unsigned int a; signed int b; } asint64; double asdouble; } ts; /* microseconds */ int usec; /* GMT offset, some systems don't have this in struct tm */ int gmtoff; /* time zone name, some systems don't have this in struct tm */ char *tzn; /* broken-down time */ struct tm tm; } PGtimestamp; /* still working on this, may need access macros */ typedef struct { int dim; int lbound; } PGarraydim; typedef struct { int len; /* already in PGtype format. For instance: * (PGpolygon *)arr->items[i].data * or * printf("text=%s\n", arr->items[i].data); * * Could have a union of all types here but that * doesn't help much for 3rd party types. */ char *data; } PGarrayitem; typedef struct { Oid oid; /* type of items[].data */ int ndim; PGarraydim dims[MAXDIM]; int nitems; PGarrayitem *items; } PGarray; andrew & merlin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] VLDB Features
Simon. > VLDB Features I'm expecting to work on are > - Read Only Tables/WORM tables > - Advanced Partitioning > - Compression > plus related performance features Just so you don't lose sight of it, one of the biggest VLDB features we're missing is fault-tolerant bulk load. Unfortunately, I don't know anyone who's working on it. -- Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] psql \dFp's behavior
Hi all, I'm not sure psql handles \dFp the right way. The query allows translators to translate some columns' values but forgets to escape the strings. So, here is a patch that escapes these translated strings. Regards. -- Guillaume. http://www.postgresqlfr.org http://dalibo.com Index: src/bin/psql/describe.c === RCS file: /projects/cvsroot/pgsql/src/bin/psql/describe.c,v retrieving revision 1.162 diff -c -r1.162 describe.c *** src/bin/psql/describe.c 15 Nov 2007 21:14:42 - 1.162 --- src/bin/psql/describe.c 11 Dec 2007 18:04:09 - *** *** 2069,2112 printQueryOpt myopt = pset.popt; initPQExpBuffer(&buf); ! ! printfPQExpBuffer(&buf, ! "SELECT '%s' AS \"%s\", \n" " p.prsstart::pg_catalog.regproc AS \"%s\", \n" ! " pg_catalog.obj_description(p.prsstart, 'pg_proc') as \"%s\" \n" " FROM pg_catalog.pg_ts_parser p \n" " WHERE p.oid = '%s' \n" "UNION ALL \n" ! "SELECT '%s', \n" " p.prstoken::pg_catalog.regproc, \n" " pg_catalog.obj_description(p.prstoken, 'pg_proc') \n" " FROM pg_catalog.pg_ts_parser p \n" " WHERE p.oid = '%s' \n" "UNION ALL \n" ! "SELECT '%s', \n" " p.prsend::pg_catalog.regproc, \n" " pg_catalog.obj_description(p.prsend, 'pg_proc') \n" " FROM pg_catalog.pg_ts_parser p \n" " WHERE p.oid = '%s' \n" "UNION ALL \n" ! "SELECT '%s', \n" " p.prsheadline::pg_catalog.regproc, \n" " pg_catalog.obj_description(p.prsheadline, 'pg_proc') \n" " FROM pg_catalog.pg_ts_parser p \n" " WHERE p.oid = '%s' \n" "UNION ALL \n" ! "SELECT '%s', \n" " p.prslextype::pg_catalog.regproc, \n" " pg_catalog.obj_description(p.prslextype, 'pg_proc') \n" " FROM pg_catalog.pg_ts_parser p \n" ! " WHERE p.oid = '%s' \n", ! _("Start parse"), ! _("Method"), _("Function"), _("Description"), ! oid, ! _("Get next token"), oid, ! _("End parse"), oid, ! _("Get headline"), oid, ! _("Get token types"), oid ); res = PSQLexec(buf.data, false); --- 2069,2114 printQueryOpt myopt = pset.popt; initPQExpBuffer(&buf); ! appendPQExpBuffer(&buf, "SELECT "); ! appendStringLiteralConn(&buf, _("Start parse"), pset.db); ! appendPQExpBuffer(&buf, " AS \"%s\", \n" " p.prsstart::pg_catalog.regproc AS \"%s\", \n" ! " pg_catalog.obj_description(p.prsstart, 'pg_proc') AS \"%s\" \n" " FROM pg_catalog.pg_ts_parser p \n" " WHERE p.oid = '%s' \n" "UNION ALL \n" ! "SELECT ", ! _("Method"), _("Function"), _("Description"), oid); ! appendStringLiteralConn(&buf, _("Get next token"), pset.db); ! appendPQExpBuffer(&buf, ", \n" " p.prstoken::pg_catalog.regproc, \n" " pg_catalog.obj_description(p.prstoken, 'pg_proc') \n" " FROM pg_catalog.pg_ts_parser p \n" " WHERE p.oid = '%s' \n" "UNION ALL \n" ! "SELECT ", oid); ! appendStringLiteralConn(&buf, _("End parse"), pset.db); ! appendPQExpBuffer(&buf, ", \n" " p.prsend::pg_catalog.regproc, \n" " pg_catalog.obj_description(p.prsend, 'pg_proc') \n" " FROM pg_catalog.pg_ts_parser p \n" " WHERE p.oid = '%s' \n" "UNION ALL \n" ! "SELECT ", oid); ! appendStringLiteralConn(&buf, _("Get headline"), pset.db); ! appendPQExpBuffer(&buf, ", \n" " p.prsheadline::pg_catalog.regproc, \n" " pg_catalog.obj_description(p.prsheadline, 'pg_proc') \n" " FROM pg_catalog.pg_ts_parser p \n" " WHERE p.oid = '%s' \n" "UNION ALL \n" ! "SELECT ", oid); ! appendStringLiteralConn(&buf, _("Get token types"), pset.db); ! appendPQExpBuffer(&buf, ", \n" " p.prslextype::pg_catalog.regproc, \n" " pg_catalog.obj_description(p.prslextype, 'pg_proc') \n" " FROM pg_catalog.pg_ts_parser p \n" ! " WHERE p.oid = '%s' \n", oid ); res = PSQLexec(buf.data, false); ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] PGparam proposal
>library users would have no idea whether > the values they provide are being sent as text or binary. The putf interface currently abstracts how it actually sends it. Although, you do put a C type rather than a string. There is a putstr "%pqtypstr", which puts the string representation of a type. > providing an easier-to-use > API for PQexecParams and friends, and what parts are actually > interested in binary data transmission (and why) There are two things happening here and two things we are trying to solve: 1. putf and PGparam - simpler interface for executing queries using parameterized functions. 2. getf - doesn't use PGparam at all. You pass it a PGresult. The goal of this is to translate either text or binary results into a uniform C type or structure. For instance, we expose the below structure for inet. /* This struct works with CIDR as well. */ typedef struct { /* convenience, value the same as first 2 bytes of sa_buf */ unsigned short sa_family; /* mask, ie. /24 */ int mask; int is_cidr; /* Cast to: sockaddr, sockaddr_in, sockaddr_in6, sockaddr_stroage */ int sa_len; char sa_buf[128]; /* avoid referencing sockaddr structs */ } PGinet; // "res" could be text or binary results, but PGinet remains the same. PGinet inet; PGgetf(res, tup_num, "%pginet", field_num &inet); connect(sock, (const struct sockaddr *)inet.sa_buf, (socklen_t)inet.sa_len); The above is simpler than examining "10.0.0.1/32" and converting it to a struct sockaddr. The same struct is used when putting a type as a "C type" rather than as a string. You can use getf without ever using putf+PGparam, and vise-versa. > your proposal completely glossed over the > issue of exactly what data structure would be exposed to clients for > anything more complex than an integer Complex types require a receiving structure on the client side, thus the types we added to libpq-fe.h: PGinet, PGpolygon, PGarray, PGtimestamp, etc... But keep in mind that these structs are the result of libpq extracting the data from text/binary formats and assigning data to struct members. It does not expose raw format, the API user can already get that via PQgetvalue(). > If we could have libpq insulate client apps from these kinds > of changes, that would be one thing; This is the goal of getf. The API user interfaces through PGinet, not through the output of PQgetvalue(). We propose that its libpq's job internally to handle changes to text or binary formats and expose consistent types/structures. > type NUMERIC (say, as a bignum integer plus > exponent instead of the current BCD-ish format) This is what we want to hide inside libpq's getf. Just expose a PGnumeric that has been translated into c types. We never expose the wire format, only the C translated version of it. > without any reliance on binary data transmission whatsoever. Yes this is possible, but at a performance & ease-of-use cost. Our performance tests didn't show much gain with strings or ints, but complex types were 10 times faster (putting/getting arrays, polygons, paths, etc...). So, the trade-off is a little more maintainence overhead on libpq. BTW, previously we mentioned a 3rd party handler api concept. This is not needed to get libpq up and going with built-in types (which is all we feel it should be responsible for). the handler API can always be added later w/o changing existing functions ... have to add a couple though. andrew & merlin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] archive_command failures report confusing exit status
Am Dienstag, 11. Dezember 2007 schrieb Tom Lane: > Peter Eisentraut <[EMAIL PROTECTED]> writes: > > Better patch. > > Doesn't this patch break the behavior that is documented in the comment? > Specifically, the case where the restore_command dies on a signal and > this is reported to us by the controlling shell as exitcode > 128. > We want the archiver to die, but this patch makes it not do so. AFAICT, the coding WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128 is simply redundant, because a signal happened exactly when WIFSIGNALED(rc) is true. I have tested this: LOG: database system was shut down at 2007-12-11 17:15:43 CET LOG: autovacuum launcher started LOG: database system is ready to accept connections FATAL: archive command was terminated by signal 1: Hangup DETAIL: The archive command was "kill -1 $$". LOG: archiver process (PID 22572) exited with exit code 1 -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] archive_command failures report confusing exit status
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Better patch. Doesn't this patch break the behavior that is documented in the comment? Specifically, the case where the restore_command dies on a signal and this is reported to us by the controlling shell as exitcode > 128. We want the archiver to die, but this patch makes it not do so. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] archive_command failures report confusing exit status
On Tue, 2007-12-11 at 17:27 +0100, Peter Eisentraut wrote: > Am Montag, 10. Dezember 2007 schrieb Alvaro Herrera: > > Peter Eisentraut wrote: > > > I figured it would make sense if pgarch.c used the same mechanism that > > > postmaster.c uses to report the various variants of regular and signal > > > exits. > > > > Hmm. Getting rid of the "(PID 0)" is going to be a mess enough for > > translations that I think it is worth pgarch.c having its own routine > > for this. Furthermore I think the detailed archive command should be > > reported in an errdetail() field, which makes it even farther off. > > Better patch. Looks much easier to understand. I prefer archive_command = ... for the errdetail though. The commands can be quite long so the extra words don't really add anything, plus they require translation. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Why my C function is called twice?
A very simple C function which I copied from the manual. And I found that it's called twice. Even in the function: if (SRF_IS_FIRSTCALL()) { ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("1"))); } An example output. You will find two INFO: 1 there.. Why a function is called twice and how to prevent this problem? Thanks Billow test=# select * from retcomposite(1,48); INFO: 1 INFO: 2 INFO: 3 INFO: 1 INFO: 4 INFO: 2 INFO: 2 INFO: 5 INFO: 3 INFO: 4 INFO: 2 INFO: 5 f1 | f2 | f3 ++- 48 | 96 | 144 (1 row) === Code.. === /* CREATE OR REPLACE FUNCTION retcomposite(integer, integer) RETURNS SETOF __retcomposite AS 'test.so', 'retcomposite' LANGUAGE C IMMUTABLE STRICT; */ // PostgreSQL includes #include "postgres.h" #include "fmgr.h" // Tuple building functions and macros #include "access/heapam.h" #include "funcapi.h" #include "utils/builtins.h" #include #include #include #include #define _textout(str) DatumGetPointer(DirectFunctionCall1(textout, PointerGetDatum(str))) #ifdef PG_MODULE_MAGIC PG_MODULE_MAGIC; #endif PG_FUNCTION_INFO_V1(retcomposite); Datum retcomposite(PG_FUNCTION_ARGS) { FuncCallContext *funcctx; int call_cntr; int max_calls; TupleDesc tupdesc; AttInMetadata *attinmeta; /* stuff done only on the first call of the function */ if (SRF_IS_FIRSTCALL()) { MemoryContext oldcontext; ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("1"))); /* create a function context for cross-call persistence */ funcctx = SRF_FIRSTCALL_INIT(); /* switch to memory context appropriate for multiple function calls */ oldcontext = MemoryContextSwitchTo(funcctx-> multi_call_memory_ctx); /* total number of tuples to be returned */ funcctx->max_calls = PG_GETARG_UINT32(0); /* Build a tuple descriptor for our result type */ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg ("function returning record called in context " "that cannot accept type record"))); /* generate attribute metadata needed later to produce tuples from raw C strings */ attinmeta = TupleDescGetAttInMetadata(tupdesc); funcctx->attinmeta = attinmeta; MemoryContextSwitchTo(oldcontext); } ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("2"))); /* stuff done on every call of the function */ funcctx = SRF_PERCALL_SETUP(); call_cntr = funcctx->call_cntr; max_calls = funcctx->max_calls; attinmeta = funcctx->attinmeta; if (call_cntr < max_calls) { /* do when there is more left to send */ char **values; HeapTuple tuple; Datum result; ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("3"))); /* Prepare a values array for building the returned tuple. This should be an array of C strings which will be processed later by the type input functions. */ values = (char **) palloc(3 * sizeof(char *)); values[0] = (char *) palloc(16 * sizeof(char)); values[1] = (char *) palloc(16 * sizeof(char)); values[2] = (char *) palloc(16 * sizeof(char)); snprintf(values[0], 16, "%d", 1 * PG_GETARG_INT32(1)); snprintf(values[1], 16, "%d", 2 * PG_GETARG_INT32(1)); snprintf(values[2], 16, "%d", 3 * PG_GETARG_INT32(1)); /* build a tuple */ tuple = BuildTupleFromCStrings(attinmeta, values); /* make the tuple into a datum */ result = HeapTupleGetDatum(tuple); /* clean up (this is not really necessary) */ pfree(values[0]); pfree(values[1]); pfree(values[2]); pfree(values); ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("4"))); SRF_RETURN_NEXT(funcctx, result); } else { /* do when there is no more left */ ereport(INFO, (errcode(ERRCODE_IO_ERROR), errmsg("5"))); SRF_RETURN_DONE(funcctx); } }
Re: [HACKERS] PGparam proposal
Andrew Chernow <[EMAIL PROTECTED]> writes:
> For starters, if binary results is a feature you wish you could uninvent
> then we are probably dead in the water for that reason. This goes to
> the core of our concept.

Not really: AFAICS you could implement exactly the API you have sketched without any reliance on binary data transmission whatsoever. As long as PGparam is an opaque struct, library users would have no idea whether the values they provide are being sent as text or binary.

You should probably take two steps back and think about what aspects of what you want to do are really involved with providing an easier-to-use API for PQexecParams and friends, and what parts are actually interested in binary data transmission (and why). Separating those issues in your mind might produce more clarity.

> In the end, some of these changes would change the text format right?

I'd consider that fairly unlikely. For instance, the money width change didn't impact the text format (except to the extent that longer values are now legal), and remembering a timestamptz's zone wouldn't impact the text representation either. Another example is that any significant re-implementation of type NUMERIC (say, as a bignum integer plus exponent instead of the current BCD-ish format) would probably change its binary representation, but there'd be no need for a text change.

The bottom line to me is that binary representations are inherently a lot more fragile and version-dependent than text representations. Our attitude so far has been that client-side code that wants to use binary data transmission is taking all the risk of changes on itself. ("If it breaks, you get to keep both pieces.") It's not clear to me what we gain by making libpq subject to those risks. If we could have libpq insulate client apps from these kinds of changes, that would be one thing; but AFAICS, with these more complex types, a binary format change would usually also dictate a change in what the library exposes to clients. As far as I saw, your proposal completely glossed over the issue of exactly what data structure would be exposed to clients for anything more complex than an integer. I'm afraid that that structure would be subject to change, and then we'd just have two layers of brokenness on our hands instead of only one.

regards, tom lane

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] archive_command failures report confusing exit status
On Monday, 10 December 2007, Alvaro Herrera wrote:
> Peter Eisentraut wrote:
> > I figured it would make sense if pgarch.c used the same mechanism that
> > postmaster.c uses to report the various variants of regular and signal
> > exits.
>
> Hmm. Getting rid of the "(PID 0)" is going to be a mess enough for
> translations that I think it is worth pgarch.c having its own routine
> for this. Furthermore I think the detailed archive command should be
> reported in an errdetail() field, which makes it even farther off.

Better patch.

-- Peter Eisentraut http://developer.postgresql.org/~petere/

diff -cr ../cvs-pgsql/src/backend/postmaster/pgarch.c ./src/backend/postmaster/pgarch.c
*** ../cvs-pgsql/src/backend/postmaster/pgarch.c	2007-11-26 13:29:36.0 +0100
--- ./src/backend/postmaster/pgarch.c	2007-12-11 17:10:50.0 +0100
***************
*** 474,494 ****
  	rc = system(xlogarchcmd);
  	if (rc != 0)
  	{
! 		/*
! 		 * If either the shell itself, or a called command, died on a signal,
! 		 * abort the archiver.  We do this because system() ignores SIGINT and
! 		 * SIGQUIT while waiting; so a signal is very likely something that
! 		 * should have interrupted us too.  If we overreact it's no big deal,
! 		 * the postmaster will just start the archiver again.
! 		 *
! 		 * Per the Single Unix Spec, shells report exit status > 128 when a
! 		 * called command died on a signal.
! 		 */
! 		bool signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128;
!
! 		ereport(signaled ? FATAL : LOG,
! 				(errmsg("archive command \"%s\" failed: return code %d",
! 						xlogarchcmd, rc)));
  		return false;
  	}
--- 474,514 ----
  	rc = system(xlogarchcmd);
  	if (rc != 0)
  	{
! 		if (WIFEXITED(rc))
! 		{
! 			ereport(LOG,
! 					(errmsg("archive command failed with exit code %d", WEXITSTATUS(rc)),
! 					 errdetail("The archive command was \"%s\".", xlogarchcmd)));
! 		}
! 		else if (WIFSIGNALED(rc))
! 		{
! 			/*
! 			 * If either the shell itself, or a called command, died
! 			 * on a signal, abort the archiver.  We do this because
! 			 * system() ignores SIGINT and SIGQUIT while waiting; so a
! 			 * signal is very likely something that should have
! 			 * interrupted us too.  If we overreact it's no big deal,
! 			 * the postmaster will just start the archiver again.
! 			 */
! 			ereport(FATAL, (
! #if defined(WIN32)
! 					errmsg("archive command was terminated by exception 0x%X", WTERMSIG(rc)),
! 					errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
! #elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
! 					errmsg("archive command was terminated by signal %d: %s",
! 						   WTERMSIG(rc),
! 						   WTERMSIG(rc) < NSIG ? sys_siglist[WTERMSIG(rc)] : "(unknown)"),
! #else
! 					errmsg("archive command was terminated by signal %d", WTERMSIG(rc)),
! #endif
! 					errdetail("The archive command was \"%s\".", xlogarchcmd)));
! 		}
! 		else
! 		{
! 			ereport(LOG,
! 					(errmsg("archive command exited with unrecognized status %d", rc),
! 					 errdetail("The archive command was \"%s\".", xlogarchcmd)));
! 		}
  		return false;
  	}

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [BUGS] BUG #3799: csvlog skips some logs
Alvaro Herrera <[EMAIL PROTECTED]> writes: > I still propose that the log line number should be moved w.r.t. session > identifier. No objection here. > I changed two more things: the VXID is not reported if not in a backend > (because AuxiliaryProcesses are said to never have one), and added > quotes surrounding the hostname, because in a test here it seems > supported to create an alias for my loopback interface with a name like > "a,b". Sounds reasonable, as long as autovac processes still report VXID. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, 2007-12-11 at 13:44 +0100, Csaba Nagy wrote: > On Tue, 2007-12-11 at 11:12 +, Simon Riggs wrote: > > Features > > - Read Only Tables > > - Compressed Tablespaces > > I wonder if instead of read-only tables wouldn't it be better to have > some kind of automatic partitioning That's definitely on my list of requirements for partitioning. > which permits to have different > chunks of the table data in different tablespaces, and a freeze command > which effectively moves the data from the (normally small) active chunk > to the archive chunk when it's transaction id is older than a predefined > threshold ? As Hannu says, this is exactly what the other features will allow, so Yes! -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, 2007-12-11 at 10:19 -0500, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > This command will place a ShareLock (only) on the table, preventing > > anybody from writing to the table while we freeze it. The ShareLock is > > incompatible with any transaction that has written to the table, so when > > we acquire the lock all writers to the table will have completed. We > > then run the equivalent of a VACUUM FREEZE which will then be able to > > freeze *all* rows in one pass (rather than all except the most > > recent). > > This breaks MVCC. The fact that a transaction has completed is not > license to discard tuple xmin immediately. Yeh, agreed. I knew I'd solved that bit, so I was focused elsewhere. Sloppy, so apologies. I was originally planning to put a wait in at the beginning, as is used by CREATE INDEX CONCURRENTLY, though I prefer Greg's variant because it's more forgiving. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PGparam proposal
For starters, if binary results is a feature you wish you could uninvent then we are probably dead in the water for that reason. This goes to the core of our concept. If there is no desire to synch client & server in regards to type handling, then this is a waste of time. I think all of this would make libpq more powerful.

With that said, my follow up:

> I think a printf-style API is fundamentally a bad idea in this
> context. printf only works well when the set of concepts (datatypes,
> format specifiers, etc) is small and fixed;

As of now, including numbers and alpha chars, there are 62 classes of which we used 11 (a class is the first char). That leaves 51 classes with 62 types per class. Where I would agree with you is that, over time, things can become cryptic. There are only several ways to do this:

1. Use Oids - doesn't work because they are not publicly exposed by libpq.
2. Create a function for each type. This idea was rejected because it bloated the API (I'll buy this).
3. Type aliasing schema - naturally, printf-style comes to mind but may become cryptic.

I should mention that we were only trying to support built-in pgtypes in libpq. If you are looking to support all external types, then we propose the below:

For #3, maybe it would be better to abandon %c or %cc type encoding and move into something more verbose. We could just spell out the type: %int4, %point, etc... Maybe for built-in types you could prepend 'pg': %pgint4, %pgpoint. This opens up the namespace and removes scalability and cryptic issues.

Expanding on the %pgint4 idea, 3rd party types can supply their own %typname handlers (a more modular approach). You can install them at runtime, PQinstallTypeHandler(typname, etc..), or something like that. When a 3rd party %typname is encountered, the appropriate handler would be used. Standard pgtypes would be installed by default.

PQinstallTypeHandler(... "gisbox2d" ...);
PQputf(... "%gisbox2d %pgpolygon %pgint4" ...); //PQgetf would use the same %typname

The only thing libpq should support by default is the built-in pgtypes. A handler can expand on this.

> I find the idea of embedding state like that into the PGconn to be
> pretty horrid, as well. It makes the design non-reentrant
>
> You can't just randomly change the behavior of existing API functions.

Okay. We initially had the PGparam as a public opaque, but changed it. We will stop piggy-backing off the existing parameterized functions. Instead, we will supply PGparam exec/send functions.

typedef struct pg_param PGparam; //opaque

param = PQparamCreate(conn);
PQputf(param, "%pgint4 %pgtimestamptz", 62, &tstz);
res = PQparamExec(conn, param, "command", resfmt);

//PQparamExec will always PQparamClear(param), whether it failed or not
//That means after an exec/send, the param object is ready for puts

PQparamFinish(param); // free it

// This causes the below sequence of function calls:
// PQparamCreate, PQputf, PQexecParams("... VALUES ($1)"), PQparamFinish
res = PQparamExecf(conn, resfmt, "INSERT INTO t VALUES (%pgint)", 62);

> * the 8.2-to-8.3 change in the width of type money

We have code comments throughout the patches, as well as documentation in the release notes. At this point, we solved getf by checking PQgetlength: if it's 4, read 4, otherwise read 8. For putf, we would have to check the server version.
> * the likely future change to type timestamptz to store original
> timezone explicitly

We would have to change how timestamptz handles the binary format from that version forward; looks like a switch on sversion for back+forwards compatibility.

> * the likely future change to type text to store encoding/collation
> info explicitly

Would the server do text conversion and then pass the converted text back to the client? Or would it choose a common encoding like UTF-8 and return the text with encoding and let the client convert? How does this affect the text format?

In the end, some of these changes would change the text format, right? That would push these changes into the API user's lap, to parse and fuss with. I just think it is cleaner to synch the binary and/or text formats with the server. If you are looking for ways to change the binary/text format of types w/o having to make the most recent clients aware of this, then I think we have lost this battle.

Another solution is revamping utils/adt so that it is a shared API for client & server. If you upgrade a client, you would automatically get the latest formatting functions. Just like libpq checks protocol version in several places, conn->sversion would have to be checked, or maybe have a typefmt_api_version.

andrew & merlin

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [BUGS] BUG #3799: csvlog skips some logs
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Another change I did was to change a "%.*s" to "%*s". The "precision"
> > marker seems useless AFAICT.
>
> This is wrong, broken, will cause crashes on platforms where the PS
> string is not null-terminated. (Hint: .* is a maximum width, * is a
> minimum width.)

Oh, OK.

> Another thing I thought strange was the order of the added columns;
> why isn't it more like the order in which they appear in the text
> version? In particular hint should probably come just after detail
> and before context, and internal-query should also come before context
> because when relevant it's usually more closely nested than the context
> stack.

Ok, I changed it like you suggest. I didn't do any other order changes. I still propose that the log line number should be moved w.r.t. session identifier.

I changed two more things: the VXID is not reported if not in a backend (because AuxiliaryProcesses are said to never have one), and added quotes surrounding the hostname, because in a test here it seems supported to create an alias for my loopback interface with a name like "a,b".

-- Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J "All rings of power are equal, But some rings of power are more equal than others." (George Orwell's The Lord of the Rings)

---(end of broadcast)--- TIP 6: explain analyze is your friend
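[Editorial note: the "%.*s" versus "%*s" distinction above is easy to get backwards, so here is a minimal, hypothetical C sketch of the difference. It is unrelated to the actual ps_status code being patched; the buffer contents are invented for illustration.]

#include <stdio.h>

int
main(void)
{
    /* A buffer that is NOT null-terminated, like a PS display area. */
    char ps[8] = {'p', 'o', 's', 't', 'g', 'r', 'e', 's'};

    /*
     * "%.*s" takes a precision: print AT MOST 8 bytes, so printf never
     * walks past the end of the buffer looking for a '\0'.
     */
    printf("[%.*s]\n", 8, ps);          /* prints [postgres] */

    /*
     * "%*s" takes a field width: pad to AT LEAST 12 characters, but printf
     * still scans for a terminating '\0' -- undefined behavior on a
     * non-terminated buffer, which is exactly the crash risk noted above.
     */
    printf("[%*s]\n", 12, "postgres");  /* prints [    postgres] */

    return 0;
}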
Re: [HACKERS] WORM and Read Only Tables (v0.1)
Simon Riggs <[EMAIL PROTECTED]> writes: > This command will place a ShareLock (only) on the table, preventing > anybody from writing to the table while we freeze it. The ShareLock is > incompatible with any transaction that has written to the table, so when > we acquire the lock all writers to the table will have completed. We > then run the equivalent of a VACUUM FREEZE which will then be able to > freeze *all* rows in one pass (rather than all except the most > recent). This breaks MVCC. The fact that a transaction has completed is not license to discard tuple xmin immediately. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, 2007-12-11 at 11:49 +, Gregory Stark wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
>
> > So... VACUUM FREEZE table SET READ ONLY;
> >
> > would be my first thought, but I'm guessing everybody will press me
> > towards supporting the more obvious
> >
> > ALTER TABLE table SET READ ONLY;
> >
> > This command will place a ShareLock (only) on the table, preventing
> > anybody from writing to the table while we freeze it. The ShareLock is
> > incompatible with any transaction that has written to the table, so when
> > we acquire the lock all writers to the table will have completed. We
> > then run the equivalent of a VACUUM FREEZE which will then be able to
> > freeze *all* rows in one pass (rather than all except the most recent).
> > On completion of the freeze pass we will then update the pg_class entry
> > to show that it is now read-only, so we will emulate the way VACUUM does
> > this.
>
> To be clear, if it meets a block for which a tuple is not freezable -- that
> is, it has an xmin or xmax more recent than the global xmin -- then it needs
> to block waiting for the backend with that recent xmin. Then presumably it
> needs to update its concept of recent global xmin going forward.
>
> You might be best off grabbing a list of txid->xmin when you start and
> sorting them by xmin so you can loop through them sleeping until you reach
> the first txid with an xmin large enough to continue.

D'oh. Completely agreed. Mea culpa. I had that bit in my original design, but I was looking elsewhere on this clearly.

I'd been trying to think about how to do this since about 2 years ago and it was only the CREATE INDEX CONCURRENTLY stuff that showed me how. Thanks for nudging me.

> > Reversing the process is simpler, since we only have to turn off the
> > flag in pg_class:
>
> I'm not sure how this interacts with:
>
> > Requests for tuple-level SHARE locks (e.g. FOR SHARE) against read-only
> > tables will be ignored, since they are effectively already there. So we
> > don't need to change the internals of the locking, nor edit the RI code
> > to remove the call to SHARE lock referenced tables. Do this during
> > post-parse analysis.
>
> Since queries which think they hold FOR SHARE tuple locks will be magically
> losing their share locks if you turn off the read-only flag. Do you need to
> obtain an exclusive lock on the table to turn it read-write?

Agreed. I wasn't suggesting implementing without, just noting that it might have been possible, but it seems not, as you say. I don't think it's important to be able to do that with less than AccessExclusiveLock.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Document how to turn off disk write cache on popular operating
On Tue, Dec 11, 2007 at 10:09:51AM -0500, Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
> > If you set it to fsync or fsync_writethrough it will write through the
> > cache.
>
> Really? How much should we trust that?

I'd say as much as we should trust that checkbox in the Windows settings page... I haven't come across a case yet where it doesn't work, but that doesn't mean there isn't one.

It certainly writes through the writeback cache of shiny expensive raid controllers :-) And it does write through the IDE drives that I've tested.

Bottom line is, I think that in the cases where fsync_writethrough doesn't do it, that checkbox isn't going to work either.

//Magnus

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Document how to turn off disk write cache on popular operating
Magnus Hagander <[EMAIL PROTECTED]> writes: > If you set it to fsync or fsync_writethrough it will write through the > cache. Really? How much should we trust that? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] VACUUM ANALYZE out of memory
Martijn van Oosterhout wrote:
> IIRC you said you're on a 32-bit architecture? Which means any single
> process only has 4GB address space. Take off 1GB for the kernel, 1GB
> shared memory, 1 GB maintenance work_mem and a collection of libraries,
> stack space and general memory fragmentation and I can absolutely believe
> you've run into the limit of *address* space.

Should have been 64-bit, but a foul-up means it is running in 32-bit at the moment.

> On a 64-bit machine it doesn't matter so much but on a 32-bit machine
> using 1GB for shared memory severely cuts the amount of auxiliary memory
> the server can use. Unless you've shown a measurable difference between
> 256MB and 1G shared memory, I'd say you're better off using the smaller
> amount so you can have higher maintenance work mem.

We're still in the process of testing and tuning (which takes its sweet time), so at the moment I can not tell what benefits we have from the different settings in practice. But I'll try to set shared buffers down to 128-256 MB and maintenance_work_mem to 512-1024 MB when I next have a time slot where I can run the server into the ground.

However, the problem also occurred with the shared_buffers limit set at 24 MB and maintenance_work_mem at its default setting (16 MB?), so I would be rather surprised if the problem did not repeat itself.

Regards,

Michael Akinde
Database Architect, met.no

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [BUGS] BUG #3799: csvlog skips some logs
Alvaro Herrera wrote: > > Another problem I just noticed is that it seems the bgwriter is > inheriting the session id from Postmaster; it doesn't have one of its > own. Huh, sorry, I'm just blind, I neglected to look at the last digit ;-) -- Alvaro Herrera Valdivia, Chile ICBM: S 39º 49' 18.1", W 73º 13' 56.4" "I must say, I am absolutely impressed with what pgsql's implementation of VALUES allows me to do. It's kind of ridiculous how much "work" goes away in my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison) http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [BUGS] BUG #3799: csvlog skips some logs
Another problem I just noticed is that it seems the bgwriter is inheriting the session id from Postmaster; it doesn't have one of its own. -- Alvaro Herrera Developer, http://www.PostgreSQL.org/ "People get annoyed when you try to debug them." (Larry Wall) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Document how to turn off disk write cache on popular operating
On Tue, Dec 11, 2007 at 08:18:42AM -0500, Bruce Momjian wrote:
> Magnus Hagander wrote:
> > On Mon, Dec 10, 2007 at 02:05:05PM +, Bruce Momjian wrote:
> > > Log Message:
> > > ---
> > > Document how to turn off disk write cache on popular operating systems.
> > >
> > > Modified Files:
> > > --
> > > pgsql/doc/src/sgml:
> > > wal.sgml (r1.46 -> r1.47)
> > > (http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/wal.sgml?r1=1.46&r2=1.47)
> >
> > Should this mention that you don't need to turn it off at the disk level if
> > you use fsync_writethrough?
>
> Uh, I remember we looked at this checkbox before but I don't remember
> the details, and I can't find a comment about it. Was the issue that
> writethrough always forces through the disk cache? Is that the default
> on Win32? Did we comment this somewhere?

If you set it to fsync or fsync_writethrough it will write through the cache. (fsync is just an alias.)

If you set it to OPEN_DATASYNC, it will respond to the checkbox you are referring to. OPEN_DATASYNC is the default, IIRC.

//Magnus

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Problem with ControlFileData structure being ABI depe ndent
On Mon, Dec 10, 2007 at 09:56:39AM +, Dave Page wrote:
> Dave Page wrote:
> > Tom Lane wrote:
> >> Dave Page <[EMAIL PROTECTED]> writes:
> >>> Gregory Stark wrote:
> An alternative is leaving it in the project file but putting something like
> this in c.h:
> >>
> >> Put it in win32.h, please. c.h shouldn't get cluttered with
> >> platform-specific kluges when there's no need for it.
> >>
> >> Is there a good reason not to just #define _USE_32BIT_TIME_T in win32.h?
> >
> > Yeah, the fact that addons may then end up partially compiled with and
> > partially without it being defined. If we just have it error as Greg
> > suggested, then it will force the authors to define it themselves, and
> > if they get that wrong it's their fault not ours.
>
> Patch attached.

Applied with two tiny modifications - the missing quote that you mentioned, and changed the #ifdef to only affect MSVC and not mingw.

//Magnus

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
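[Editorial note: the patch itself is an attachment and not reproduced in this archive, so the following is only a hypothetical C sketch of the kind of guard being discussed -- an #error in win32.h that fires for MSVC builds of add-on modules unless _USE_32BIT_TIME_T is defined, keeping time_t the same width as in the server binaries. Everything except the standard _MSC_VER and _USE_32BIT_TIME_T macros is an assumption, not the applied patch.]

/* Hypothetical sketch only -- not the patch that was applied. */
#if defined(_MSC_VER) && !defined(_USE_32BIT_TIME_T)
/*
 * The server is built with a 32-bit time_t on MSVC.  An add-on module
 * compiled without _USE_32BIT_TIME_T would get a 64-bit time_t and an
 * incompatible ABI, so fail the build loudly instead of mis-linking.
 */
#error Define _USE_32BIT_TIME_T so that time_t matches the server ABI
#endif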
Re: [HACKERS] VACUUM ANALYZE out of memory
On Tue, Dec 11, 2007 at 03:18:54PM +0100, Michael Akinde wrote:
> The server has 4 GB RAM available, so even if it was trying to use 1.2
> GB shared memory + 1 GB for maintenance_mem all at once, it still seems
> odd that the process would fail. As far as I can tell (running ulimit -a),
> the limits look pretty OK to me.

IIRC you said you're on a 32-bit architecture? Which means any single process only has 4GB address space. Take off 1GB for the kernel, 1GB shared memory, 1 GB maintenance work_mem and a collection of libraries, stack space and general memory fragmentation and I can absolutely believe you've run into the limit of *address* space.

On a 64-bit machine it doesn't matter so much but on a 32-bit machine using 1GB for shared memory severely cuts the amount of auxiliary memory the server can use. Unless you've shown a measurable difference between 256MB and 1G shared memory, I'd say you're better off using the smaller amount so you can have higher maintenance work mem. VACUUM doesn't benefit much from lots of shared buffers, but it does benefit from maint work_mem.

Have a nice day,

-- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution
> inevitable.
>  -- John F Kennedy
Re: [HACKERS] There's random access and then there's random access
"Gregory Stark" <[EMAIL PROTECTED]> writes: > I think this will be easiest to do for bitmap index scans. Since we gather up > all the pages we'll need before starting the heap scan we can easily skim > through them, issue posix_fadvises for at least a certain number ahead of the > actual read point and then proceed with the rest of the scan unchanged. I've written up a simple test implementation of prefetching using posix_fadvise(). Here are some nice results on a query accessing 1,000 records from a 10G table with 300 million records: postgres=# set preread_pages=0; explain analyze select (select count(*) from h where h = any (x)) from (select random_array(1000,1,3) as x)x; SET QUERY PLAN Subquery Scan x (cost=0.00..115.69 rows=1 width=32) (actual time=6069.505..6069.509 rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.058..0.061 rows=1 loops=1) SubPlan -> Aggregate (cost=115.66..115.67 rows=1 width=0) (actual time=6069.425..6069.426 rows=1 loops=1) -> Bitmap Heap Scan on h (cost=75.49..115.63 rows=10 width=0) (actual time=3543.107..6068.335 rows=1000 loops=1) Recheck Cond: (h = ANY ($0)) -> Bitmap Index Scan on hi (cost=0.00..75.49 rows=10 width=0) (actual time=3542.220..3542.220 rows=1000 loops=1) Index Cond: (h = ANY ($0)) Total runtime: 6069.632 ms (9 rows) postgres=# set preread_pages=300; explain analyze select (select count(*) from h where h = any (x)) from (select random_array(1000,1,3) as x)x; SET QUERY PLAN Subquery Scan x (cost=0.00..115.69 rows=1 width=32) (actual time=3945.602..3945.607 rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.060..0.064 rows=1 loops=1) SubPlan -> Aggregate (cost=115.66..115.67 rows=1 width=0) (actual time=3945.520..3945.521 rows=1 loops=1) -> Bitmap Heap Scan on h (cost=75.49..115.63 rows=10 width=0) (actual time=3505.546..3944.817 rows=1000 loops=1) Recheck Cond: (h = ANY ($0)) -> Bitmap Index Scan on hi (cost=0.00..75.49 rows=10 width=0) (actual time=3452.759..3452.759 rows=1000 loops=1) Index Cond: (h = ANY ($0)) Total runtime: 3945.730 ms (9 rows) Note that while the query itself is only 50% faster the bitmap heap scan specifically is actually 575% faster than without readahead. It would be nice to optimize the bitmap index scan as well but that will be a bit trickier and it probably won't be able to cover as many pages. As a result it probably won't be a 5x speedup like the heap scan. Also, this is with a fairly aggressive readahead which only makes sense for queries that look a lot like this and will read all the tuples. For a more general solution I think it would make sense to water down the performance a bit in exchange for some protection against doing unnecessary I/O in cases where the query isn't actually going to read all the tuples. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] VACUUM ANALYZE out of memory
Stefan Kaltenbrunner wrote:
> Michael Akinde wrote:
> > Incidentally, in the first error of the two I posted, the shared memory
> > setting was significantly lower (24 MB, I believe). I'll try with 128 MB
> > before I leave in the evening, though (assuming the other tests I'm
> > running complete by then).
>
> this is most likely not at all related to your shared memory settings but
> to your setting of maintenance_work_mem, which is the amount of memory a
> single backend(!) can use for maintenance operations (which VACUUM is, for
> example). Notice that your first error refers to an allocation of about
> 500MB, which your ulimit/kernel process limit simply might not be able to
> give a single process.

Yes - in the first case, maintenance_work_mem was at default (so I wasn't surprised to see it fail to allocate half a gigabyte). In the second case, though, maintenance_work_mem was set at 1024 MB (where it then has the slightly odd error "Failed on request of size 32").

The server has 4 GB RAM available, so even if it was trying to use 1.2 GB shared memory + 1 GB for maintenance_mem all at once, it still seems odd that the process would fail. As far as I can tell (running ulimit -a), the limits look pretty OK to me:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) unlimited
max rt priority                 (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Being unable to run VACUUM FULL isn't a problem for the current configuration of our application (as it will mostly be large amounts of static data), but we're likely to have an application working with the database next year where we'd move around 100 GB through the database on a daily basis. At least based on the documentation of the various commands, I would expect that one would want to perform VACUUM FULL every once in a while.

Again, thanks for the feedback.

Regards,

Michael Akinde
Database Architect, met.no

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On 11/12/2007, Simon Riggs <[EMAIL PROTECTED]> wrote:
>
> Attach
> ------
>
> Writing tables on one system and then moving that data to other systems
> is fairly common. If we supported read-only tables then you might
> consider how you would publish new versions to people.
>
> For now, we just want to consider how we will upgrade from one release
> to another without needing to unload and reload potentially many
> Terabytes of data. We can't delete the old data until the new data is
> successfully loaded, so we will have a huge temporary storage cost. This
> could go very slowly if we use cheaper storage, plus reloading the data
> means we have to re-freeze it again also.
>
> So we need a way of attaching the old tables to the new database. We
> might call this binary upgrade, or we might be slightly less ambitious
> and talk about just moving the old read-only data. That's all I want to
> do at this stage.
>
> I'm mentioning this here now to see what comes out in debate, and what
> others are planning to work on in this area.

This sounds like allowing new versions to read old versions' file structure, probably on a tablespace-by-tablespace basis, I guess.

Another advantage might be being able to load an old PITR backup into a new version across major versions, but I'm not sure that would work.

Maybe we need a command to upgrade a tablespace to a new version's file format? We need to be careful we don't stunt future progress by fixing the file format too much.

Peter Childs
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On 11/12/2007, Csaba Nagy <[EMAIL PROTECTED]> wrote:
>
> On Tue, 2007-12-11 at 14:58 +0200, Hannu Krosing wrote:
> > On Tue, 2007-12-11 at 13:44, Csaba Nagy wrote:
> > >> Then put the active chunk on a high performance file system and the
> > > archive tablespace on a compressed/slow/cheap file system and you're
> > > done. Allow even the archive chunk to be updateable, and put new tuple
> > > data in the active chunk. It would work just fine for cases where the
> > > old data is rarely updated/deleted...
> >
> > You can't update a table on a read-only (write-once) partition, at least
> > not with current header structure.
>
> OK, but that's what I'm challenging, why do you need a write once
> partition ? You mean by that tapes ? OK, it means I was thinking in
> completely different usage scenarios then...
>
> Cheers,
> Csaba.

I think DVD or CD would make sense. Tapes have an added limitation of being sequential access only.

Peter Childs
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On 12/11/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Compressed Tablespaces
> Using a streaming library like zlib, it will be easy to read/write data
> files into a still-usable form but with much reduced size. Access to a
> compressed table only makes sense as a SeqScan. That would be handled by
> introducing tablespace-specific access costs, discussed below. Indexes
> on compressed tables would still be allowed, but would hardly ever be
> used.

I've actually been wanting this lately, for a couple reasons. One is reduced disk footprint, but the other is reduced I/O, similar to how TOAST helps with large fields now. (In my particular scenario, TOAST can't help due to small field sizes.) It would be useful to have available even on read/write data.

To that end, it would probably make more sense to use a block compression algorithm rather than a streaming one. Block-based algorithms can generally get better compression than streaming ones as well, at least when fed large enough blocks. I'm not familiar with the implementation issues, other than the obvious "variable block sizes make the I/O subsystem look very different", so I don't know if there's a major tradeoff between the two strategies (even just for read-only).

> I'm open to arguments that we don't need this at all because filesystem
> utilities exist that do everything we need. Your experience will be
> good to hear about in regard to this feature.

Some filesystems do support transparent compression, but they're not always available. It would be nice to have compression on unsophisticated systems with cheap hardware.

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
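[Editorial note: neither the proposal nor this reply includes code, so the following is only a small, hypothetical C sketch of the block-oriented flavour of zlib compression being discussed -- compressing one fixed-size table block at a time with compress2()/uncompress() rather than streaming a whole file. The block size, function names and fallback behaviour are illustrative assumptions, not a design decision from the thread.]

#include <zlib.h>

#define BLCKSZ 8192             /* one heap block, as in PostgreSQL */

/*
 * Compress a single BLCKSZ block.  Returns the compressed length, or 0 if
 * compression failed or did not actually save space (in which case a real
 * implementation would presumably store the block uncompressed).
 * dstcap must be at least compressBound(BLCKSZ).
 */
static uLong
compress_block(const Bytef *src, Bytef *dst, uLong dstcap)
{
    uLongf      dstlen = dstcap;

    if (compress2(dst, &dstlen, src, BLCKSZ, Z_BEST_SPEED) != Z_OK)
        return 0;
    return (dstlen < BLCKSZ) ? dstlen : 0;
}

/* Decompress one block back into a BLCKSZ buffer; returns 1 on success. */
static int
decompress_block(const Bytef *src, uLong srclen, Bytef *dst)
{
    uLongf      dstlen = BLCKSZ;

    return (uncompress(dst, &dstlen, src, srclen) == Z_OK && dstlen == BLCKSZ);
}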
Re: [HACKERS] [BUGS] BUG #3799: csvlog skips some logs
Hmm, I'm now wondering if the log line number is correctly positioned. Right now we have it just after the PID. So it suggests that following PID and log line number is enough for tracking what a session does. While this is not entirely incorrect, ISTM to be more logical to put it closer to the session ID, and change the name so it is less misleading in that sense.

Currently we have

session_id       | 475e91da.291f
connection_from  | [local]
process_id       | 10527
process_line_num | 3

I propose we change it to

process_id       | 10527
connection_from  | [local]
session_id       | 475e91da.291f
session_line_num | 3

Note changed column name. I also suggest we change the description of %l in log_line_prefix to

    Number of the log line for each session, starting at 1

The original is

    Number of the log line for each process, starting at 1

Thoughts?

-- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "No necesitamos banderas No reconocemos fronteras" (Jorge González)

---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Re: Document how to turn off disk write cache on popular operating
Magnus Hagander wrote: > On Mon, Dec 10, 2007 at 02:05:05PM +, Bruce Momjian wrote: > > Log Message: > > --- > > Document how to turn off disk write cache on popular operating systems. > > > > Modified Files: > > -- > > pgsql/doc/src/sgml: > > wal.sgml (r1.46 -> r1.47) > > > > (http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/wal.sgml?r1=1.46&r2=1.47) > > Should this mention that you don't need to turn it off at the disk level if > you use fsync_writethrough? Uh, I remember we looked at this checkbox before but I don't remember the details, and I can't find a comment about it. Was the issue that writethrough always forces through the disk cache? Is that the default on Win32? Did we comment this somewhere? -- Bruce Momjian <[EMAIL PROTECTED]>http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] quotas once again
Gevik Babakhani wrote:
> After reading the thread of 2004 regarding user quotas, I understand why
> the discussion moved towards having a tablespace quota as a solution.
> My reason to start this discussion was due to the need of controlling
> database size. Having tablespace quotas could allow one to create a
> database in a given tablespace and then limit the size of the tablespace.

You can control tablespace size with filesystem quotas when you put each tablespace on a separate FS. However, the disadvantage is that you need admin access to the machine and it is not controlled by postgres.

ZFS works fine in this case, because you can delegate volume/filesystem administration to any user and FS creation is very easy.

Zdenek

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, 2007-12-11 at 14:58 +0200, Hannu Krosing wrote: > Ühel kenal päeval, T, 2007-12-11 kell 13:44, kirjutas Csaba Nagy: > >> Then put the active chunk on a high performance file system and the > > archive tablespace on a compressed/slow/cheap file system and you're > > done. Allow even the archive chunk to be updateable, and put new tuple > > data in the active chunk. It would work just fine for cases where the > > old data is rarely updated/deleted... > > You can't update a table on a read-only (write-once) partition, at least > not with current header structure. OK, but that's what I'm challenging, why do you need a write once partition ? You mean by that tapes ? OK, it means I was thinking in completely different usage scenarios then... Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, 2007-12-11 at 13:44, Csaba Nagy wrote:
> On Tue, 2007-12-11 at 11:12 +, Simon Riggs wrote:
> > Features
> > - Read Only Tables
> > - Compressed Tablespaces
>
> I wonder if instead of read-only tables wouldn't it be better to have
> some kind of automatic partitioning which permits to have different
> chunks of the table data in different tablespaces, and a freeze command
> which effectively moves the data from the (normally small) active chunk
> to the archive chunk when it's transaction id is older than a predefined
> threshold ?

This would be doable using Simon's proposed commands.

> Then put the active chunk on a high performance file system and the
> archive tablespace on a compressed/slow/cheap file system and you're
> done. Allow even the archive chunk to be updateable, and put new tuple
> data in the active chunk. It would work just fine for cases where the
> old data is rarely updated/deleted...

You can't update a table on a read-only (write-once) partition, at least not with the current header structure.

> Another advantage I guess would be that active data would more likely
> stay in cache, as updated records would stay together and not spread
> over the inactive.

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, 2007-12-11 at 13:44 +0100, Csaba Nagy wrote: > Another advantage I guess would be that active data would more likely > stay in cache, as updated records would stay together and not spread > over the inactive. And I forgot to mention that vacuum could mostly skip the archive part, and only vacuum the active part, which would drastically reduce the cost of vacuuming big & active tables. Cheers, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, 2007-12-11 at 11:12 +, Simon Riggs wrote: > Features > - Read Only Tables > - Compressed Tablespaces I wonder if instead of read-only tables wouldn't it be better to have some kind of automatic partitioning which permits to have different chunks of the table data in different tablespaces, and a freeze command which effectively moves the data from the (normally small) active chunk to the archive chunk when it's transaction id is older than a predefined threshold ? Then put the active chunk on a high performance file system and the archive tablespace on a compressed/slow/cheap file system and you're done. Allow even the archive chunk to be updateable, and put new tuple data in the active chunk. It would work just fine for cases where the old data is rarely updated/deleted... Another advantage I guess would be that active data would more likely stay in cache, as updated records would stay together and not spread over the inactive. Cheers, Csaba. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] partitioned table query question
[moved from -general]

"Tom Lane" <[EMAIL PROTECTED]> writes:
> The bottom line here is that we have built a partitioning facility
> out of spare parts, ie, a very generalized contradiction-proving
> section of the planner.

That's true but I think the conclusions you reach are a stretch. The main problem with our existing approach is that there's no way to make the kind of deductions we want which depend on a holistic view of all the constraints. So you can't sort the ranges and search them using a binary search, or join matching partitions from separate tables before appending them, or any number of similar ideas.

But I don't see that handling hash or bin partitioning is really going to be so hard with our current scheme. I think we need to have some form of understanding of which functions preserve which btree ordering information in order to make better use of expression indexes anyway.

> Eventually we'll have to push an understanding of partitioning down to
> some lower level of the system --- that is, if we think it's critical
> enough to justify that much effort.

If by a lower level of the system you mean the storage manager or anything like that, then I definitely don't agree. If on the other hand you just mean a simpler, more regular structure than Turing-complete constraints, then I would agree, but for the reasons above -- not for the problems with hash/bin partitioning and equality semantics.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] VACUUM ANALYZE out of memory
On Tue, Dec 11, 2007 at 12:30:43PM +0100, Michael Akinde wrote:
> The way the process was running, it seems to have basically just
> continually allocated memory until (presumably) it broke through the
> slightly less than 1.2 GB shared memory allocation we had provided for
> PostgreSQL (at least the postgres process was still running by the time
> resident size had reached 1.1 GB).

I think you're slightly confused. The VACUUM isn't going to use much of the shared memory anyway. Shared memory is mostly just disk buffers and is all allocated at startup. The memory being allocated by VACUUM is the maintenance work_mem *in addition* to any shared memory. Also, depending on what's happening it may be allocating maintenance work_mem more than once.

> Incidentally, in the first error of the two I posted, the shared memory
> setting was significantly lower (24 MB, I believe). I'll try with 128 MB
> before I leave in the evening, though (assuming the other tests I'm
> running complete by then).

What you want is a reasonable shared mem, maybe 0.5GB, and a smaller maintenance work_mem, since the latter is probably what's killing you.

Have a nice day,

-- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution
> inevitable.
>  -- John F Kennedy
Re: [HACKERS] VACUUM ANALYZE out of memory
Michael Akinde wrote: > Thanks for the rapid responses. > > Stefan Kaltenbrunner wrote: >> this seems simply a problem of setting maintenance_work_mem too high (ie >> higher than what your OS can support - maybe an ulimit/processlimit is in >> effect?) . Try reducing maintenance_work_mem to say 128MB and retry. >> If you promise postgresql that it can get 1GB it will happily try to use >> it ... > I set up the system together with one of our Linux sysOps, so I think the > settings should be OK. Kernel.shmmax is set to 1.2 GB, but I'll get him to > recheck if there could be any other limits he has forgotten to increase. You are confusing shared memory (shared_buffers and kernel.shmmax) with local memory (work_mem and maintenance_work_mem). The error you got is about the latter kind. -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "La soledad es compañía" ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] WORM and Read Only Tables (v0.1)
"Simon Riggs" <[EMAIL PROTECTED]> writes: > So... VACUUM FREEZE table SET READ ONLY; > > would be my first thought, but I'm guessing everybody will press me > towards supporting the more obvious > > ALTER TABLE table SET READ ONLY; > > This command will place a ShareLock (only) on the table, preventing > anybody from writing to the table while we freeze it. The ShareLock is > incompatible with any transaction that has written to the table, so when > we acquire the lock all writers to the table will have completed. We > then run the equivalent of a VACUUM FREEZE which will then be able to > freeze *all* rows in one pass (rather than all except the most recent). > On completion of the freeze pass we will then update the pg_class entry > to show that it is now read-only, so we will emulate the way VACUUM does > this. To be clear it if it meets a block for which a tuple is not freezable -- that is, it has an xmin or xmax more recent than the global xmin then it needs to block waiting for the backend which that recent xmin. Then presumably it needs to update its concept of recent global xmin going forward. You might be best off grabbing a list of txid->xmin when you start and sorting them by xmin so you can loop through them sleeping until you reach the first txid with an xmin large enough to continue. > Reversing the process is simpler, since we only have to turn off the > flag in pg_class: I'm not sure how this interacts with: > Requests for tuple-level SHARE locks (e.g. FOR SHARE) against read-only > tables will be ignored, since they are effectively already there. So we > don't need to change the internals of the locking, nor edit the RI code > to remove the call to SHARE lock referenced tables. Do this during > post-parse analysis. Since queries which think they hold FOR SHARE tuple locks will be magically losing their share locks if you turn off the read-only flag. Do you need to obtain an exclusive lock on the table to turn it read-write? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] VACUUM ANALYZE out of memory
Michael Akinde wrote:
> Thanks for the rapid responses.
>
> Stefan Kaltenbrunner wrote:
> > this seems simply a problem of setting maintenance_work_mem too high (ie
> > higher than what your OS can support - maybe an ulimit/processlimit is
> > in effect?). Try reducing maintenance_work_mem to say 128MB and retry.
> > If you promise postgresql that it can get 1GB it will happily try to use
> > it ...
>
> I set up the system together with one of our Linux sysOps, so I think the
> settings should be OK. Kernel.shmmax is set to 1.2 GB, but I'll get him to
> recheck if there could be any other limits he has forgotten to increase.
>
> The way the process was running, it seems to have basically just
> continually allocated memory until (presumably) it broke through the
> slightly less than 1.2 GB shared memory allocation we had provided for
> PostgreSQL (at least the postgres process was still running by the time
> resident size had reached 1.1 GB).
>
> Incidentally, in the first error of the two I posted, the shared memory
> setting was significantly lower (24 MB, I believe). I'll try with 128 MB
> before I leave in the evening, though (assuming the other tests I'm
> running complete by then).

This is most likely not at all related to your shared memory settings but to your setting of maintenance_work_mem, which is the amount of memory a single backend(!) can use for maintenance operations (which VACUUM is, for example). Notice that your first error refers to an allocation of about 500MB, which your ulimit/kernel process limit simply might not be able to give a single process.

And for very large tables VACUUM FULL is generally not a good idea at all - either look into regular normal vacuum scheduling or, if you need to recover from a bloated database, use a command that forces a rewrite of the table (like CLUSTER), which will be heaps faster but also require about twice the amount of disk space.

Stefan

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] VACUUM ANALYZE out of memory
Thanks for the rapid responses.

Stefan Kaltenbrunner wrote:
> this seems simply a problem of setting maintenance_work_mem too high (ie
> higher than what your OS can support - maybe an ulimit/processlimit is in
> effect?). Try reducing maintenance_work_mem to say 128MB and retry.
> If you promise postgresql that it can get 1GB it will happily try to use
> it ...

I set up the system together with one of our Linux sysOps, so I think the settings should be OK. Kernel.shmmax is set to 1.2 GB, but I'll get him to recheck if there could be any other limits he has forgotten to increase.

The way the process was running, it seems to have basically just continually allocated memory until (presumably) it broke through the slightly less than 1.2 GB shared memory allocation we had provided for PostgreSQL (at least the postgres process was still running by the time resident size had reached 1.1 GB).

Incidentally, in the first error of the two I posted, the shared memory setting was significantly lower (24 MB, I believe). I'll try with 128 MB before I leave in the evening, though (assuming the other tests I'm running complete by then).

Simon Riggs wrote:
> On Tue, 2007-12-11 at 10:59 +0100, Michael Akinde wrote:
> > I am encountering problems when trying to run VACUUM FULL ANALYZE on a
> > particular table in my database; namely that the process crashes out
> > with the following problem:
>
> Probably just as well, since a VACUUM FULL on an 800GB table is going to
> take a rather long time, so you are saved from discovering just how
> excessively long it will run for.
>
> But it seems like a bug. This happens consistently, I take it?

I suspect so, though it has only happened a couple of times yet (as it does take a while before it hits that 1.1 GB roof). But part of the reason for running the VACUUM FULL was of course to find out how long it would take. Reliability is always a priority for us, so I like to know what (useful) tools we have available and stress the system as much as possible... :-)

> Can you run ANALYZE and then VACUUM VERBOSE, both on just pg_largeobject,
> please? It will be useful to know whether they succeed.

I ran just ANALYZE on the entire database yesterday, and that worked without any problems. I am currently running a VACUUM VERBOSE on the database. It isn't done yet, but it is running with a steady (low) resource usage.

Regards,

Michael A.

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] WORM and Read Only Tables (v0.1)
Many applications have the need to archive data after it has been through the initial flurry of reads and updates that follows its original insertion. Currently there is no specific feature support to meet this requirement, so I propose to add this for 8.4.

Use Case: VLDB with tons of (now) read only data, some not. Data needs to be accessible, but data itself is rarely touched, allowing storage costs to be minimised via a "storage hierarchy" of progressively cheaper storage.

Features
- Read Only Tables
- Compressed Tablespaces
- Attaching table
- Per-Tablespace Access Costing
- Performance Tuning

Read-Only Tables
----------------
Postgres supports the concept of freezing tuples, so they can live forever within the database without needing further writes. Currently there is no command that will guarantee that a table has been completely frozen. This makes it difficult to reliably write data files to WORM media for longer term archiving. (WORM means Write-Once, Read-Many.) It's also a pain having to VACUUM a large table again just because a small number of rows need to be frozen.

So we need a DDL command that will ensure all tuples are frozen and then mark the table as read-only. Ideally, we would like to do this in a way that doesn't hold long full table locks, since we want the data to remain accessible at all times.

So... VACUUM FREEZE table SET READ ONLY;

would be my first thought, but I'm guessing everybody will press me towards supporting the more obvious

ALTER TABLE table SET READ ONLY;

This command will place a ShareLock (only) on the table, preventing anybody from writing to the table while we freeze it. The ShareLock is incompatible with any transaction that has written to the table, so when we acquire the lock all writers to the table will have completed. We then run the equivalent of a VACUUM FREEZE which will then be able to freeze *all* rows in one pass (rather than all except the most recent). On completion of the freeze pass we will then update the pg_class entry to show that it is now read-only, so we will emulate the way VACUUM does this.

This form of the ALTER TABLE command will need to be mangled so it can only run outside of a transaction block and also so it takes only a ShareLock rather than an AccessExclusiveLock.

Reversing the process is simpler, since we only have to turn off the flag in pg_class:

ALTER TABLE table SET READ WRITE;

possibly able to do this without grabbing an AccessExclusiveLock, though that isn't an important part of this implementation.

Read-only tables never need VACUUMing, so we would be able to make autovacuum and explicit vacuum ignore them.

Read-only tables may not be written to, yet would still allow implicit or explicit INSERT, UPDATE and DELETE privileges to be held on the table. Attempts to write to the table will result in a specific "read only table cannot be modified" ERROR. This allows a table to be placed into read-only mode for long periods of time and flipped back to read-write if some updates are required. That is useful for various performance reasons, see later. We can't use the privilege mechanism to prevent writes since superusers bypass them. (Thoughts?)

Requests for tuple-level SHARE locks (e.g. FOR SHARE) against read-only tables will be ignored, since they are effectively already there. So we don't need to change the internals of the locking, nor edit the RI code to remove the call to SHARE lock referenced tables. Do this during post-parse analysis.
Tables can be copied to WORM media by using

    ALTER TABLE table SET TABLESPACE tblspc;

This would also use a ShareLock rather than an AccessExclusiveLock, piggy-backing off the work mentioned above. Running SET TABLESPACE and SET READ ONLY at the same time might sound like a good plan, but ISTM it will require two fairly different code paths, so if we do it at all it will be a later addition.

Compressed Tablespaces
----------------------

Frequently with large data archives there is a requirement to reduce the footprint of the data to allow longer-term storage costs to be reduced. For insert-only data we might imagine we can reduce the size of tables by removing unused tuple header information. Although that is possible, repeated headers compress fairly well, so it seems easier to tackle the problem directly by having compressed tables.

Using a streaming library like zlib, it will be easy to read/write data files into a still-usable form but with much reduced size. Access to a compressed table only makes sense as a SeqScan. That would be handled by introducing tablespace-specific access costs, discussed below. Indexes on compressed tables would still be allowed, but would hardly ever be used.

Access would probably be via tablespace-specific storage managers. So implement mdcompress.c alongside md.c in src/backend/storage/smgr. If that implementation route were chosen, it would then allow the compression option to be made at tablespace level, so commands would be: CREATE TABL
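[To ground the WORM-media step above, a minimal sketch using the ALTER TABLE ... SET TABLESPACE form that already exists today; only the ShareLock-only locking and the compressed tablespaces are proposals, and the tablespace name, table name, and path are illustrative:]

    -- create a tablespace on the archival mount point (path is illustrative)
    CREATE TABLESPACE worm_archive LOCATION '/mnt/worm';

    -- move the (frozen, read-only) table onto it
    ALTER TABLE measurements_2006 SET TABLESPACE worm_archive;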
[HACKERS] VLDB Features
I'm starting work on the next projects for 8.4.

Many applications have the need to store very large data volumes for both archival and analysis. The analytic databases are commonly known as Data Warehouses, though there isn't a common term for large archival data stores. The use cases can often be blurred, and many people see them as a single use case. My initial interest is in the large archival data stores.

One of the main issues to be faced is simply data maintenance and management. Loading, deleting, and vacuuming data all take time. Those issues relate mainly to the size of the data store rather than to any particular workload, so I'm calling that set of required features "Very Large Database" (or VLDB) features.

VLDB features I'm expecting to work on are
- Read Only Tables/WORM tables
- Advanced Partitioning
- Compression
plus related performance features.

Details of those will be covered in separate mails over the next few weeks and months. So this is just to let everybody know where I'm headed, so you can see the big picture with me. I'll be working on other projects as well, many of which I've listed here: http://developer.postgresql.org/index.php/Simon_Riggs%27_Development_Projects

I expect the list is too long to complete for 8.4, but I'm allowing for various issues arising during development. So specific discussion on the other mails as they arrive, please.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] VACUUM ANALYZE out of memory
On Tue, 2007-12-11 at 10:59 +0100, Michael Akinde wrote: > I am encountering problems when trying to run VACUUM FULL ANALYZE on a > particular table in my database; namely that the process crashes out > with the following problem: Probably just as well, since a VACUUM FULL on an 800GB table is going to take a rather long time, so you are saved from discovering just how excessively long it will run for. But it seems like a bug. This happens consistently, I take it? Can you run ANALYZE and then VACUUM VERBOSE, both on just pg_largeobject, please? It will be useful to know whether they succeed. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
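[Spelled out, the two diagnostic steps requested above would be roughly the following; pg_largeobject is the table named in the original report:]

    ANALYZE pg_largeobject;
    VACUUM VERBOSE pg_largeobject;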
Re: [HACKERS] VACUUM ANALYZE out of memory
Michael Akinde wrote:
> Hi,
>
> I am encountering problems when trying to run VACUUM FULL ANALYZE on a particular table in my database; namely that the process crashes out with the following problem:
>
> INFO: vacuuming "pg_catalog.pg_largeobject"
> ERROR: out of memory
> DETAIL: Failed on request of size 536870912.
>
> INFO: vacuuming "pg_catalog.pg_largeobject"
> ERROR: out of memory
> DETAIL: Failed on request of size 32.
>
> Granted, our large object table is a bit large:
>
> INFO: analyzing "pg_catalog.pg_largeobject"
> INFO: "pg_largeobject": scanned 3000 of 116049431 pages, containing 18883 live rows and 409 dead rows; 3000 rows in sample, 730453802 estimated total rows
>
> ...but I trust that VACUUM ANALYZE doesn't try to read the entire table into memory at once. :-)
>
> The machine was set up with 1.2 GB shared memory and 1 GB maintenance memory, so I would have expected this to be sufficient for the task (we will eventually set this up on a 64-bit machine with 16 GB memory, but at the moment we are restricted to 32 bit). This is currently running on PostgreSQL 8.3beta2, but since I haven't seen this problem reported before, I guess this will also be a problem in earlier versions.
>
> Have we run into a bug/limitation of the Postgres VACUUM, or is this something we might be able to solve by reconfiguring the server/database or downgrading the DBMS version?

this seems simply a problem of setting maintenance_work_mem too high (ie higher than what your OS can support - maybe an ulimit/processlimit is in effect?). Try reducing maintenance_work_mem to say 128MB and retry. If you promise postgresql that it can get 1GB it will happily try to use it ...

Stefan

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] VACUUM ANALYZE out of memory
Hi,

I am encountering problems when trying to run VACUUM FULL ANALYZE on a particular table in my database; namely that the process crashes out with the following problem:

INFO: vacuuming "pg_catalog.pg_largeobject"
ERROR: out of memory
DETAIL: Failed on request of size 536870912.

INFO: vacuuming "pg_catalog.pg_largeobject"
ERROR: out of memory
DETAIL: Failed on request of size 32.

Granted, our large object table is a bit large:

INFO: analyzing "pg_catalog.pg_largeobject"
INFO: "pg_largeobject": scanned 3000 of 116049431 pages, containing 18883 live rows and 409 dead rows; 3000 rows in sample, 730453802 estimated total rows

...but I trust that VACUUM ANALYZE doesn't try to read the entire table into memory at once. :-)

The machine was set up with 1.2 GB shared memory and 1 GB maintenance memory, so I would have expected this to be sufficient for the task (we will eventually set this up on a 64-bit machine with 16 GB memory, but at the moment we are restricted to 32 bit). This is currently running on PostgreSQL 8.3beta2, but since I haven't seen this problem reported before, I guess this will also be a problem in earlier versions.

Have we run into a bug/limitation of the Postgres VACUUM, or is this something we might be able to solve by reconfiguring the server/database or downgrading the DBMS version? I shall be trying to run a simple VACUUM later this evening, in order to see whether that manages to complete. Unfortunately, due to the time it takes to load data, it's not really practicable to shift servers at the moment.

A little background on the application: we are building a raster database to be used for storing weather and water data. The raster data (2D matrices of floating-point values) are stored using large objects and indexed using a values table (with multiple dimensions: time, parameter, altitude, etc.). This is a technique I've worked with successfully in the past, though in that case using an Informix DBMS. My current employer is a strong proponent of Open Software, which has led to our implementation of the current system on a PostgreSQL DBMS (we will also be releasing our system as GPL in the near future). The test instance we are working on now is about 1 TB; we expect to increase that by a factor of at least 5 within the first year of operation, so we'd really like to ensure that we can get VACUUM working (although the data is mostly going to be static on this installation, we will have others that won't be).

Anyone with some insights on VACUUM FULL ANALYZE who can weigh in on what is going wrong?

Regards,

Michael Akinde
Database Architect, met.no

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
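[A rough illustration of the kind of schema described above: a values table keyed by the retrieval dimensions, pointing at large objects that hold the matrices. This is only a sketch under stated assumptions; the actual met.no schema is not shown in the thread, and all names here are invented:]

    -- values table: one row per stored raster, keyed by its dimensions
    CREATE TABLE raster_value (
        parameter   text        NOT NULL,
        valid_time  timestamptz NOT NULL,
        altitude    integer     NOT NULL,
        raster      oid         NOT NULL,   -- large object holding the 2D float matrix
        PRIMARY KEY (parameter, valid_time, altitude)
    );

    -- the matrix bytes themselves live in pg_largeobject; e.g. (illustrative path):
    --   INSERT INTO raster_value
    --   VALUES ('precipitation', '2007-12-11 06:00+00', 0, lo_import('/data/precip.grid'));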
Re: [HACKERS] Problem of a server gettext message.
> > GetText is converting the po messages (EUC_JP) to SJIS.

Yes.

> Are you sure about that? Why would gettext be converting to SJIS, when SJIS is nowhere in the environment it can see?

gettext is using GetACP() on Windows, wherever that gets its info from ... "chcp" did change the GetACP codepage in Hiroshi's example, but chcp is not reflected in LC_*. Seems we may want to use bind_textdomain_codeset.

Andreas

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] buildenv.pl/buildenv.bat
On Sun, Dec 09, 2007 at 02:40:37PM -0500, Andrew Dunstan wrote:
> Magnus Hagander wrote:
> >> You seem to have misunderstood what I am suggesting. Of course we should document use of buildenv.pl in addition to the hacky fix to the .bat files. The hack is the part that would be invisible. The docs would be visible and contain what would be our ongoing practice.
> >
> > Correct, I was misunderstanding it :-)
> >
> > I still can't say I like that hack though. I'd rather document that you have to do it in the .bat file for docs + gui build.
> >
> > But it's better than what I thought you were proposing :-)
>
> Well, I honestly think we can live with it for one cycle. As soon as 8.4 opens I'll get to work converting these .bat files to pure one-line wrappers.

Ok. We obviously don't agree on what to do here, so let's open it up for somebody else to comment on what they think is best. My take is to document the fact that you have to do it twice; Andrew's is this patch.

> Meanwhile, here's the proposed patch. If you want something else you'll have to do it.
>
> I don't know how many people regularly build on Windows other than you, me and Dave.

I know a couple of others. But - it was Dave that got bitten by this thing first, which is when I noticed it and brought it up :-P

//Magnus

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org