Re: [HACKERS] ordered aggregates using WITHIN GROUP (was Re: can somebody execute this query on Oracle 11.2g and send result?)
On Fri, Jan 29, 2010 at 11:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: I find it doubtful that it's actually necessary in Oracle's version of listagg ... Eh? http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/functions087.htm Defines: *LISTAGG* (measure_expr [, 'delimiter_expr']) *WITHIN GROUP* (order_by_clause) [*OVER* query_partition_clause] -- Jonah H. Harris
Re: [HACKERS] ordered aggregates using WITHIN GROUP (was Re: can somebody execute this query on Oracle 11.2g and send result?)
On Fri, Jan 29, 2010 at 12:09 PM, Jonah H. Harris jonah.har...@gmail.comwrote: On Fri, Jan 29, 2010 at 11:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: I find it doubtful that it's actually necessary in Oracle's version of listagg ... Eh? http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/functions087.htm Defines: *LISTAGG* (measure_expr [, 'delimiter_expr']) *WITHIN GROUP* (order_by_clause) [*OVER* query_partition_clause] SQL Server's listagg is similar to the PG implementation. It seems Oracle thinks people would prefer to order the list and for that reason, made their listagg a rank function type. Having done quite a bit of work generating delimited lists/arrays based on ordering in PG, I generally agree that it's what I would generally want. -- Jonah H. Harris
Re: [HACKERS] can somebody execute this query on Oracle 11.2g and send result?
On Thu, Jan 28, 2010 at 9:10 AM, Pavel Stehule pavel.steh...@gmail.comwrote: Hello, I can't to install Oracle, and need to know result. CREATE TABLE foo(a varchar(10), b varchar(10)); INSERT INTO foo VALUES('aaa',','); INSERT INTO foo VALUES('bbb',';'); INSERT INTO foo VALUES('ccc','+'); SELECT listagg(a,b) FROM foo; That's not how listagg works. The syntax is listagg(expression [, delimiter]) WITHIN GROUP (order by clause) [OVER partition clause] If a delimiter is defined, it must be a constant. Query: SELECT listagg(a, ',') WITHIN GROUP (ORDER BY a) FROM foo; Result: aaa,bbb,ccc Query: SELECT listagg(a, ';') WITHIN GROUP (ORDER BY a) FROM foo; Result: aaa;bbb;ccc Query: SELECT listagg(a, '+') WITHIN GROUP (ORDER BY a) FROM foo; Result: aaa+bbb+ccc -- Jonah H. Harris
Re: [HACKERS] Block-level CRC checks
On Tue, Dec 1, 2009 at 1:27 PM, Joshua D. Drake j...@commandprompt.comwrote: On Tue, 2009-12-01 at 13:20 -0500, Robert Haas wrote: Does $COMPETITOR offer this feature? My understanding is that MSSQL does. I am not sure about Oracle. Those are the only two I run into (I don't run into MySQL at all). I know others likely compete in the DB2 space. To my knowledge, MySQL, InnoDB, BerkeleyDB, solidDB, Oracle, SQL Server, Sybase, DB2, eXtremeDB, RDB, and Teradata all checksum pages. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] table corrupted
On Thu, Oct 22, 2009 at 7:16 PM, Joshua D. Drake j...@commandprompt.comwrote: On Thu, 2009-10-22 at 14:28 -0200, João Eugenio Marynowski wrote: Hi Repair? Not likely. Get past? Maybe. I don't know how valuable your data is, but I've performed data recovery on tens of PG databases suffering from both hardware and software corruption on versions 7.0 through 8.3. My rate is $300-600 USD/hour depending on the database/table size and the extent of the corruption. If you're just trying to save what's not corrupted, there's quite a few examples online. -- Jonah H. Harris
Re: [HACKERS] table corrupted
On Mon, Oct 26, 2009 at 12:55 PM, Joshua D. Drake j...@commandprompt.comwrote: This reply is wholly inappropriate for a Pg list. We are here to help people. If you have a consultancy, please feel free to list that but any discussion of rates is just plain rude. Please use better discretion in the future. Perhaps. Though, I only posted because you made it sound somewhat impossible and because I only know of a few ppl in the PG community that offer it and/or have done is successfully. Maybe letting people know there are options, other than being screwed, is wrong... my bad :-) -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] 8.5 TODO: any info on Create dump tool for write-ahead logs... in PITR section (1.4)?
On Sun, Oct 4, 2009 at 10:28 PM, Fujii Masao masao.fu...@gmail.com wrote: I think that xlogdump (http://xlogviewer.projects.postgresql.org/) is the first step to address that TODO item. Though I'm not sure if the xlogdump project is still active. I believe it has been dead for quite awhile now. Though, Tom may still maintain his own xlogdump. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] PGCluster-II Progress
On Tue, Sep 15, 2009 at 7:48 AM, Marcos Luis Ortiz Valmaseda mlor...@uci.cu wrote: Yeah, the problem here is that CyberCluster is based yet on PostgreSQL 8.1 and is a very old version to use it. I found the developer of PgCluster-II: Atsushi MITANI - mit...@sraw.co.jp Yeah, AFAICS, PGCluster II is and has been dead for years. I did an architectural review of PGCluster II while at EnterpriseDB, and it's never going to work from a performance perspective. Unfortunately, the architecture (as it stands) requires coordination of pretty much all locks and semaphores, and will only run on a single system because it requires a nearly-identical shared memory segment for each instance. Unlike Oracle RAC, which was designed to be run on separate nodes and share/coordinate only relevant resources, PGCluster II coordinates almost everything (transaction ids, buffer locks, etc.). As an example, Oracle doesn't need to communicate with every node in the cluster to generate a new transaction id (SCN in Oracle parlance) because it employs a system based on Lamport timestamps, but PGCluster II has to coordinate a global transaction id across all nodes. Also, PGCluster II has no concept of shared data/local WAL. Whereas Oracle supports threads of REDO/UNDO generated by the local node, all WAL in PGCluster II has to be coordinated between nodes, which creates a bottleneck on *all* concurrent workloads. When I first saw the demonstration of PGCluster II, I was in awe. Unfortunately, when I reviewed the architecture, I saw that it was too good to be true. Perhaps it has been rearchitected in private to overcome some of these issues, but I'm not aware of it. All attempts to talk to Atsushi about it were met with no response. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] 8.5 release timetable, again
On Thu, Aug 27, 2009 at 3:53 PM, David Fetter da...@fetter.org wrote: I would appreciate it if somebody could send out some messages of calm, while I/we work. The time for open review will come around soon enough. With all due respect, the time for open review is now. You have already tried closed development several times, and it each time has been, more or less, a spectacular failure. Unlike Robert and Heikki, I don't see you contributing to or assisting Simon's work. And, while I may be wrong, I doubt that you assisted in funding any of Simon's work on hot standby either. As such, it's my opinion that continuing to criticize him from the sidelines is not only rude, but is also a bad idea as it relates to his motivation in working on this feature. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Problem with accesing Oracle from plperlu functionwhen using remote pg client.
On Mon, Mar 16, 2009 at 11:09 AM, Tomasz Olszak tols...@o2.pl wrote: So it looks like that plperlu function is executing from remote and local clients with the same set of environment variable. It has nothing to do with the environment variables. So I don't have a clue how can I iron out this issue. Finally, my low-level Oracle knowledge does benefit Postgres :) It's a TNS parsing error due to a combination of Oracle's use of a Lispish s-expression-like name-value pair format and Postgres' process listing format for remote connections. On connection, the Oracle client sends the current application name to the Oracle server (which is listed in the V$SESSION view); in the case of Postgres, the program name is the current backend process name text. Because Oracle picks up Postgres' backend text, postgres: www postgres 192.168.1.1(13243), the (13243) screws up Oracle's TNS parser which prevents it from resolving the connection. This doesn't happen when you're connected to PG locally, because the backend text is, postgres: www postgres [local]. The solution to this is to change the following line in src/backend/postmaster/postmaster.c: remote_port[0] == '\0' ? %s : %s(%s) TO remote_port[0] == '\0' ? %s : %s[%s] OR remote_port[0] == '\0' ? %s : %s:%s Which I would prefer as a nice change to make overall. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Problem with accesing Oracle from plperlu functionwhen using remote pg client.
On Mon, Mar 16, 2009 at 1:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: We're not going to break a bunch of other applications in order to make some undocumented, unsupported Oracle thingie work (until they change it...). Got another solution? Unfortunately, that's the way Oracle has done it since before the existence of POSTGRES. I first encountered it while working on database links from PG to Oracle at EnterpriseDB, and the reason Tomasz couldn't find the answer to this online is because it's such a rare problem that Oracle has no reason to change it. Really, how many people have parenthesis in their program names? Similarly, the problem has always existed when connecting to Oracle from Postgres using DBI-Link or the oralink contrib module, there's just so few PG people connecting to Oracle that it hasn't really come up before. As for alternate solutions, the only thing I can think of is a config parameter to disable rewrite of the ps line. Frankly, I don't recall ever seeing a script that looked for (port) in the process list, but there are probably some home-grown ones out there. As for me, I'd prefer to separate the host and port via a colon, just as everything else does, but that isn't backward compatible. I would expect this to become more of an issue when we start getting SQL/MED more closely integrated with the server and people can more easily connect to other databases. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Problem with accesing Oracle from plperlu functionwhen using remote pg client.
On Mon, Mar 16, 2009 at 12:50 PM, Tomasz Olszak tols...@o2.pl wrote: Thank you very much, I tried to solve it for about 2 weeks. I know that few people in the net have the same problem too. No problem :) -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Problem with accesing Oracle from plperlu functionwhen using remote pg client.
On Mon, Mar 16, 2009 at 12:36 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Jonah H. Harris escribió: Wow, that's a really idiotic thing for Oracle to do. Well, being able to find out what applications are connected to the database is nice. But, it would also be nice if they stopped parsing the program name if/when it encounters a left/right parenthesis. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Problem with accesing Oracle from plperlu functionwhen using remote pg client.
On Mon, Mar 16, 2009 at 2:04 PM, Alvaro Herrera alvhe...@commandprompt.comwrote: We already have one; it's called update_process_title. I have it turned off, and I still see the remote IP/port in the process list. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Problem with accesing Oracle from plperlu functionwhen using remote pg client.
On Mon, Mar 16, 2009 at 2:26 PM, Jonah H. Harris jonah.har...@gmail.comwrote: On Mon, Mar 16, 2009 at 2:04 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: We already have one; it's called update_process_title. I have it turned off, and I still see the remote IP/port in the process list. Ahh, this is why: init_ps_display():set_ps_display(initial_str, true); Perhaps it should obey the configuration setting as well? -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Problem with accesing Oracle from plperlu functionwhen using remote pg client.
On Mon, Mar 16, 2009 at 2:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm finding it hard to believe that there is no way to override what Oracle's client library does --- there are *plenty* of situations where you don't really want a client command line exposed to the whole world. AFAIK, there is no way to override that. It's very low-level in their client stack, is operating-system specific, and has been there forever. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Problem with accesing Oracle from plperlu functionwhen using remote pg client.
On Mon, Mar 16, 2009 at 3:03 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Someone should raise a support request / whatever they call them with Oracle to get this fixed on their side.. Heh. Why would they fix it when it's only a problem for 1% of their users in odd corner cases? -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Problem with accesing Oracle from plperlu functionwhen using remote pg client.
On Mon, Mar 16, 2009 at 3:21 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Because that's what a respectable business does when a customer runs into a bug with software they sell. It's not a bug, it's expected behavior. Not that I think it couldn't be better handled. I'm not trying to dig at this, but looking at it in terms of flexibility, rather than us change the way we display a port in the ps-line because it may break a couple hundred scripts, you seem to think it's more reasonable for a company with a product utilized by millions of users, installed in countless governments, and deployed in mission-critical areas, to risk changing a fairly mature and well-tested behavior because it affects fewer than 1% of its users per year; specifically, users who are trying to interoperate with a competing database? If it were my business, it doesn't seem like something I would put much effort into :) Whether or not they actually will fix it, I don't know, but they surely won't if no-one complains them about it. Wouldn't hurt :) -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Problem with accesing Oracle from plperlu functionwhen using remote pg client.
On Mon, Mar 16, 2009 at 5:22 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Hmm, I wonder if you could do something malicious with it. Like, run a query along the lines of SELECT $$ (HOST=10.0.0.123) $$, connect()... to divert the connection to another server. Not any more malicious than a connection string in and of itself. It's only used as a hierarchical name-value pair string, nothing is executed from it. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Problem with accesing Oracle from plperlu functionwhen using remote pg client.
On Mon, Mar 16, 2009 at 8:50 PM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Hmm, I wonder if you could do something malicious with it. There are any number of scenarios where exposing the client command-line contents to other database users represents a security hole, quite independently of whether anything falls over depending on the line contents. (I wonder whether there are any Oracle clients that accept a password on the command line, for instance.) Sure they let you pass the password on the command line, but they don't recommend it. Most of the utilities accept the syntax: utility user/p...@instance Just doing u...@instance will generally prompt for a password. Ahh, the number of passwords I've recovered from shell history files as a consultant... good times :) The only reason this complaint is directed to us, and not Oracle, is that the complainant knows how far he's likely to get complaining to Oracle :-( I don't doubt that. But, like I said, it's really a matter of the application name. In our case, Postgres falls into that corner case and we either choose to do something about it or we don't. I put the temporary solution out there for anyone that has the problem. If we want to fix it long-term, we'd have to look at one of the previously discussed alternatives to using (port). I don't particularly care one way or another, but if we were to change the ps line format, I just wanted to say that I preferred host:port rather than host(port). -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Out parameters handling
On Fri, Mar 6, 2009 at 4:29 PM, Asko Oja asc...@gmail.com wrote: It was one of my worst Friday's finding out that this brain dead implementation of out parameters had been part of fuck up again. :) This time we did notice it two days too late. I wish for a way to use out parameters in functions only through some predefined prefix like in triggers new and old. Means i would like to limit referencing to out parameters to one prefix only defined in the beginning of declare section of stored procedure. It really sucks what kind of mistakes you can pass to production unknowingly. I would much prefer a way to prevent such nonsense. Here was the case where out parameters were with same names with select into field names resulting in null outcome. Just yesterday we had similar case with update statement. Well, it's a problem with the language not parsing things correctly and doing, in many cases, brain-dead replacements. I don't know of any developer using OUT parameters that doesn't run into this problem at one time or another :( -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] SYNONYMs revisited
On Wed, Mar 4, 2009 at 9:34 AM, Joshua Tolley eggyk...@gmail.com wrote: Way back in this thread[1] one of the arguments against allowing some version of CREATE SYNONYM was that we couldn't create a synonym for an object in a remote database. Will the SQL/MED work make this sort of thing a possibility? I realize since it's not standard anyway, there's still a discussion or two to be had about how precisely it should work, but thought I'd raise the possibility. While shaking my head In that movie-like slow-motion used as a precursor to an almost disastrous event, I see myself saying, no... OK, back to reality. SQL/MED does support foreign tables, which are basically synonyms for remote tables. Other than that, it has no real similarity to synonym behavior for other database objects such as views, functions, or local tables. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Hadoop backend?
On Sun, Feb 22, 2009 at 3:47 PM, Robert Haas robertmh...@gmail.com wrote: In theory, I think you could make postgres work on any type of underlying storage you like by writing a second smgr implementation that would exist alongside md.c. The fly in the ointment is that you'd need a more sophisticated implementation of this line of code, from smgropen: reln-smgr_which = 0; /* we only have md.c at present */ I believe there is more than that which would need to be done nowadays. I seem to recall that the storage manager abstraction has slowly been dedicated/optimized for md over the past 6 years or so. It may even be easier/preferred to write a hadoop specific access method depending on what you're looking for from hadoop. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] pg_restore --multi-thread
On Thu, Feb 12, 2009 at 11:37 AM, Joshua D. Drake j...@commandprompt.comwrote: --num-workers or --num-connections would both work. --num-parallel? -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Optimization rules for semi and anti joins
On Wed, Feb 11, 2009 at 8:05 AM, David Fetter da...@fetter.org wrote: As has been discussed here many, many times, the only kind of person who should be doing a patent search is a company's IP attorney, which you are not, and even if you were, under no circumstances would such a person paste that link in a public forum. First of all, it was not an intentional patent search. Secondly, I don't believe there's any restriction of explicitly what can and cannot be posted on a public Postgres mailing list. Should we have a kick-off policy for this kind of misbehavior? Shut up David. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Optimization rules for semi and anti joins
On Wed, Feb 11, 2009 at 11:19 AM, David Fetter da...@fetter.org wrote: This is a very big deal, as you are exposing every US PostgreSQL contributor to triple damages for knowing infringement. Are you saying you're going to pay all that out of your own pocket? Are you making a legal commitment, say, with a few tens of million dollars in escrow to back it? Per IRC, this discussion will (and likely should) be taken elsewhere. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Optimization rules for semi and anti joins
On Tue, Feb 10, 2009 at 3:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote (in response to Kevin Grittner's recent issues): Reflecting on this further, I suspect there are also some bugs in the planner's rules about when semi/antijoins can commute with other joins; After doing some math I've concluded this is in fact the case. Anyone want to check my work? FWIW, the logic looks correct to me. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Optimization rules for semi and anti joins
On Tue, Feb 10, 2009 at 8:09 PM, Jonah H. Harris jonah.har...@gmail.comwrote: On Tue, Feb 10, 2009 at 3:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote (in response to Kevin Grittner's recent issues): Reflecting on this further, I suspect there are also some bugs in the planner's rules about when semi/antijoins can commute with other joins; After doing some math I've concluded this is in fact the case. Anyone want to check my work? FWIW, the logic looks correct to me. Cripes! I just had an idea and it looks like the buggers beat me to it :( http://www.google.com/patents?id=4bqBEBAJdq=null+aware+anti-join -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Optimization rules for semi and anti joins
On Tue, Feb 10, 2009 at 8:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Jonah H. Harris jonah.har...@gmail.com writes: Cripes! I just had an idea and it looks like the buggers beat me to it :( http://www.google.com/patents?id=4bqBEBAJdq=null+aware+anti-join I wonder if the USPTO is really clueless enough to accept this? Claim 1 would give Oracle ownership of the definition of NOT IN, and few of the other claims seem exactly non-obvious either. Yeah, I just looked up semi and anti-join optimization patents and Oracle/IBM have a ton. What an obvious exploitation of math for business gain. I doubt they'd be enforceable. I wish they'd just do away with software patents altogether :( -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] add_path optimization
On Wed, Feb 4, 2009 at 10:12 AM, Greg Stark st...@enterprisedb.com wrote: On Wed, Feb 4, 2009 at 3:07 PM, Kevin Grittner It's been about 23 hours and it's still running. @snide(Gee, it sure would be nice if we continued with that explain-in-progress patch I had sent in earlier...:) Agreed. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] 8.4 release planning
On Wed, Jan 28, 2009 at 4:28 AM, Peter Eisentraut pete...@gmx.net wrote: Greg Smith wrote: PostgreSQL advocacy point, one of the questions Tom asked about a bit upthread is still a bit hazy here. There are commercial database offerings selling into the trusted space already. While the use-cases you describe make perfect sense, I don't think it's clear to everyone yet if there's a unique draw to a PostgreSQL + selinux solution that the class of customers you're talking about would prefer it to purchasing one of those products. Is the cost savings the main driver here, or is there something else about a secure LAPP stack that makes it particularly compelling? According to the data available to me, it is a combination of doing it better than the other guys (e.g., a SELinux type interface instead of something handcrafted) and the usual cost savings. I don't know about better, but I would definitely say that it's a more integrated (with the OS) solution. Can you get Oracle to use SELinux policies? Sure. But it would take a combination of Label Security, Fine Grained Access Control tweaks, custom C functions, and custom policies to handle the access control. And, it would cost a helluva lot of money. In short, this would make Postgres quite a bit more appetizing to those who need this functionality, those who prefer SELinux-based policies, and those who don't have the time/money to do it in systems like Oracle. How many people is that? Based on my consulting experience and questions from DoD/DoE people specifically, I think the number of people needing this feature is fairly small right now. But, it wouldn't hurt us to have it. Just to make it clear, this feature wouldn't make Postgres a trusted database in any certification sense. So, using that term would likely cause confusion and get people who used it thinking it had an EAL certification into trouble. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] How to get SE-PostgreSQL acceptable
On Wed, Jan 28, 2009 at 9:49 PM, KaiGai Kohei kai...@ak.jp.nec.com wrote: IIRC, 0racle or M$ has a patent to rewrite WHERE clause for security purpose, so Tom suggested it should be implemented using a hook deployed within executor. Yes, it was Oracle. There are a couple newer revisions, but they're all based primarily on Patent #6487552, Database Fine-grained Access Control, Filed Oct 5, 1998/Issued Nov 26, 2002. The patent covers defining a security context, retrieving-defined policies from that context, and applying those policies by directly calling a security-context-related stored procedure in the WHERE clause as well as dynamically adding security-related predicates to the WHERE-clause. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: 8.4 release planning (was Re: [HACKERS] [COMMITTERS] pgsql: Automatic view update rules)
On Sun, Jan 25, 2009 at 12:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: Particularly with regard to hot standby, which by any sane reading was not close to being committable on 1 November (a fortiori from the fact that it's *still* not committable despite large amounts of later work). While I haven't follwed every detail of this patch set, I'm not quite sure I see that as being very fair to Simon. Simon has put a lot of time into Hot Standby and has followed the pseudo-defacto community process from design through what he believes to be near-completion; he can't be sure of completion until someone reviews his work. Honestly, I'm not trying to marginalize your effort reviewing other patches, and I know everyone is busy. Hell, as much as I'd love to have this feature, I too have been unable to find enough time to review Simon's stuff. However, over the past few months, I seem to recall seeing Simon submit countless requests for review and regardless of whether it was completely ready to go November 1st or not, at this time I don't think anyone but Simon has a complete view of what his patch(es) even look like. Yet, albeit with almost no review from the committers, Simon has continually worked through testing, revising his patches, and requesting information and suggestions from the community. Frankly Tom, I don't know of anyone in the community with as much experience in the recovery code as you and Simon. So, any of the major edge case problems will probably only be found by you regardless of how many of us review Simon's work. I do know that this is a feature which a large number of Postgres users really want and were counting on being in 8.4. Looking forward, if no one wanted to review these patches in November, and seemingly no one wants to review them now, how can we expect this to change for 8.5? Can anyone point out something Simon did wrong in this process? -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] 8.4 release planning
On Mon, Jan 26, 2009 at 11:11 AM, Merlin Moncure mmonc...@gmail.com wrote: What about a compromise solution: release 8.4 now, then focus on wrapping up the big ticket items that didn't make it into 8.4 into a quick (as possible) 8.5 release. This means no fests. That would depend on timing then. Trying to get people to upgrade to 8.4 is going to be difficult if they're waiting on Hot Standby, which means less in-the-field testing of the 8.4 code base until the 8.5 release. Similarly, if we're looking at a quick 8.5 around September/October (having no commit fests), that means it will probably be early 2011 for 8.6, which is fairly unacceptable for the other patches currently in the queue. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] 8.4 release planning
On Mon, Jan 26, 2009 at 11:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: That would depend on timing then. Trying to get people to upgrade to 8.4 is going to be difficult if they're waiting on Hot Standby, which means less in-the-field testing of the 8.4 code base until the 8.5 release. [ deja vu... ] Just like no one was going to bother upgrading to 8.3 because what they wanted wouldn't be there till 8.4, and the similar claims we heard about 8.2 and 8.1 before that ... I'm not trying to be an alarmist, I'm just stating what I saw when I was @ EDB. Customers, especially those with large databases or small admin teams, would definitely wait for features before upgrading. Some people waited specifically for HOT or features that would benefit them specifically. My only gripe with a small window between 8.4 and 8.5 was just that I believe people would be more likely to wait until 8.5 rather than upgrading twice in the same year. Though, as I generally like people to be using the latest version of PG, I'd certainly be happy to be wrong on this. Similarly, if we're looking at a quick 8.5 around September/October (having no commit fests), that means it will probably be early 2011 for 8.6, which is fairly unacceptable for the other patches currently in the queue. Right, one of the major considerations here is allowing other development to get started again (and not be looking at two years wait to see the light of day). Agreed. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] [PATCH] EnableDisableTrigger Cleanup Questions
On Thu, Nov 6, 2008 at 12:03 AM, Jonah H. Harris jonah.har...@gmail.comwrote: As I wasn't sure whether anyone agrees with my distaste for repurposing tgenabled as mentioned above, I have attached is a patch which minimally corrects the function comment for EnableDisableTrigger where fires_when is concerned. Was there a reason that this cleanup patch wasn't applied? -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] [PATCH] EnableDisableTrigger Cleanup Questions
On Wed, Jan 21, 2009 at 2:02 PM, Robert Haas robertmh...@gmail.com wrote: Was there a reason that this cleanup patch wasn't applied? 1. It was submitted after the deadline for CommitFest:November. Well, it's just comment changes... Oh, didn't realize that. That's what I get for replying without reading the patch... Yes :) -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Warning about the 8.4 release
On Tue, Jan 6, 2009 at 12:38 PM, Robert Haas robertmh...@gmail.com wrote: - WIP: Hash Join-Filter Pruning using Bloom Filters is in the commitfest I'm pulling this patch and resubmitting for 8.5. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: [HACKERS] Lockfree hashtables
On Wed, Dec 31, 2008 at 7:33 AM, Stephen R. van den Berg s...@cuci.nlwrote: The other day I bumped into some ideas about lockfree hashtables. Are these of any use in PostgreSQL? Lock-free and wait-free algorithms have been used in various databases, but most people tend to shy away from them because of their complexity, difficulty to debug, and low-level portability issues. I've used them in the past (lock-free hash tables and skip lists), and they're pretty awesome if used properly, but the majority of PG's current performance problems aren't generally found as part of our hash table implementation (which I'm quite fond of actually). FWIS, I think we'll look more into this sometime in the future. -- Jonah H. Harris, Senior DBA myYearbook.com
Re: About CMake (was Re: [HACKERS] [COMMITTERS] pgsql: Append major version number and for libraries soname major)
On Mon, Dec 29, 2008 at 11:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: But of course those are just as straightforward in autoconf. It's the not-straightforward stuff that's going to be a PITA to translate. As much as I dislike autotools, I really despise CMake; it's a nasty piece of work and I hope we don't switch to it. Though, as I must've missed it, what's the main complaint with the current build system? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Mon, Dec 15, 2008 at 7:24 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Here's an updated patch against head. Thanks. No problemo. NOTE, it appears that this (and the previous) patch PANIC with concurrent transaction log activity while database system is shutting down on shutdown if checksumming is enabled. This appears to be due to FlushBuffer (lines 1821-1828) during the checkpoint-at-shutdown. Yeah, I reported this issue several times. Hmm. Well, the easiest thing would be to add a !shutdown check for logging the hint bits during the shutdown checkpoint :) Of course, that would break the page for recovery, which was the whole point of putting that in place. I'd have to look at xlog and see whether that check can be deferred or changed. Or, did you already research this issue? Similarly, I ran a pgbench, performed a manual checkpoint, and corrupted the tellers table myself using hexedit but the system didn't pick up the corruption at all :( Heh :-) :( Alvaro, have you given up on the patch or are you just busy on something else at the moment? I've given up until we find a good way to handle hint bits. Various schemes have been proposed but they all have more or less fatal flaws. Agreed. Though, I don't want to see this patch get dropped from 8.4. ALL, Alvaro has tried a couple different methods, does anyone have any other ideas? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Mon, Dec 15, 2008 at 10:13 AM, Bruce Momjian br...@momjian.us wrote: Jonah H. Harris wrote: Alvaro, have you given up on the patch or are you just busy on something else at the moment? I've given up until we find a good way to handle hint bits. Various schemes have been proposed but they all have more or less fatal flaws. Agreed. Though, I don't want to see this patch get dropped from 8.4. ALL, Alvaro has tried a couple different methods, does anyone have any other ideas? Feature freeze is not the time to be looking for new ideas. I suggest we save this for 8.5. Well, we may not need a new idea. Currently, the problem I see with the checkpoint-at-shutdown looks like it could possibly be easily solved. Though, there may be other issues I'm not familiar with. Has anyone reviewed this yet? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Mon, Dec 15, 2008 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jonah H. Harris jonah.har...@gmail.com writes: On Mon, Dec 15, 2008 at 10:13 AM, Bruce Momjian br...@momjian.us wrote: Feature freeze is not the time to be looking for new ideas. I suggest we save this for 8.5. Well, we may not need a new idea. We don't really have an acceptable solution for the conflict with hint bit behavior. The shutdown issue is minor, agreed, but that's not the stumbling block. Agreed on the shutdown issue. But, didn't this patch address the hint bit setting as discussed? After performing a cursory look at the patch, it appears that hint-bit changes are detected and a WAL entry is written on buffer flush if hint bits had been changed. I don't see anything wrong with this in theory. Am I missing something? Now, in the case where hint bits have been updated and a WAL record is required because the buffer is being flushed, requiring the WAL to be flushed up to that point may be a killer on performance. Has anyone tested it? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Mon, Dec 15, 2008 at 11:50 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: That only does heap hint bits, but it does nothing about pd_flags, the btree flags (btpo_cycleid I think), and something else I don't recall at the moment. This was all solvable however. The big problem with it was that it was using a new bit in pd_flags in unsafe ways. To make it safe you'd have to grab a lock on the page, which is very probably problematic. :( Now, in the case where hint bits have been updated and a WAL record is required because the buffer is being flushed, requiring the WAL to be flushed up to that point may be a killer on performance. Has anyone tested it? I didn't measure it but I'm sure it'll be plenty slow. Yeah. What really sucks is that it would be fairly unpredictable and could easily result in unexpected production performance issues. It is pretty late in the process to continue with this design-related discussion, but I really wanted to see it in 8.4. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Mon, Dec 15, 2008 at 12:30 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: How hard would it be to just take an exclusive lock on the page when setting all these hint bits? I guess it will be intolerably slow then. If we were to say we have CRC now, but if you enable it you have 1% of the performance we will get laughed at. Well, Oracle does tell users that enabling full CRC checking will cost ~5% performance overhead, which is reasonable to me. I'm not pessimistic enough to think we'd be down to 1% the performance of a non-CRC enabled system, but the locking overhead would probably be fairly high. The problem is, at this point, we don't really know what the impact would be either way :( -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL/MED compatible connection manager
On Fri, Dec 12, 2008 at 7:55 AM, Peter Eisentraut pete...@gmx.net wrote: Now I have a question about the FDW C interface. The way I understand it, an SQL/MED-enabled server and a FDW each have a specific API by which they communicate. Supposedly, each database vendor should be able to ship a binary library for its FDW and each SQL/MED-enabled server should be able to load and use it. (If you don't believe in binary compatibility, then I think there should at least be source-level interface compatibility.) Yes, all FDWs should be similar to ODBC drivers in that they are self-contained and interface with the database through a defined API. What happens inside them should be irrelevant to PG. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Sun, Dec 14, 2008 at 4:51 PM, Josh Berkus j...@agliodbs.com wrote: v11 doesn't apply to cvs head anymore I'm not currently working on this patch, sorry. Should we pull it from 8.4, then? Here's an updated patch against head. NOTE, it appears that this (and the previous) patch PANIC with concurrent transaction log activity while database system is shutting down on shutdown if checksumming is enabled. This appears to be due to FlushBuffer (lines 1821-1828) during the checkpoint-at-shutdown. Other than that, I haven't looked into what needs to be done to fix it. Similarly, I ran a pgbench, performed a manual checkpoint, and corrupted the tellers table myself using hexedit but the system didn't pick up the corruption at all :( Alvaro, have you given up on the patch or are you just busy on something else at the moment? -- Jonah H. Harris, Senior DBA myYearbook.com blkcrc-12.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COCOMO Indians
On Thu, Dec 11, 2008 at 4:43 AM, Dmitry Turin [EMAIL PROTECTED] wrote: We would like to obtain your opinion on these two questions: This is the wrong place to do it. 2) We are captivated by price of Indians, we listened much about low quality of code, written by Indians, we are fearing, that American company will resale implementation to Indian subcontractor (i.e. real developers will be Indians anyway). Did you really just say that? (SQL50, HTML60) Because it seems that you haven't got the hint yet, I'll just say it frankly: No one really cares about your desired additions to Postgres. -Jonah -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Tue, Nov 25, 2008 at 8:38 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: I don't this is not empirical but really, 150 is very reasonable. Let's just set it to that by default and be done with it. It won't hurt anything and if they need more than that, they are already investigating either via the lists or via a vendor anyway. Agreed. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Fri, Nov 14, 2008 at 2:52 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Other systems do it. For example, Oracle tracks column usage and attempts to determine the optimal statistics for that column (based on the queries that used it) on an iterative basis. We don't track column usage at all, so that option wouldn't be quite that easy to implement. Though, there are certain things ANALYZE would be able to determine with a little help, such as knowing to collect more samples for columns it finds extremely skewed data in. That kind of feedback loops are a bit dangerous. For starters, it would mean that your test system would behave differently than your production system, just because you run different queries on it. There's also all kinds of weird dynamic behaviors that could kick in. For example, a query could run fine for the first few hundred times, but then the analyzer notices that a certain column is being accessed frequently and decides to increase the stats target for it, which changes the plan, for worse. Usually the new plan would be better, but the planner isn't perfect. Oracle already thought of that a long time ago, which is why the plan has to come out better for it to take effect. As for bad plans, you obviously haven't used Postgres in production enough to deal with it continually changing plans for the worse due to index bloat, data skew, phase of the moon, etc. :) -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Fri, Nov 14, 2008 at 10:50 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Oracle already thought of that a long time ago, which is why the plan has to come out better for it to take effect. Huh? We would never willingly choose a worse plan, of course, but the point is that what looks like a better plan, with a smaller cost estimate, is sometimes actually worse. Oracle bases it on cost and elapsed execution time. As for bad plans, you obviously haven't used Postgres in production enough to deal with it continually changing plans for the worse due to index bloat, data skew, phase of the moon, etc. :) You're right, I haven't, but yes I know that's a problem. We've chatted about that with Greg sometimes. It would be nice to have more stable plans. My favorite idea is to stop using the current relation size in the planner, and use the value snapshotted at ANALYZE instead. That way, the planner would be completely deterministic, based on the statistics. Then, we could have tools to snapshot the statistics, move them to a test system, store them, revert back to old statistics etc. Yes, plan stability would be a Good Thing(tm) IMO. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Fri, Nov 14, 2008 at 11:43 AM, Tom Lane [EMAIL PROTECTED] wrote: IMHO, the only thing worse than an unstable plan is a stable one. Your opinion contradicts the majority of the industry then, I'm afraid. Like query hints, people are sometimes smarter than the optimizer. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Fri, Nov 14, 2008 at 12:07 PM, Tom Lane [EMAIL PROTECTED] wrote: And, very often, they're not --- or more likely, they were smarter than the optimizer last year, but now conditions have changed. Failing to adapt to new conditions is exactly the problem with query hints, and in general with any insistence that plans should be stable. Well, at least they didn't have to wait a year to fix the problem. Similarly, whether or not the plan changed due to bad hints or bad plans, detecting the change is relatively easy, so I don't really see an argument based on *why* the plan failed. In my, and many others opinion, if you decide to take your query plan into your own hands, it's your problem if it fails. I do agree that hints are a little too nice and simple, and generally get people into trouble because they're hard-coded in an app, tend to cause issues later, and are then difficult to track down. Oracle solved that years ago as well, which is why they support more advanced plan stability features than just hints. However, given the number of large-scale OLTP sites I've been to, I can tell you from experience that post-ANALYZE plan changes wreak complete havoc on a system and in many cases, bring it to its knees. In those cases, the proper query plan is well-known, and a hint (or some other form of plan stability) is all that would be required to prevent it from happening. This is pretty off-topic for this thread, so I'll postpone the discussion for 8.5. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, Nov 13, 2008 at 3:20 PM, Grzegorz Jaskiewicz [EMAIL PROTECTED] wrote: If that's the situation, me thinks you guys have to start thinking about some sort of automated way to increase this param per column as needed. Is there any way planner could actually tell, that it would do better job with more stats for certain column ? Other systems do it. For example, Oracle tracks column usage and attempts to determine the optimal statistics for that column (based on the queries that used it) on an iterative basis. We don't track column usage at all, so that option wouldn't be quite that easy to implement. Though, there are certain things ANALYZE would be able to determine with a little help, such as knowing to collect more samples for columns it finds extremely skewed data in. There are other things that could be done as well... so the answer is, yes. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling archive_mode without restart
Anyway, I think this is worth fixing before release but it clearly isn't worth attempting to rush a patch in the next few hours. I don't think we'll find anyone who is happy with making it a restart-required option. I couldn't find a patch/commit for this and was just wondering whether someone is addressing it for 8.4? In a large-scale OLTP environment, uptime is paramount, and having to restart the database to enable PITR is a big PITA. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Hash Join-Filter Pruning using Bloom Filters
On Mon, Nov 10, 2008 at 2:42 PM, Lawrence, Ramon [EMAIL PROTECTED] wrote: I have tested the Bloom filter patch. It compiles cleanly against HEAD. Thank you for testing this! As indicated, the performance improvements for hash join are good, especially when the build table is filtered with a selection condition. Performance improvements range from a couple of percent up to 20% for multi-batch joins. Note that the bloom filter will slightly slow queries where the filter has no benefit. I have a new patch which does not create a bloom filter unless it sees that the hash join is going to batch. I'll send it along later tonight. I have not looked at the actual implementation of the Bloom filter, but will proceed to do that next. One issue to be considered is how the space used for the bloom filter is related to the work_mem allocated to the join. That is, does the bloom filter consume some of the work_mem space or is it treated as additional memory allocated to the join. Currently it's additional space not accounted for by work_mem. Additionally, it's a good amount more space than is required. This is fixed in the newer patch as well. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Page space reservation (pgupgrade)
On Sun, Nov 9, 2008 at 7:55 PM, Decibel! [EMAIL PROTECTED] wrote: On Nov 8, 2008, at 8:35 PM, Jonah H. Harris wrote: That's my question. Why is this needed at all? I suspect this is to deal with needing to reserve space in a cluster that you're planning on upgrading to a new version that would take more space, but I think the implementation is probably too simplistic. Well, if that's what it is, I think it's a fairly poor design decision. When I upgrade Oracle, SQL Server, or MySQL, I don't need to plan the amount of free space in my blocks a year or more before an upgrade. In fact, I don't have to plan it at all... it's completely handled by the in-place upgrade. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL5 budget
On Sun, Nov 9, 2008 at 7:41 PM, Decibel! [EMAIL PROTECTED] wrote: I think you're barking up the wrong tree here; the community can't really do hacking for hire. If you want to pay for something to be implemented (which is great!), you'll need to talk to companies that do Postgres consulting. You can find examples on the website and through google. You could also try posting to pgsql-jobs. I would suggest submitting it to pgsql-jobs. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Recreate Missing WAL Directories (from TODO)
When performing a PITR copy of a data cluster, the pg_xlog directory is generally omitted. As such, when starting the copy up for replay/recovery, the WAL directories need to be recreated. This patch checks to see whether XLOGDIR and XLOGDIR/archive_status exist on XLOGStartup and if not, recreates them. -- Jonah H. Harris, Senior DBA myYearbook.com arcstatdir.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Recreate Missing WAL Directories (from TODO)
On Sat, Nov 8, 2008 at 4:08 PM, Tom Lane [EMAIL PROTECTED] wrote: This has been suggested before but I'm unconvinced that it's a good idea. It's reasonably common for pg_xlog to be a symlink. If you neglect to re-establish the symlink then what would happen is that xlog gets recreated on the data disk, and with no notice you are running in a degraded mode. Agreed on the basis that people sometimes forget to symlink. That's the reason why I was echoing a message. Initially the message was WARNING, but I degraded it to LOG. It might be reasonable to auto-recreate XLOGDIR/archive_status, though. Attached. BTW, I have seen people create both pg_xlog and archive_status as files, which is why I'm validating that in this function rather than waiting for it to error-out later in the code. -- Jonah H. Harris, Senior DBA myYearbook.com arcstatdir_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Page space reservation (pgupgrade)
On Sat, Nov 8, 2008 at 8:08 PM, Tom Lane [EMAIL PROTECTED] wrote: Zdenek Kotala [EMAIL PROTECTED] writes: Attached patch allows to setup storage parameter for space reservation. What is the point of this? That's my question. Why is this needed at all? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] EnableDisableTrigger Cleanup Questions
On Thu, Nov 6, 2008 at 9:01 AM, Tom Lane [EMAIL PROTECTED] wrote: It would have been useful to make this criticism before 8.3 was released. I don't think it's reasonable to change it now. Well, I didn't have time to review code back in the 8.3 days, and ugly is ugly regardless of when it was originally committted. I'm not saying it needs to be an 8.4 fix, just that as a whole, several of the components of that patch (including rewrite) seem to be a little hackish and that they could be cleaned up in 8.5. I would imagine someone will be working on trigger-related code in 8.5, and just thought it would be nice to clean it up if one had the time to do so. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] EnableDisableTrigger Cleanup Questions
On Thu, Nov 6, 2008 at 10:08 AM, Tom Lane [EMAIL PROTECTED] wrote: I have no objection to cleaning up the backend internals, but system catalog definitions are client-visible. I don't think we should thrash the catalog definitions for minor aesthetic improvements. Since 8.3 is already out, that means client-side code (like pg_dump and psql, and probably other programs we don't control) is going to have to deal with the existing definition for the foreseeable future. Dealing with this definition *and* a slightly cleaner one isn't a net improvement from the client standpoint. Well, it didn't seem like anyone had an issue changing the definition at 8.3 time. As for pg_dump/psql, those changes are fairly simple. And, there aren't that many PG utilities out there. PGAdmin looks like it would require a 1-3 line change (depending on coding preferences) and I don't see anything that checks it in Slony. I'm fine with cleaning up the internal-side, I just don't think there's that much relying on tgenabled. In fact, Google code search seems to show more things relying on a boolean tgenabled rather than the current implementation. Oh well, it was just a thought. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] broken URL in commitfest page
On Wed, Nov 5, 2008 at 12:35 PM, Simon Riggs [EMAIL PROTECTED] wrote: The Join Removal item fails to point to a patch, also. I've removed that entry now. The patch was being worked on by Jonah but it looks like we didn't make the deadline. Well, what is the official deadline on it? It, like several other patches on the wiki, was a WIP. I'm hopeful that RI-based join elimination for JOIN_INNER should be ready tonight based on your and Tom's comments. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] EnableDisableTrigger Cleanup Questions
While working on the join elimination patch, I was going through the trigger code and found quite a bit of nastiness in regard to naming and variable repurposing related to the addition of replication roles in 8.3. The most obvious issue is that tgenabled was switched from a bool to char to support replication roles. From a naming standpoint, the term enabled generally implies boolean and is fairly consistently used as such in other functions within the core. My initial preference would be to return tgenabled to its original boolean for use only in enabling/disabling triggers. Then, I'd probably add another boolean entry (tgreplica?) for use in determining whether the trigger should be fired on origin/local or replica. Otherwise, the naming of EnableDisableTrigger and friends seems a bit contradictory due to the fact that it has the ability to convert a trigger into a replica trigger. Similarly, I can't see any reason for keeping the structure member name the same, especially when the change from bool to char broke backward compatibility anyway. Thoughts? As I wasn't sure whether anyone agrees with my distaste for repurposing tgenabled as mentioned above, I have attached is a patch which minimally corrects the function comment for EnableDisableTrigger where fires_when is concerned. -- Jonah H. Harris, Senior DBA myYearbook.com endisable_trig_fctn_commnt_cleanup.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Cleanup of PLpgSQL_recfield
While looking to add some functionality to PL/pgSQL, I found that the rfno member of the PLpgSQL_recfield structure is unused. This patch is just a cleanup and doesn't seem along the same lines as the patches in CommitFest... should I add it to the wiki anyway? -- Jonah H. Harris, Senior DBA myYearbook.com plpgsql_unused_recrfno.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Cleanup of PLpgSQL_recfield
On Tue, Nov 4, 2008 at 3:57 PM, Tom Lane [EMAIL PROTECTED] wrote: I am not real sure why the code is inconsistent about spelling the second field's name differently in some of the structs, but it seems like a bad idea --- as you've demonstrated, it invites confusion. What would probably be better is a patch to rename exprno, rfno, etc to all be called dno to make this connection more obvious. Attached. Passed regressions and basic testing. -- Jonah H. Harris, Senior DBA myYearbook.com plpgsql_datumnaming_cleanup.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WIP: Hash Join-Filter Pruning using Bloom Filters
All, Attached is an initial patch I've been playing with which uses Bloom filters to reduce unnecessary processing of outer tuples in hash joins. In short, this works by creating a Bloom filter, adding all relevant tuples for the inner relation, and querying the filter (for existence) when retrieving tuples from the outer relation. This avoids unnecessary tuple movement and bucket searches for matches we already know can't exist. Currently it works only for JOIN_INNER, but could be modified to optimize anti/semi joins as well. Similarly, I created a GUC to enable pruning, named bloom_pruning. Rather than performing k hash functions, this implementation simply sets a bit based on the already-computed hash value. I wanted to send this around for reviews and comments before working on it further. As this isn't overly intrusive, if someone can commit to reviewing and providing input, I'll commit to having this ready for 8.4. -- Jonah H. Harris, Senior DBA myYearbook.com bloompruning_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Hash Join-Filter Pruning using Bloom Filters
On Sun, Nov 2, 2008 at 5:36 PM, Hannes Eder [EMAIL PROTECTED] wrote: On Sun, Nov 2, 2008 at 10:49 PM, Jonah H. Harris [EMAIL PROTECTED] wrote: Similarly, I created a GUC to enable pruning, named bloom_pruning. I guess calls to bloom_filter_XXX should be surrounded by if (bloom_pruning) ... or a similar construct, i.e. make use of the GUC variable bloom_pruning in the rest of the code. It's effective as-is for a preliminary patch. The GUC code is the least of my worries. Can you provide some figures on the performance impact of the bloom filter? It depends on the queries. I've been trying to find a good suite of hash join tests... but not much luck. CREATE TABLE t1 (id INTEGER PRIMARY KEY, x INTEGER); CREATE TABLE t2 (id INTEGER PRIMARY KEY, x INTEGER); INSERT INTO t1 (SELECT ge, ge % 100 FROM generate_series(1, 100) ge); INSERT INTO t2 (SELECT * FROM t1); VACUUM ANALYZE; SELECT COUNT(*) FROM t1, t2 WHERE t1.id = t2.id AND t1.x 30 AND t2.x 10; SET bloom_pruning TO off; EXPLAIN SELECT COUNT(*) FROM t1, t2 WHERE t1.id = t2.id AND t1.x 30 AND t2.x 10; \timing SELECT COUNT(*) FROM t1, t2 WHERE t1.id = t2.id AND t1.x 30 AND t2.x 10; \timing EXPLAIN SELECT * FROM t1, t2 WHERE t1.id = t2.id AND t1.x 30 AND t2.x 10; \timing SELECT * FROM t1, t2 WHERE t1.id = t2.id AND t1.x 30 AND t2.x 10; \timing SET bloom_pruning TO on; \timing SELECT COUNT(*) FROM t1, t2 WHERE t1.id = t2.id AND t1.x 30 AND t2.x 10; \timing EXPLAIN SELECT * FROM t1, t2 WHERE t1.id = t2.id AND t1.x 30 AND t2.x 10; \timing SELECT * FROM t1, t2 WHERE t1.id = t2.id AND t1.x 30 AND t2.x 10; \timing -- Without Pruning Time: 1142.843 ms Time: 1567.355 ms -- With Pruning Time: 891.557 ms Time: 1269.634 ms -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 30, 2008 at 10:33 AM, Zdenek Kotala [EMAIL PROTECTED] wrote: Please, DO NOT MOVE position of page version in PageHeader structure! And PG_PAGE_LAYOUT_VERSION should be bump to 5. Umm, any in-place upgrade should be capable of handling changes to the page header. Of, did I miss something significant in the in-place upgrade design? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 30, 2008 at 11:14 AM, Tom Lane [EMAIL PROTECTED] wrote: Jonah H. Harris [EMAIL PROTECTED] writes: On Thu, Oct 30, 2008 at 10:33 AM, Zdenek Kotala [EMAIL PROTECTED] wrote: Please, DO NOT MOVE position of page version in PageHeader structure! And PG_PAGE_LAYOUT_VERSION should be bump to 5. Umm, any in-place upgrade should be capable of handling changes to the page header. Well, yeah, but it has to be able to tell which version it's dealing with. I quite agree with Zdenek that keeping the version indicator in a fixed location is appropriate. Most of the other databases I've worked, which don't have different types of pages, put the page version as the first element of the page. That would let us put the crc right after it. Thoughts? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 30, 2008 at 11:27 AM, Tom Lane [EMAIL PROTECTED] wrote: Jonah H. Harris [EMAIL PROTECTED] writes: On Thu, Oct 30, 2008 at 11:14 AM, Tom Lane [EMAIL PROTECTED] wrote: Well, yeah, but it has to be able to tell which version it's dealing with. I quite agree with Zdenek that keeping the version indicator in a fixed location is appropriate. Most of the other databases I've worked, which don't have different types of pages, put the page version as the first element of the page. That would let us put the crc right after it. Thoughts? Fixed location does not mean let's move it. Just trying to be helpful. Just thought I might give some insight as to what others, who had implemented in-place upgrade functionality years before Postgres' existence, had done. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 30, 2008 at 12:14 PM, Alvaro Herrera [EMAIL PROTECTED] wrote: Gregory Stark escribió: What I'm wondering though -- are we going to make CRCs mandatory? Or set aside the 4 bytes even if you're not using them? Because if the size of the page header varies depending on whether you're using CRCs that sounds like it would be quite a pain. Not mandatory, but the space needs to be set aside. (Otherwise you couldn't turn it on after running with it turned off, which would rule out using the database after initdb). Agreed. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pre-MED
On Wed, Oct 29, 2008 at 12:40 PM, David Fetter [EMAIL PROTECTED] wrote: Please find enclosed a WIP patch to add the ability for functions to see the qualifiers of the query in which they're called. It's not working just yet, and I'm not sure how best to get it working, but I'd like to see this as part of 8.4, as SQL/MED is just way too ambitious given the time frame. To be more specific, SQL/MED is going to be 8.5. This is an overall improvement for accessing the predicate. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL/MED compatible connection manager
On Mon, Oct 27, 2008 at 10:06 AM, Martin Pihlak [EMAIL PROTECTED] wrote: So the proposal is to implement a small subset of SQL/MED to cope with connection info management -- connection manager. This will only manage the connection metadata and provide the required system catalogs and commands for maintaining them. The actual connection management (open/close etc.) is still handled by the client modules. Per SQL:2008, there are no expected changes to SQL/MED from SQL:2003. As such, two weeks ago, I completed a full review of SQL/MED and am planning to fully implement it for 8.5. Currently, I'm working on a proof of concept and have created a SQL/MED access method (sqlmed) as well as started implementing the FDW API and hooks into the optimizer to support remote capabilities, costing, and predicate pushdown. The first wrappers I intend to support are ODBC and This is a large project, and I'm certainly open to assistance :) -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL/MED compatible connection manager
On Mon, Oct 27, 2008 at 10:35 AM, Jonah H. Harris [EMAIL PROTECTED] wrote: The first wrappers I intend to support are ODBC and Damn multiple windows :) The first wrappers I intend to support are ODBC and CSV/fixed-width text. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL/MED compatible connection manager
On Mon, Oct 27, 2008 at 11:31 AM, Martin Pihlak [EMAIL PROTECTED] wrote: Cool. Have you published some notes on it (wiki etc)? Not yet. Discussed it a little on irc, but nothing substantial. I'll look at updating the Wiki hopefully today. It certainly is an undertaking :) I'm mostly interested in the connection management -- so hopefully I can help there. That would be awesome! -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Fri, Oct 24, 2008 at 7:59 AM, Hannu Krosing [EMAIL PROTECTED] wrote: On Fri, 2008-10-24 at 00:52 -0400, Jonah H. Harris wrote: While we could build an abstract prefetch interface and simply use fadvise for it now (rather than OS-specific code), I don't see an easy win in any case. When building an abstract interface, always use at least two implementations (I guess that would be fadvise on linux and AIO on solaris in this case). You are much more likely to get the interface right this way. I agree, I just wasn't sure as to whether Greg's patch supported both methods. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Thu, Oct 23, 2008 at 4:53 PM, Greg Smith [EMAIL PROTECTED] wrote: I think the current plan is to use posix_advise() to allow parallel I/O, rather than async I/O becuase posix_advise() will require fewer code changes. These are not necessarily mutually exclusive designs. fadvise works fine on Linux, but as far as I know only async I/O works on Solaris. Linux also has an async I/O library, and it's not clear to me yet whether that might work even better than the fadvise approach. fadvise is a kludge. While it will help, it still makes us completely reliant on the OS. For performance reasons, we should be supporting a multi-block read directly into shared buffers. IIRC, we currently have support for rings in the buffer pool, which we could read directly into. Though, an LRU-based buffer manager design would be more optimal in this case. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Thu, Oct 23, 2008 at 8:44 PM, Bruce Momjian [EMAIL PROTECTED] wrote: True, it is a kludge but if it gives us 95% of the benfit with 10% of the code, it is a win. I'd say, optimistically, maybe 30-45% the benefit over a proper multi-block read using O_DIRECT. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Thu, Oct 23, 2008 at 10:36 PM, Greg Stark [EMAIL PROTECTED] wrote: I couldn't get async I/O to work on Linux. That is it worked but performed the same as reading one block at a time. On solaris the situation is reversed. Hmm, then obviously you did something wrong, because my tests showed it quite well. Pull the source to iozone or fio. In what way is fadvise a kludge? non-portable, requires more user-to-system CPU, ... need I go on? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Fri, Oct 24, 2008 at 12:42 AM, Tom Lane [EMAIL PROTECTED] wrote: non-portable, requires more user-to-system CPU, ... need I go on? I'd be interested to know which of these proposals you claim *is* portable. The single biggest reason to reject 'em all is that they aren't. Yes, that was bad wording on my part. What I mean to say was unpredictable. Different OSes and filesystems handle fadvise differently (or not at all), which makes any claim to performance gain configuration-dependent. My preferred method, using O_DIRECT and fetching directly into shared buffers, is not without its issues or challenges as well. However, by abstracting the multi-block read interface, we could use more optimal calls depending on the OS. Having done a bit of research and testing in this area (AIO and buffer management), I don't see any easy solution. fadvise will work on some systems and will likely give some gain on them, but won't work for everyone. The alternative is to abstract prefetching and allow platform-specific code, which we rarely do. While we could build an abstract prefetch interface and simply use fadvise for it now (rather than OS-specific code), I don't see an easy win in any case. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Properly access a buffer's LSN using existing access macros
On Mon, Oct 20, 2008 at 5:23 PM, Alvaro Herrera [EMAIL PROTECTED] wrote: Hmm, I just noticed this comment in bufpage.h (which was also in Jonah's patch) :-( typedef struct PageHeaderData { /* XXX LSN is member of *any* block, not only page-organized ones */ Passed regressions and several benchmarks for me. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Fri, Oct 17, 2008 at 11:26 AM, Alvaro Herrera [EMAIL PROTECTED] wrote: So this discussion died with no solution arising to the hint-bit-setting-invalidates-the-CRC problem. I've been busy. Apparently the only solution in sight is to WAL-log hint bits. Simon opines it would be horrible from a performance standpoint to WAL-log every hint bit set, and I think we all agree with that. So we need to find an alternative mechanism to WAL log hint bits. Agreed. I thought about causing a process that's about to write a page check a flag that says this page has been dirtied by someone who didn't bother to generate WAL. If the flag is set, then the writer process is forced to write a WAL record containing all hint bits in the page, and only then it is allowed to write the page (and thus calculate the new CRC). Interesting idea... let me ponder it for a bit. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Fri, Oct 17, 2008 at 12:05 PM, Greg Stark [EMAIL PROTECTED] wrote: Heikki had a clever idea earlier which was to have two crc checks- one which skips the hint bits and one dedicated to hint bits. If the second doesn't match we clear all the hint bits. Sounds overcomplicated to me. The problem with that is that skipping the hint bits for the main crc would slow it down severely. It would make a lot of sense if the hint bits were all in a contiguous block of memory but I can't see how to make that add up. Agreed. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 2, 2008 at 9:07 AM, Brian Hurt [EMAIL PROTECTED] wrote: I have a stupid question wrt hint bits and CRC checksums- it seems to me that it should be possible, if you change the hint bits, to be able to very easily calculate what the change in the CRC checksum should be. Doesn't the problem still remain? The problem being that the buffer can be changed as it's written, yes? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 2, 2008 at 9:36 AM, Brian Hurt [EMAIL PROTECTED] wrote: Another possibility is to just not checksum the hint bits... Seems like that would just complicate matters and prevent a viable checksum. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 2, 2008 at 9:42 AM, Gregory Stark [EMAIL PROTECTED] wrote: It's even worse than that. Two processes can both be fiddling hint bits on different tuples (or even the same tuple) at the same time. Agreed. Back to the double-buffer idea, we could have a temporary BLCKSZ buffer we could use immediately before write() which we could copy the block to, perform the checksum on, and write out... is that what you were thinking Tom? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 2, 2008 at 10:09 AM, Andrew Chernow [EMAIL PROTECTED] wrote: I read through this patch and am curious why 0xdeadbeef was used as an uninitialized value for the page crc. Is this value somehow less likely to have collisons than zero (or any other arbitrary value)? It was just an arbitrary value I chose to identify non-checksummed pages; I believe would have the same collision rate as anything else. Would it not be better to add a boolean bit or byte to inidcate the crc state? Ideally, though we don't have any spare bits to play with in MAXALIGN=4. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 2, 2008 at 10:27 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Ideally, though we don't have any spare bits to play with in MAXALIGN=4. In the page header? There's plenty of free bits in pd_flags. Ahh, didn't see that. Good catch! But isn't it a bit dangerous to have a single flag on the page indicating whether the CRC is valid or not? Any corruption that flips that bit would make the CRC check to be skipped. Agreed. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 2, 2008 at 10:41 AM, Tom Lane [EMAIL PROTECTED] wrote: Not checksumming the hint bits *is* a solution to the torn page problem. Yeah, but it has enough drawbacks that I'd like to keep looking for alternatives. Agreed. One argument that I've not seen raised is that not checksumming the hint bits leaves you open to a single-bit error that incorrectly sets a hint bit. Agreed. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
So, it comes down to two possible designs, each with its own set of challenges. Just to see where to go from here... I want to make sure the options I've seen in this thread are laid out clearly: 1. Hold an exclusive lock on the buffer during the call to smgrwrite OR 2. Doublebuffer the write OR 3. Do some crufty magic to ignore hint-bit updates Because option 3 not only complicates the entire thing, but also makes corruption more difficult to detect, I don't consider it viable. Can anyone provide a reason that makes this option viable? Option 1 will prevent hint-bit updates during write, which means we can checksum the buffer and not worry about it. Also, is only the buffer content lock required? This could potentially slow down concurrent transactions reading the block and/or writing hint bits. Option #2 consists of copying the block to a temporary buffer, checksumming it, and pushing the checksummed block down to write() (at smgr/md/fd depending on where we want to perform the checksum). From my perspective, I prefer #2 and performing it at the sgmr layer, but I am open to suggestions. Tom, what are your thoughts? #1 isn't very difficult, but I can see it potentially causing a number of side-problems and it would require a fair amount of testing. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 2, 2008 at 12:05 PM, Aidan Van Dyk [EMAIL PROTECTED] wrote: How does your current write strategy handle this situation. I mean, how do you currently guarnetee that between when you call write() and the kernel copies the buffer internally, no hint-bit are updated? Working on the exact double-buffering technique now. #define write(fd, buf, count) buffer_crc_write(fd, buf, count) I certainly wouldn't interpose the write() call itself; that's just asking for trouble. whatever protection you have on the regular write is sufficient. The time of the protection will need to start before the buffer period instead of just the write, (and maybe not the write syscall anymore) but with CPU caches and speed, the buffer period should be = the time of the write() syscall... Your fsync is your on disk guarentee, not the write, and that won't change. Agreed. But I thought you didn't really care about hint-bit updates, even in the current strategy... but I'm fully ignorant about the code, sorry... The current implementation does not take it into account. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 2, 2008 at 12:51 PM, Aidan Van Dyk [EMAIL PROTECTED] wrote: But I thought you didn't really care about hint-bit updates, even in the current strategy... but I'm fully ignorant about the code, sorry... The current implementation does not take it into account. So if PG currently doesn't care about the hit-bits being updated, during the write, then why should introducing a double-buffer introduce the a torn-page problem Tom mentions? I admit, I'm fishing for information from those in the know, because I haven't been looking at the code long enough (or all of it enough) to to know all the ins-and-outs... PG doesn't care because during hint-bits aren't logged and during normal WAL replay, the old page will be pulled from the WAL. I believe what Tom is referring to is that the buffer PG sends to write() can still be modified by way of SetHintBits between the time smgrwrite is called and the time the actual write takes place, which is why we can't rely on a checksum of the buffer pointer passed to smgrwrite and friends. If we're double-buffering the write, I don't see where we could be introducing a torn-page, as we'd actually be writing a copied version of the buffer. Will look into this. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 2, 2008 at 1:07 PM, Bruce Momjian [EMAIL PROTECTED] wrote: If we're double-buffering the write, I don't see where we could be introducing a torn-page, as we'd actually be writing a copied version of the buffer. Will look into this. The torn page is during kernel write to disk, I assume, so it is still possible. Well, we can't really control too much of that. The most common solution to that I've seen is to double-write the page (which some OSes already do regardless). Or, are you meaning something else? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
It's not the buffeting it's the checksum. The problem arises if a page is read in but no wal logged modifications are done against it. If a hint bit is modified it won't be wal logged but the page is marked dirty. Ah. Thanks Greg. Let me look into this a bit before I respond :) -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 2, 2008 at 1:58 PM, Gregory Stark [EMAIL PROTECTED] wrote: On recovery after a torn-page write, won't the recovery of the full_page_write WAL + WAL changes get us back to the page as it was before the buffer+checksum+write? Hint bit setting doesn't trigger a WAL record. Hence, no page image is written to WAL for later use in recovery. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Thu, Oct 2, 2008 at 1:44 PM, Alvaro Herrera [EMAIL PROTECTED] wrote: How about when a hint bit is set and the page is not already dirty, set the checksum to the always valid value? The problem I have with this idea is that there would be lots of pages excluded from the CRC checks, a non-trivial percentage of the time. I don't like that because it trades-off corruption detection (the whole point of this feature) for a slight performance improvement. Maybe we could mix this with Simon's approach to counting hint bit setting, and calculate a valid CRC on the page every n-th non-logged change. I still think we should only calculate checksums on the actual write. And, this still seems to have an issue with WAL, unless Simon's original idea somehow included recording hint bit settings/dirtying the page in WAL. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Wed, Oct 1, 2008 at 9:25 AM, Tom Lane [EMAIL PROTECTED] wrote: Harald Armin Massa [EMAIL PROTECTED] writes: WHAT should happen when corrupted data is detected? Same thing that happens now, ie, query fails with an error. This would just be an extension of the existing validity checks done at page read time. Agreed. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
On Wed, Oct 1, 2008 at 11:36 AM, Tom Lane [EMAIL PROTECTED] wrote: I probably wouldn't compare checksumming *every* WAL record to a single block-level checksum. No, not at all. Block-level checksums would be an order of magnitude more expensive: they're on bigger chunks of data and they'd be done more often. That's debatable and would be dependent on cache and the workload. In our case however, because shared buffers doesn't scale, we would end up doing a lot more block-level checksums than the other vendors just pushing the block to/from the OS cache. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers