Re: [HACKERS] pg_restore stuck in a loop?
Rod Taylor [EMAIL PROTECTED] writes: On Tue, 2005-04-26 at 23:22 -0400, Tom Lane wrote: You tell us ;-). You've got the test case, attach to it with a debugger and find out what it's doing. I wasn't entirely sure how to catch it in action so I just used CTRL+C to interrupt the pg_restore process, and got these as backtraces: Hm, I wonder if there is something broken about the SQL parsing logic in _sendSQLLine, such that it could go into an infinite loop given the right input? regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Increased company involvement
Joshua D. Drake wrote: However, there was a lot of coordination that happened with Fujitsu that I don't see happening with the current companies involved. Companies are already duplicating work that is also done by community members or by other companies. That is bound to happen no matter what. Look at plJava and plJ. Some people just feel that their way is better. Some people just don't get along, etc... Actually, I think that PL/Java versus PL/J is a good example of where some coordination would have helped a lot. The short story: I was between jobs from December 2003 through February the following year. A lot of work on PL/Java was done during that time. I had no clue that there was another active project with similar objectives until after my first fully functional submission to gborg. Had I known, the outcome would have been different. Today there are ongoing and very active efforts to collaborate. The longer story (if anyone is interested): Before I started PL/Java I informed the community of my intentions (see hackers thread pljava revisited http://archives.postgresql.org/pgsql-hackers/2003-12/msg00310.php ). I got a lot of feedback and good advice, such as using C instead of C++, hosting the project at gborg, etc., but nobody told me back then that there was an active PL/J project. I found traces of that project on SourceForge but it seemed to have been dead for over a year. At that time there was no redirect from SourceForge. Jan Wieck started the thread PL/Java issues http://archives.postgresql.org/pgsql-hackers/2003-12/msg00819.php in which I made 2 major posts. Nothing in that thread indicated that there was an ongoing project and I got no reply to my posts. On January 7th, I made my first submission to gborg. When I, in mid February, realized that the PL/J project was indeed alive and active, I wrote the thread PL/Java - next step? http://archives.postgresql.org/pgsql-hackers/2004-02/msg00713.php where I outlined possible futures for PL/Java and PL/J.
The outcome of that was that PL/J (Dave Cramer and Laszlo Hornyak) and I had an IRC meeting where we agreed on some limited collaboration (see Minutes from Pl/Java -next step IRC http://archives.postgresql.org/pgsql-hackers/2004-03/msg00171.php ). From that point and until a month or so ago, the active collaboration between the projects could have been better. Some things did happen though. Laszlo asked me to publish some PL/Java interfaces in a public maven repository which I did and we had some discussions. I made an attempt to have a major sponsor step in and take the lead in a project aiming to provide a flexible solution where a choice of approach could be made but the sponsor understandably wanted to wait and see. Today, we (the PL/Java and PL/J project members) make common efforts to factor out client tools that indeed can be common to a separate project. We are also discussing how to make the PostgreSQL user experience as similar as possible and thus allowing use of PL/Java or PL/J without changing anything but configuration. Kind regards, Thomas Hallgren
Re: [HACKERS] [pgsql-advocacy] Increased company involvement
On Wed, 2005-04-27 at 22:21 -0700, Joshua D. Drake wrote: However, there was a lot of coordination that happened with Fujitsu that I don't see happening with the current companies involved. Companies are already duplicating work that is also done by community members or by other companies. That is why we have 80 Linux distributions and a dozen FreeBSD distributions (can I include MacOSX?). I guess a more appropriate comparison would be to distribution-specific Linux kernels, i.e. the Red Hat kernel vs. the SUSE kernel vs. the vanilla (or real :) ) kernel. The big issue is communication. Because the PostgreSQL code base is common for most of the companies involved, there has to be coordination in what they are working on and their approaches. I can see this as an issue but sometimes the community is a hampering force as well. I recognize the community goals and respect them but in some things the community can move really slowly. From what I can tell some of this is caused by the no-new-features rules etc... In business moving slowly can mean death to a project. Which is why (hate to beat a dead horse) many OSS projects have moved to 6-month release cycles. Well, it is a two-sided thing. On one hand, businesses usually need new features yesterday, but on the other hand, a business would lose most of the benefit of getting the feature fast if it is not included in the main branch down the road, preferably in the next official release, because said business would be dependent on the implementor of its specific feature for integrating _all_ other new and desirable features of the next release into its specific version of Postgres. is happening. I realize this is hard for companies because their efforts are in some ways part of their profitability. That is true, there are sometimes strategic reasons not to announce a project. I can see no strategic advantages for sponsors in not announcing the project.
There may be some small tactical wins for the actual implementors, but not even tactical wins for the sponsors. There may be some strategic/tactical reasons for sponsors to announce who they (the sponsors) are, but I can't see any reason not to announce the project or not to encourage/ask/demand the implementor to do so. profitability require duplication of effort and code collisions? I am not sure, but if it does, we are in trouble. I am not sure the community has the resources to resolve that many collisions. Which is why you are starting to see forks such as Bizgres but it is also why you are seeing forks go away (Mammoth PostgreSQL). At first glance at least, Bizgres looks like a community-oriented project and I hope that Bizgres will be a testbed/early implementation of some data-warehouse features, which will be integrated back into Postgres at the first opportunity. But I too expected the discussion to take place on pgsql-hackers, not some half-hidden mailing list on pgfoundry. Or at least an announcement of that mailing list to be made on pgsql-hackers. OTOH, there are some things (2PC, recursive queries, ...) which are discussed on pgsql-hackers and are still lingering somewhere, for various reasons (one of them may be lack of time/money/sponsor-pressure for the developer) not in Postgres proper yet. OTOOH, I see that a growing number of companies who might think of sponsoring PG development are interested in BI/DW and OLAP features in addition to plain OLTP enhancements. -- Hannu Krosing [EMAIL PROTECTED]
Re: [HACKERS] [pgsql-advocacy] Increased company involvement
Hannu Krosing wrote: Which is why (hate to beat a dead horse) many OSS projects have moved to 6-month release cycles. Well, it is a two-sided thing. On one hand, businesses usually need new features yesterday, but on the other hand, a business would lose most of the benefit of getting the feature fast if it is not included in the main branch down the road, preferably in the next official release, because said business would be dependent on the implementor of its specific feature for integrating _all_ other new and desirable features of the next release into its specific version of Postgres. Yes, you can bet that will happen. You can also bet that the community release will have 2X more feature additions than any commercial release that is a version behind, unless _huge_ sums of money are thrown at the commercial release. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
[HACKERS] Feature freeze date for 8.1
You might remember that when we released 8.0, the plan was to have a 12-month development cycle for 8.1, unless there were Win32 problems that required complex fixes, in which case we would have a shorter 8.1 cycle. Well, the good news is that there have been almost no Win32 problems, but the other good news is that we are getting a lot of powerful features for 8.1 already: o two-phase commit (Heikki Linnakangas, almost done) o multiple OUT function parameters (Tom, done) o bitmapped indexes (Tom, almost done) o shared row locks (Alvaro, almost done) o integrated autovacuum (Bruce) o buffer cache fixes for SMP (Tom, done) It is possible all these items will be done sometime in June. Now, if that happens, do we want to continue with the 12-month plan or shorten the 8.1 release cycle, perhaps targeting a release in the September/October timeframe? The current core proposal is to do feature freeze on July 1, with the understanding that we will have most of the items above done by then and have the outstanding patches applied. -- Bruce Momjian | http://candle.pha.pa.us
Re: [HACKERS] [proposal] protocol extension to support loadable stream filters
Brent Verner [EMAIL PROTECTED] writes: Would it be sane to recognize a specific PG_PROTOCOL_MAJOR to enter the filter-negotiation process? PG_PROTOCOL_MINOR would then be used to lookup and call a ptr to the filter's create() in CreateStreamFilter... Looks reasonable enough to me ... regards, tom lane
Re: [HACKERS] Feature freeze date for 8.1
On Thu, Apr 28, 2005 at 09:02:40 -0400, Bruce Momjian pgman@candle.pha.pa.us wrote: Well the good news is that there have been almost no Win32 problems, but the other good news is that we are getting a lot of powerful features for 8.1 already: You forgot to list the indexed aggregate feature for max and min. While this isn't that important for experienced Postgres users, it is a gotcha for new users. Between this, integrated autovacuum, and the cross-type index changes in 8.0, we have covered almost all of the newbie gotchas. This should make Postgres effectively as easy for new users to get started with as MySQL. That could significantly boost usage on the low end once word gets out.
Re: [HACKERS] Feature freeze date for 8.1
As a user, I would definitely prefer to see 8.1 released sooner with the feature set listed below than wait another 6+ months for a few other features. Additionally, the beta may go smoother/faster if you don't have too many huge features going in at once. Just my opinion. Later Rob --- Bruce Momjian pgman@candle.pha.pa.us wrote: You might remember that when we released 8.0, the plan was to have a 12-month development cycle for 8.1, unless there were Win32 problems that required complex fixes, in which case we would have a shorter 8.1 cycle. Well, the good news is that there have been almost no Win32 problems, but the other good news is that we are getting a lot of powerful features for 8.1 already: o two-phase commit (Heikki Linnakangas, almost done) o multiple OUT function parameters (Tom, done) o bitmapped indexes (Tom, almost done) o shared row locks (Alvaro, almost done) o integrated autovacuum (Bruce) o buffer cache fixes for SMP (Tom, done) It is possible all these items will be done sometime in June. Now, if that happens, do we want to continue with the 12-month plan or shorten the 8.1 release cycle, perhaps targeting a release in the September/October timeframe? The current core proposal is to do feature freeze on July 1, with the understanding that we will have most of the items above done by then and have the outstanding patches applied. -- Bruce Momjian | http://candle.pha.pa.us
Re: [HACKERS] [pgsql-advocacy] Increased company involvement
On Thursday 28 April 2005 01:48, Joshua D. Drake wrote: Do companies want to write for Blue Hat PostgreSQL and Suza PostgreSQL because that might be what happens if we don't stay organized? In fact, it might be happening already. Well that depends... If the companies are writing for Pervasive PostgreSQL I don't think they would have a problem with that ;). And I do agree with you Bruce, it is happening already -- I don't think there is any question of that. ISTM the allure of differentiation and branding is going to be too strong for us to prevent such things. An easy way to differentiate is to add some proprietary/unique extension to the main code and then package that up. If you have to have all your extensions put into the community version then you lose this advantage over your competitors. (Mammoth PostgreSQL/Replicator is an example of this.) The same holds true for branding: if you're Pervasive you want to sell Pervasive Postgres rather than PostgreSQL, because you get to push your name out there and get people thinking about your company whenever they talk about the database. I think our goal is to encourage companies to push these changes into the core as much as possible, pointing out the advantages community support brings, like ongoing maintenance, support in add-on tools like pgAdmin or phpPgAdmin, and eliminating the chance that someone else will submit a similar solution that gets accepted into the community code, thereby deprecating the work they have already done. -- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
Quoting Josh Berkus josh@agliodbs.com: Perhaps I can save you some time (yes, I have a degree in Math). If I understand correctly, you're trying to extrapolate from the correlation between a tiny sample and a larger sample. Introducing the tiny sample into any decision can only produce a less accurate result than just taking the larger sample on its own; GIGO. Whether they are consistent with one another has no relationship to whether the larger sample correlates with the whole population. You can think of the tiny sample like anecdotal evidence for wonder drugs. Actually, it's more to characterize how large a sample we need. For example, if we sample 0.005 of disk pages and get an estimate, and then sample another 0.005 of disk pages and get an estimate which is not even close to the first estimate, then we have an idea that this is a table which defies analysis based on small samples. Whereas if the two estimates are 1.0 stddev apart, we can have good confidence that the table is easily estimated. Note that this doesn't require progressively larger samples; any two samples would work. We're sort of wandering away from the area where words are a good way to describe the problem. Lacking a common scratchpad to work with, could I suggest you talk to someone you consider to have a background in stats, and have them draw for you why this doesn't work? About all you can get out of it is: if the two samples are disjunct by a stddev, yes, you've demonstrated that the union of the two populations has a larger stddev than either of them; but your two stddevs are less info than the stddev of the whole. Breaking your sample into two (or three, or four, ...) arbitrary pieces and looking at their stddevs just doesn't tell you any more than what you started with. -- Dreams come true, not free. -- S.Sondheim, ITW
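The "two stddevs are less info than the whole" point can be made concrete with a toy experiment (my own sketch with made-up numbers, not ANALYZE's sampling code): two tiny distinct-count estimates may disagree wildly, but their union, viewed as one larger sample, always carries at least as much distinct-value information as either half:

```python
import random

# Skewed toy population: 50 very common values plus a long tail of rare ones.
population = [i % 50 for i in range(9000)] + list(range(50, 1050))

rng = random.Random(1)
s1 = rng.sample(population, 100)   # first tiny sample
s2 = rng.sample(population, 100)   # second tiny sample
union = s1 + s2                    # the "larger sample on its own"

d1, d2, du = len(set(s1)), len(set(s2)), len(set(union))
# Comparing d1 with d2 only tells you the tiny samples disagree;
# the union sample dominates either half by construction.
print(d1, d2, du)
```

However the two halves compare with each other, du >= max(d1, d2) always holds, which is the sense in which splitting a sample into pieces cannot add information.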
Re: [HACKERS] [pgsql-advocacy] Increased company involvement
However, there was a lot of coordination that happened with Fujitsu that I don't see happening with the current companies involved. Companies are already duplicating work that is also done by community members or by other companies. That is why we have 80 Linux distributions and a dozen FreeBSD distributions (can I include MacOSX?). I guess a more appropriate comparison would be to distribution-specific Linux kernels, i.e. the Red Hat kernel vs. the SUSE kernel vs. the vanilla (or real :) ) kernel. As someone who has been bitten by the RH vs SuSE vs kernel.org kernel several times over the past couple of weeks, we should *really* try to avoid this as much as possible. It's a *major* hassle for the end user if postgresql is no longer the same as postgresql. Not exactly sure what to do to avoid it, though ;-) Other than encourage the companies that develop extensions to submit these to the community distribution and work to get them accepted there... I can see this as an issue but sometimes the community is a hampering force as well. I recognize the community goals and respect them but in some things the community can move really slowly. From what I can tell some of this is caused by the no-new-features rules etc... In business moving slowly can mean death to a project. Which is why (hate to beat a dead horse) many OSS projects have moved to 6-month release cycles. Well, it is a two-sided thing. On one hand, businesses usually need new features yesterday, but on the other hand, a business would lose most of the benefit of getting the feature fast if it is not included in the main branch down the road, preferably in the next official release, because said business would be dependent on the implementor of its specific feature for integrating _all_ other new and desirable features of the next release into its specific version of Postgres.
An example of this might be Powergres - not sure if it was ever planned to go into the community version, but it must be a pain to maintain the threaded version alongside the main one. I guess that's why it's still based on 7.3... //Magnus
Re: [HACKERS] [JDBC] Statement Timeout and Locking
Kris Jurka [EMAIL PROTECTED] writes: This actually is the problem. It works as three separate statements, but fails as one. The server doesn't seem to recognize the SET when other commands come in before Sync. [ reads some code... ] The problem is that postgres.c only inspects StatementTimeout when start_xact_command starts a transaction command, and the placement of finish_xact_command calls is such that that's not going to happen until after Sync. So the upshot is that the SET statement_timeout isn't going to have effect until after Sync (or after a transaction-control command, but there are none in your example). This suggests that the statement_timeout stuff is being done at the wrong place. I'm not sure exactly what the more-right places would be for V3 protocol though. What exactly would you expect statement_timeout to cover in a Parse/Bind/Execute world --- especially if those aren't issued in a purely sequential fashion? A very simple definition would be that each Parse, Bind, or Execute action is independently constrained by statement_timeout, but that would act significantly differently from the simple-query case if planning takes long enough to be a factor. (Bear in mind that planning can include constant-folding of user-defined functions, so at least in some cases you can imagine people would want statement_timeout to constrain planning.) Also that would imply three times as many timer enable/disable kernel calls, which might be an annoying amount of overhead. Anyway the short-term answer for Markus is don't do it that way. We ought to think about making the backend's behavior more consistent, though. regards, tom lane
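Tom's explanation can be condensed into a toy model (pure illustration in Python; the class and method names are invented and this is not the real postgres.c logic): statement_timeout is only consulted when a new transaction command starts, and in the V3 extended protocol that happens once per batch, not once per message, so a SET inside an unsynced batch is invisible until Sync:

```python
class Backend:
    """Toy model of the described V3-protocol timeout behavior."""
    def __init__(self):
        self.guc_timeout = 0          # value set by SET statement_timeout
        self.active_timeout = 0       # timeout actually constraining execution
        self.in_xact_command = False

    def start_xact_command(self):
        # statement_timeout is only inspected here, and this fires only
        # once per unsynced batch of Parse/Bind/Execute messages.
        if not self.in_xact_command:
            self.active_timeout = self.guc_timeout
            self.in_xact_command = True

    def execute(self, sql):
        self.start_xact_command()
        if sql.startswith("SET statement_timeout"):
            self.guc_timeout = int(sql.split("=")[1])
        return self.active_timeout    # timeout in force for this statement

    def sync(self):
        # finish_xact_command effectively happens at Sync.
        self.in_xact_command = False
```

Running a SET and a query in one batch leaves the query under the old timeout; only after sync() does the new value take effect, which is why splitting the statements (Kris's three-statement version) works.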
Re: [HACKERS] Feature freeze date for 8.1
Bruce Momjian wrote: You might remember that when we released 8.0, the plan was to have a 12-month development cycle for 8.1, unless there were Win32 problems that required complex fixes, in which case we would have a shorter 8.1 cycle. Well, the good news is that there have been almost no Win32 problems, but the other good news is that we are getting a lot of powerful features for 8.1 already: o two-phase commit (Heikki Linnakangas, almost done) o multiple OUT function parameters (Tom, done) o bitmapped indexes (Tom, almost done) o shared row locks (Alvaro, almost done) o integrated autovacuum (Bruce) o buffer cache fixes for SMP (Tom, done) It is possible all these items will be done sometime in June. Now, if that happens, do we want to continue with the 12-month plan or shorten the 8.1 release cycle, perhaps targeting a release in the September/October timeframe? The current core proposal is to do feature freeze on July 1, with the understanding that we will have most of the items above done by then and have the outstanding patches applied. It seems a good idea to take the chance now to make a release. Delaying the release would mean preventing wide usage of features even though they appear production-grade. OTOH, if feature freeze is delayed, some seemingly essential features for 8.1 which might arise in the meantime could delay the release further, or induce late feature exclusion when last-minute issues are discovered. Integrated autovacuum seems good enough a reason to release early. Regards, Andreas
Re: [HACKERS] [pgsql-advocacy] Increased company involvement
However, there was a lot of coordination that happened with Fujitsu that I don't see happening with the current companies involved. Companies are already duplicating work that is also done by community members or by other companies. That is bound to happen no matter what. Look at plJava and plJ. Some people just feel that their way is better. Some people just don't get along etc... That is why we have 80 Linux distributions and a dozen FreeBSD distributions (can I include MacOSX?). True enough. And coordination, just like other outward-facing activities, is often inconvenient and easy to forget. But it's important. I've just left the Board of Directors of the Web Services Interoperability organization (WS-I). Coordinating the standards activities of IBM, MS, Sun, Oracle, BEA, Fujitsu, SAP, and 130 others takes enormous time and care, but it's the only way coop-etors can function. And having said that, aggressive businesspeople will move too quickly at times, or will hide their activities for business reasons. It's a mostly forgivable sin, IMO. Second, some developers are being hired from the community to work on closed-source additions to PostgreSQL. That is fine and great, but one way to kill PostgreSQL is to hire away its developers. If a commercial company wanted to hurt us, that is certainly one way they might do it. Anyway, it is a concern I have. I am hoping community members hired to do closed-source additions can at least spend some of their time on community work. I would think that most of the developers would stipulate that in order to take the position??? I know Command Prompt would always make sure that the developer could work on the community stuff. The same is true for EnterpriseDB. And finally, we have a few companies working on features that they eventually want merged back into the PostgreSQL codebase. 
That is a very tricky process and usually goes badly unless the company seeks community involvement from the start, including user interface, implementation, and coding standards. I concur with this. We ran into this with plPerl. The only way to successfully extend PostgreSQL commercially is to coordinate with the community. -- Andy
Re: [HACKERS] [JDBC] Statement Timeout and Locking
Hi, Tom, Tom Lane wrote: Anyway the short-term answer for Markus is don't do it that way. We ought to think about making the backend's behavior more consistent, though. I'll split the query into three. Having it in one query was just a convenience. Thanks, Markus
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
First I will comment on my original idea. Second I will give another, improved suggestion (an idea). I hope that they will be useful for you. (I don't know whether the first one was useful at all, because it showed that I and some others of us are not very good with statistics :( ) I haven't looked at the PostgreSQL code, so I don't know what is possible now and what is not. I do know, without looking at the code, that a full table scan followed by incremental statistics changes is a very big change. I meant the following idea: - compare two equal-sized samples. Then redo the same thing with double-sized samples. So do lots of unnecessary work. Check the correlation of the two samples to try to guess the distribution. So I tried to give you an idea, not a full answer to the whole problem. I did read some parts of the attached PDFs. They did convince me that it seems the heuristics for the hard cases would actually read almost the whole table in many cases. I did cover the too-small-sample problem by stating that the user should be able to give the minimum size of samples. This way you would avoid the too-small-sampling problem. My purpose was not to achieve at most 5% wrong estimates, but to decrease the 2000% wrong estimates that are sometimes seen now. Conclusions: - No heuristics or similar thing on small samples will give excellent results. - If you need excellent estimates, you need to process the whole table! - Some special cases, like primary keys, unique indexes, and special-case column types, give easy ways to make estimates: for example, whether a boolean column has zero, one or two distinct values does not matter so much ??? Hashing seems the right choice for all of them. If I have understood correctly, full table scans are out of the question for large tables at this time. The percentage idea of taking 10% samples seems good. So here is another suggestion: 1.
Do a full percentage scan, starting at an arbitrary position. If the user's data is not homogeneous this hurts accuracy, but this way it is faster. During that scan, try to figure out all those columns that have at most 100 distinct values. Of course, with this you can't get 100% accuracy, but if a full table scan is out of the question now, it is better if the accuracy is, for example, at most ten times wrong. You could also improve accuracy by, instead of doing a 10% partial table scan, doing 20 pieces of 0.5-percent partial table scans: this would improve accuracy a bit but keep the speed almost the same as the single partial table scan. Here are questions for the statisticians for distinct-values calculation: If we want at most 1000% tolerance, how big a percentage of a table's column must be processed? If we want at most 500% tolerance, how big a percentage must be processed? If we want at most 250% tolerance, how big a percentage must be processed? It is fine to assume that there are at most 100 distinct values in a table, if that helps the calculations. If we try to get as much as possible with one discontinuous partial table scan (a 0.1-10% sample), here is the information we can gather: 1. We could gather a histogram of at most 100 distinct values for every column. 2. We could measure variance and average, and the number of rows, for these 100 distinct values. 3. We could count the number of rows that didn't match these 100 distinct values: they were left out of the histogram. 4. We could get a minimum and a maximum value for each column. = We could get exact information about the sample with one 0.1-10% pass for many columns. What can you statisticians gather from these values? My idea is programmatical combined with statistics: + Performance: scan, for example, 100 blocks each of size 100Mb, because disc I/O is much faster this way. + Enables a larger table percentage. I hope it helps with the statistics formula.
+ Required for more robust statistics: take those blocks at random (not adjacent) places to decrease the effect of hitting statistically bad parts of the table. + Fewer table-scan passes: scan all columns with limited hashing in the first pass. + All easy columns are found here with one pass. +- Harder columns each need a pass of their own, but after the first pass we have some preliminary knowledge of them from the given sample (minimum and maximum values and the histogram of the 100 distinct values). Marko Ristola Greg Stark wrote: Dave Held [EMAIL PROTECTED] writes: Actually, it's more to characterize how large a sample we need. For example, if we sample 0.005 of disk pages and get an estimate, and then sample another 0.005 of disk pages and get an estimate which is not even close to the first estimate, then we have an idea that this is a table which defies analysis based on small samples. I buy that. Better yet is to use the entire sample you've gathered of .01 and
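The limited-hashing first pass described above can be sketched in a few lines (my own hypothetical rendering in Python, not tied to the PostgreSQL sources): track at most 100 distinct values per column, count the rows whose value falls outside the tracked set, and record min/max along the way:

```python
CAP = 100  # at most this many distinct values tracked per column

def scan_column(values, cap=CAP):
    """One bounded-memory pass over a column's sampled values."""
    seen, overflow, lo, hi = set(), 0, None, None
    for v in values:
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
        if v in seen:
            continue
        if len(seen) < cap:
            seen.add(v)
        else:
            overflow += 1  # rows whose value fell outside the tracked set
    return {"tracked": len(seen), "overflow_rows": overflow,
            "min": lo, "max": hi}
```

If overflow_rows stays at zero, the column's distinct count is exact; otherwise the column is one of the "harder" ones needing its own pass, exactly the split the proposal describes.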
Re: [HACKERS] [COMMITTERS] pgsql: Implement sharable row-level locks, and use them for foreign key
On Thu, Apr 28, 2005 at 06:47:18PM -0300, Tom Lane wrote: Implement sharable row-level locks, and use them for foreign key references to eliminate unnecessary deadlocks. This commit adds SELECT ... FOR SHARE paralleling SELECT ... FOR UPDATE. The implementation uses a new SLRU data structure (managed much like pg_subtrans) to represent multiple-transaction-ID sets. One point I didn't quite understand was the business about XLogging heap_lock_tuple. I had to reread your mail to -hackers on this issue several times to get it (as you can see I don't fully grok the WAL rules). Now, I believe that heap_mark4update was wrong on this, no? Only it didn't matter because after a crash nobody cared about the stored Xmax. One nice side effect of this is that the 2PC patch now has this problem solved. The bad part is that locking a tuple emits a (non-XLogFlushed) WAL record, and that may have a performance impact. (We should have better performance overall, I think, because transactions are no longer blocked on foreign key checking.) Anyway: many thanks for updating the patch to a usable state. I'm sorry to have inflicted all those bugs upon you. -- Alvaro Herrera ([EMAIL PROTECTED]) La soledad es compañía
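The deadlock win here boils down to an ordinary lock-compatibility rule. As a minimal sketch (illustrative only; nothing like the server's SLRU/MultiXact machinery), share locks are compatible with each other, so two transactions doing FK checks against the same parent row no longer queue up behind one another the way exclusive row locks forced them to:

```python
SHARE, EXCLUSIVE = "share", "exclusive"

def compatible(held_modes, requested):
    """Can a lock of mode `requested` be granted given the modes already held?"""
    if requested == SHARE:
        # SHARE conflicts only with an existing EXCLUSIVE lock.
        return EXCLUSIVE not in held_modes
    # EXCLUSIVE conflicts with any existing lock.
    return not held_modes
```

With the pre-8.1 FK implementation every check behaved like EXCLUSIVE, so two transactions inserting children of the same parent could deadlock; with SHARE semantics both checks are granted concurrently.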
Re: [HACKERS] [proposal] protocol extension to support loadable stream filters
[2005-04-28 10:00] Tom Lane said:
| Brent Verner [EMAIL PROTECTED] writes:
| Would it be sane to recognize a specific PG_PROTOCOL_MAJOR
| to enter the filter-negotiation process? PG_PROTOCOL_MINOR
| would then be used to look up and call a ptr to the filter's
| create() in CreateStreamFilter...
|
| Looks reasonable enough to me ...

Now, the hard part... where should this code live? I'm thinking a src/transport directory seems sensible. StreamFilter.[ch] will contain the base StreamFilter along with various utility functions. StreamFilter implementations will reside in their own subdir.

src/include/transport/StreamFilter.h
src/transport/StreamFilter.c
src/transport/zlib/...
src/transport/ssl/...

Comments/ideas appreciated. cheers. b
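The filter-chain idea the layout implies can be modeled compactly. This is only a sketch of the concept in Python (the real proposal is C code under src/transport/); the class names and the `send`/`receive` helpers are assumptions for illustration, with zlib standing in for the src/transport/zlib/ filter. Outbound data passes through the filters in order; inbound data must unwind them in reverse.

```python
import zlib

class StreamFilter:
    """Illustrative base filter: pass bytes through unchanged."""
    def encode(self, data: bytes) -> bytes:
        return data
    def decode(self, data: bytes) -> bytes:
        return data

class ZlibFilter(StreamFilter):
    """Compression filter, standing in for src/transport/zlib/."""
    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data)
    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)

def send(filters, payload: bytes) -> bytes:
    for f in filters:                  # outbound: apply in order
        payload = f.encode(payload)
    return payload

def receive(filters, payload: bytes) -> bytes:
    for f in reversed(filters):        # inbound: unwind in reverse
        payload = f.decode(payload)
    return payload

chain = [ZlibFilter()]
wire = send(chain, b"SELECT 1;")
assert receive(chain, wire) == b"SELECT 1;"
```

An SSL filter would slot into the same chain, which is why keeping the base StreamFilter interface separate from the per-filter subdirectories makes sense.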
Re: [HACKERS] [pgsql-advocacy] Increased company involvement
Hannu, But I too expected the discussion to take place on pgsql-hackers, not some half-hidden mailing list on pgfoundry. Or at least an announcement of that mailing list to be made on pgsql-hackers. Yeah, we should announce the mailing list. Actually, I did direct e-mail a bunch of people (including you) about it and invite them to the mailing list. So: http://pgfoundry.org/mail/?group_id=1000107 For discussing potential features, though, I'm personally reluctant at this point to discuss major features until I and my collaborators (example, newsysviews) have a concrete proposal together. It's far too easy to get sidetracked into a discussion of minutiae and politics on -hackers; by generating a complete draft spec (at least) on a small mailing list, it's a lot easier to focus on specific goals and schedules, and discussions on -hackers around detailed proposals tend to be a lot more focused. -- Josh Berkus Aglio Database Solutions San Francisco
Re: [HACKERS] [COMMITTERS] pgsql: Implement sharable row-level locks, and use them for foreign key
Alvaro Herrera [EMAIL PROTECTED] writes: One point I didn't quite understand was the business about XLogging heap_lock_tuple. I had to reread your mail to -hackers on this issue several times to get it (as you can see I don't fully grok the WAL rules). Now, I believe that heap_mark4update was wrong on this, no? Only it didn't matter because after a crash nobody cared about the stored Xmax. Well, actually the reason I decided to put in xlogging there was that I realized it was already broken before. In the existing code it was possible to have this scenario:

* transaction N selects-for-update some tuple, so N goes into the tuple's XMAX.
* transaction N ends without doing anything else. Since it's not produced any XLOG entries, xact.c thinks it doesn't need to emit either a COMMIT or ABORT xlog record.
* therefore, there is no record whatsoever of XID N in XLOG.
* bgwriter pushes the dirty data page to disk.
* database crashes.
* on restart, WAL replay sets the XID counter to N or less, because there is no evidence in the XLOG for N.
* now there will be a new transaction N that is mistakenly considered to own an update lock on the tuple.

While the negative impact of this situation is probably not high, it's clearly The Wrong Thing. The MultiXactId patch introduces a second way to have the same problem, ie a MultiXactId on disk for which there is no evidence in XLOG, so the MXID might get re-used after restart. In view of the fact that we want to do 2PC sometime soon, and that absolutely requires xlogging every lock, I thought that continuing to try to avoid emitting an xlog record for heap_lock_tuple was just silly. regards, tom lane
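The XID-reuse failure in the scenario above can be reproduced in a toy model. Everything here is illustrative (the variable names, the minimal `recover` function, and the starting XID are invented); it only shows the mechanism: a transaction that left no WAL record is invisible to replay, so its XID gets handed out again while its xmax is still sitting on a flushed data page.

```python
# Toy model of the broken scenario: no WAL record for XID N, so crash
# recovery reissues N while N is still stored as a tuple's xmax.

wal = []                   # records that reached the log
next_xid = 100

def new_xid():
    global next_xid
    next_xid += 1
    return next_xid

def recover():
    """Set the XID counter from the WAL alone, as replay must."""
    return max((r["xid"] for r in wal), default=100)

# Transaction N select-for-updates a tuple: xmax goes to the data page,
# and the bgwriter flushes that page to disk...
n = new_xid()
tuple_on_disk = {"xmax": n}
# ...but N makes no other changes, so nothing is ever written to WAL.

# Crash and restart: replay sees no trace of N in the (empty) WAL.
next_xid = recover()
reused = new_xid()
assert reused == tuple_on_disk["xmax"]  # a new transaction gets old N
```

Emitting even a small, non-flushed WAL record in heap_lock_tuple puts XID N (or the MultiXactId) into the log, so `recover` can never wind the counter back past it.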
Re: [HACKERS] [pgsql-advocacy] Increased company involvement
Robert Treat wrote: ISTM the allure of differentiation and branding is going to be too strong for us to prevent such things. An easy way to differentiate is to add some proprietary/unique extension to the main code and then package that up. If you have to have all your extensions put into the community version, then you lose this advantage over your competitors. (Mammoth PostgreSQL/Replicator is an example of this.) The same holds true for branding: if you're Pervasive, you want to sell Pervasive Postgres rather than PostgreSQL, because you get to push your name out there and get people thinking about your company whenever they talk about the database. I think our goal is to encourage companies to push these changes into the core as much as possible, pointing out the advantages community support brings, like ongoing maintenance, support in add-on tools like pgadmin or phppgadmin, and eliminating the chance that someone else will submit a similar solution that gets accepted into the community code, thereby deprecating the work they have already done. I remember something the president of Great Bridge told me. He said, Great Bridge needs PostgreSQL. If Great Bridge dies, PostgreSQL goes on (as it did), but if PostgreSQL dies, Great Bridge is dead too. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] [PERFORM] Distinct-Sampling (Gibbons paper) for Postgres
Now, if we can come up with something better than the ARC algorithm ... Tom already did. His clock-sweep patch is already in the 8.1 source. -- Josh Berkus Aglio Database Solutions San Francisco