Re: [PATCHES] [HACKERS] Function structure in formatting.c
This has been saved for the 8.4 release:

	http://momjian.postgresql.org/cgi-bin/pgpatches_hold

--- Brendan Jurd wrote:

> Hello,
>
> As discussed on -hackers, I've done some refactoring work on
> backend/utils/adt/formatting.c, in an attempt to make the code a bit
> more intelligible before improving handling of bogus formats.
>
> This is purely a refactor. The functionality of the file hasn't
> changed; it does the same job as before, but it does it in ~200 fewer
> lines and ~3.5k fewer characters. The clarity of code is greatly
> improved. Sadly, performance appears to be unchanged.
>
> Summary of changes:
>
> * Did away with dch_global, dch_date and dch_time.
> * Replaced DCH_processor with two new functions DCH_to_char and
>   DCH_from_char, which now do all the work previously done by
>   dch_{global,date,time}.
> * Removed the 'action' field from the KeyWord struct as it is no longer
>   useful.
> * Changed the type of the 'character' field in the FormatNode struct
>   to char, because ... that's what it is. The original choice of 'int'
>   seems to have been an error.
> * Removed commented-out function declaration for is_acdc. According
>   to CVS annotate, this hasn't been in use since sometime in the early
>   Cretaceous period, and in any case I don't know why you'd want to
>   check whether a string was the rock band AC/DC. =)
> * Reworded some of the comments for clarity.
> * Didn't touch any of the number formatting routines.
>
> This compiles cleanly on x86 gentoo and passes check, installcheck and
> installcheck-parallel.
>
> Thanks for your time,
> BJ

[ Attachment, skipping... ]

> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

-- 
Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
EnterpriseDB                             http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] default_text_search_config and expression indexes
Heikki Linnakangas wrote: > Bruce Momjian wrote: > > Heikki Linnakangas wrote: > >> Removing the default configuration setting altogether removes the 2nd > >> problem, but that's not good from a usability point of view. And it > >> doesn't solve the general issue, you can still do things like: > >> SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@ > >> to_tsquery('confB', 'query'); > > > > True, but in that case you are specifically naming different > > configurations, so it is hopefully obvious you have a mismatch. > > There's many more subtle ways to do that. For example, filling a > tsvector column using a DEFAULT clause. But then you sometimes fill it > in the application instead, with a different configuration. Or if one of > the function calls is buried in another user defined function. > > I don't think explicitly naming the configuration gives enough protection. Oh, wow, OK, well in that case the text search API isn't ready and we will have to hold this for 8.4. -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] default_text_search_config and expression indexes
Heikki Linnakangas wrote:
> Oleg Bartunov wrote:
> > On Wed, 8 Aug 2007, Bruce Momjian wrote:
> >> Heikki Linnakangas wrote:
> >>> If I understood correctly, the basic issue is that a tsvector datum
> >>> created using configuration A is incompatible with a tsquery datum
> >>> created using configuration B, in the sense that you won't get
> >>> reasonable results if you use the tsquery to search the tsvector, or do
> >>> ranking or highlighting. If the configurations happen to be similar
> >>> enough, it can work, but not in general.
> >>
> >> Right.
> >
> > not fair. There are many cases when one can intentionally use different
> > configurations. But I agree, this is not for beginners.
>
> Can you give an example of that?
>
> I certainly can see the need to use different configurations in one
> database, but what's the use case for comparing a tsvector created with
> configuration A against a tsquery created with configuration B?

I assume you could have a configuration with different stop words or
synonyms and compare them.

> >>> - using an expression index instead of a tsvector-field, and always
> >>> explicitly specifying the configuration, you can avoid that problem (a
> >>> query with a different configuration won't use the index). But an
> >>> expression index, without explicitly specifying the configuration, will
> >>> get corrupted if you change the default configuration.
> >>
> >> Right.
> >
> > the same problem if you drop a constraint from a table (accidentally) and
> > then get surprised by select results.
>
> The difference is that if you change the default configuration, you
> won't expect that your queries start to return funny results. It looks
> harmless, like changing the date style. If you drop a constraint, it's
> much more obvious what the consequences are.
>
> > We should agree that all you describe is only for DUMMY users. From
> > authors point of view I dislike your approach to treat text searching as
> > a very limited tool. 
> > But I understand that we should protect people
> > from stupid errors.
> >
> > I want for beginners easy setup and error-proof functionality,
> > but leaving experienced users to develop complex search engines.
> > Can we have a separate safe interface for text searching and explicitly
> > recommend it for beginners?
>
> I don't see how any of the suggestions limits what you can do with it.
> If we remove the default configuration parameter, you just have to be
> explicit. If we go with the type-system I suggested, you could still add
> casts and conversion functions between different tsvector types, where
> it makes sense.

I don't think the type system is workable given the ability to create
new configurations on the fly. I think the configuration must be
specified each time.

At this point, if we keep discussing the tsearch2 API we are not going
to have this in 8.3.

-- 
Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
EnterpriseDB                             http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

                http://www.postgresql.org/about/donate
Re: [HACKERS] 2D partitioning of VLDB - sane or not?
Jason, > Which brings us back to the original issue. If I decide to stick with > the current implementation and not "improve our existing partitioning > mechanisms to scale to 100,000 partitions", I could do something like: > > Maintain 2 copies of the parent table (partitioned by 256). > Inherit from both to a relation table. > > Does this get me out of the woods with the query analyzer? Doesn't seem > like it would, necessarily, at least. You don't get a table's partitions when you inherit. Just the schema of the master. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface
Josh Berkus <[EMAIL PROTECTED]> writes:
> Tom,
>> Also, while we might accept
>> a small hook-function patch for 8.3, there's zero chance of any of that
>> other stuff making it into this release cycle.

> I don't think anyone was thinking about 8.3. This is pretty much 8.4
> stuff; Julius is just raising it now because they don't want to go down
> the wrong path and waste everyone's time.

Well, if they get the hook in now, then in six months or so when they
have something to play with, people would be able to play with it.
If not, there'll be zero uptake till after 8.4 is released...

			regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Wrote a connect-by feature
On 8/14/07, Bertram Scharpf <[EMAIL PROTECTED]> wrote:
> I just forgot to refer to--of course--the German knight of
> the iron hand.

Prima donna? Not quite done yet? Think it over. The criticism may well
be justified, and what the guys said was entirely appropriate, both in
substance and in delivery.

Oh well. Pity. I thought it was an interesting contribution.

> Bertram

Cheers,
Andrej

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq
Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface
Tom,

> Also, while we might accept
> a small hook-function patch for 8.3, there's zero chance of any of that
> other stuff making it into this release cycle.

I don't think anyone was thinking about 8.3. This is pretty much 8.4
stuff; Julius is just raising it now because they don't want to go down
the wrong path and waste everyone's time.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

                http://www.postgresql.org/about/donate
Re: [HACKERS] Autovacuum and toast tables
Alvaro Herrera <[EMAIL PROTECTED]> writes: > I think there is some special code in tqual.c that skips some checks > assuming that the toast table is vacuumed at the same time as the main > heap. I don't believe there is any such connection. We do use a special snapshot for toast tables, but it's only needed to make sure VACUUM FULL on a toast table will work (ie, we have to respect MOVED_IN/MOVED_OFF). > We go certain lengths in autovacuum to make sure tables are vacuumed > when their toast table needs vacuuming and the main table does not, > which is all quite kludgy. Yeah --- getting rid of that mess would be a plus. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Autovacuum and toast tables
Gregory Stark wrote: > When autovacuum vacuums a table it always vacuums the toast table as well. Is > there any particular reason to consider the two together? I think this may > just be a holdover from the interactive command which considers them together > because the toast table is supposed to be invisible to users. > > But autovacuum could look at the stats for the toast tables and make decisions > about them independently from the main tables, no? > > The reason I ask is because looking in the logs from benchmarks I see lots of > useless vacuuming of empty toast tables which have no dead tuples. Now in this > case it doesn't cost anything because they're empty. But I could easily see > situations where the toast tables could be quite large but not be receiving > any updates when the main table is receiving a large volume of updates on > other columns. I think there is some special code in tqual.c that skips some checks assuming that the toast table is vacuumed at the same time as the main heap. I am not sure how vital is that. Maybe it can be fixed, or maybe it is not a problem at all. We go certain lengths in autovacuum to make sure tables are vacuumed when their toast table needs vacuuming and the main table does not, which is all quite kludgy. So we already look at their stats and make decisions about them. But what we do after that is force a vacuum to the main table, even if that one does not need any vacuuming, which is dumb. We could certainly fix that, mainly pending analysis of the above problem. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
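[Editorial note: Alvaro's point about deciding toast-table vacuums from
their own stats amounts to applying the usual autovacuum trigger formula
to each table independently. A rough Python sketch follows; the
threshold and scale-factor defaults match PostgreSQL's documented
autovacuum defaults, but the function name and the sample numbers are
invented for illustration.]

```python
# Sketch (not PostgreSQL source): the standard autovacuum trigger test,
# vacuum when dead tuples exceed base_threshold + scale_factor * reltuples.
# Evaluating this per table would let a toast table be vacuumed (or
# skipped) independently of its parent heap.

def needs_vacuum(dead_tuples, reltuples,
                 base_threshold=50, scale_factor=0.2):
    """True if this table, on its own stats, is due for a vacuum."""
    return dead_tuples > base_threshold + scale_factor * reltuples

# A heavily updated main table whose toast data never changes:
assert needs_vacuum(dead_tuples=30_000, reltuples=100_000)      # vacuum heap
assert not needs_vacuum(dead_tuples=0, reltuples=5_000)         # skip toast
```

Under this scheme the empty toast tables Gregory sees in the benchmark
logs would simply never qualify.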
Re: [HACKERS] Proposal: Pluggable Optimizer Interface
Julius Stroffek <[EMAIL PROTECTED]> writes:
> I understood that if the user creates his own implementation of the
> planner which can be stored in some external library, he has to provide
> some C language function as a "hook activator" which will assign the
> desired value to the planner_hook variable. Both the activator function
> and the new planner implementation have to be located in the same
> dynamic library, which will be loaded when a CREATE FUNCTION statement
> is used on the "hook activator" function.

You could do it that way if you wanted, but a minimalistic solution is
just to install the hook from the _PG_init function of a loadable
library, and then LOAD is sufficient for a user to execute the thing.
There's a small example at
http://archives.postgresql.org/pgsql-patches/2007-05/msg00421.php

Also, having the loadable module add a custom GUC variable would likely
be a preferable solution for control purposes than making specialized
functions. I attach another small hack I made recently, which simply
scales all the planner's relation size estimates by a scale_factor GUC;
this is handy for investigating how a plan will change with relation
size, without having to actually create gigabytes of test data.

> There are more things in the proposal, such as a new pg_optimizer
> catalog and a different way of configuring the hooks. However, these
> things are not mandatory for the functionality but are more user
> friendly.

Granted, but at this point we are talking about infrastructure for
planner-hackers to play with, not something that's intended to be a
long-term API for end users. It may or may not happen that we ever need
a user API for this at all. I think a planner that just "does the right
thing" is far preferable to one with a lot of knobs that users have to
know how to twiddle, so I see this more as scaffolding on which someone
can build and test the replacement for GEQO; which ultimately would go
in without any user-visible API additions. 
			regards, tom lane

#include "postgres.h"
#include "fmgr.h"
#include "commands/explain.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

void	_PG_init(void);
void	_PG_fini(void);

static double scale_factor = 1.0;

static void my_get_relation_info(PlannerInfo *root, Oid relationObjectId,
								 bool inhparent, RelOptInfo *rel);

/*
 * Get control during planner's get_relation_info() function, which sets up
 * a RelOptInfo struct based on the system catalog contents.  We can modify
 * the struct contents to cause the planner to work with a hypothetical
 * situation rather than what's actually in the catalogs.
 *
 * This simplistic example just scales all relation size estimates by a
 * user-settable factor.
 */
static void
my_get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
					 RelOptInfo *rel)
{
	ListCell   *ilist;

	/* Do nothing for an inheritance parent RelOptInfo */
	if (inhparent)
		return;

	rel->pages = (BlockNumber) ceil(rel->pages * scale_factor);
	rel->tuples = ceil(rel->tuples * scale_factor);

	foreach(ilist, rel->indexlist)
	{
		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ilist);

		ind->pages = (BlockNumber) ceil(ind->pages * scale_factor);
		ind->tuples = ceil(ind->tuples * scale_factor);
	}
}

/*
 * _PG_init() - library load-time initialization
 *
 * DO NOT make this static nor change its name!
 */
void
_PG_init(void)
{
	/* Get into the hooks we need to be in all the time */
	get_relation_info_hook = my_get_relation_info;

	/* Make scale_factor accessible through GUC */
	DefineCustomRealVariable("scale_factor",
							 "",
							 "",
							 &scale_factor,
							 0.0001,
							 1e9,
							 PGC_USERSET,
							 NULL,
							 NULL);
}

/*
 * _PG_fini() - library unload-time finalization
 *
 * DO NOT make this static nor change its name!
 */
void
_PG_fini(void)
{
	/* Get out of all the hooks (just to be sure) */
	get_relation_info_hook = NULL;
}

---(end of broadcast)---
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org
Re: [HACKERS] Proposal: Pluggable Optimizer Interface
Stefan,

thanks for pointing this out. I missed this change. We would like to
place the hooks at a different place in the planner: we would like to
just replace the non-deterministic algorithm searching for the best
order of joins and keep the rest of the planner untouched.

I am not quite sure about the usage, from the user's point of view, of
what got implemented. I read just the code of the patch. Are there more
explanations somewhere else?

I understood that if the user creates his own implementation of the
planner which can be stored in some external library, he has to provide
some C language function as a "hook activator" which will assign the
desired value to the planner_hook variable. Both the activator function
and the new planner implementation have to be located in the same
dynamic library, which will be loaded when a CREATE FUNCTION statement
is used on the "hook activator" function.

Am I correct? Have I missed something?

If the above is the case then it is exactly what we wanted, except we
would like to have the hook also in a different place.

There are more things in the proposal, such as a new pg_optimizer
catalog and a different way of configuring the hooks. However, these
things are not mandatory for the functionality but are more user
friendly.

Thanks

Julo

Stefan Kaltenbrunner wrote:

> Julius Stroffek wrote:
> > Hi All,
> >
> > Tomas Kovarik and I have presented at PGCon 2007 in Ottawa
> > the ideas about other possible optimizer algorithms to be used
> > in PostgreSQL.
> >
> > We are quite new to PostgreSQL project so it took us some
> > time to go through the sources end explore the possibilities
> > how things could be implemented.
> >
> > There is a proposal attached to this mail about the interface
> > we would like to implement for switching between different
> > optimizers. Please review it and provide a feedback to us.
> > Thank You.
>
> hmm - how is that proposal different from what got implemented with:
> http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php
>
> Stefan
Re: [HACKERS] Proposal: Pluggable Optimizer Interface
Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes: > Julius Stroffek wrote: >> There is a proposal attached to this mail about the interface >> we would like to implement for switching between different >> optimizers. Please review it and provide a feedback to us. > hmm - how does is that proposal different from what got implemented with: > http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php Well, it's a very different level of abstraction. The planner_hook would allow you to replace the *entire* planner, but if you only want to replace GEQO (that is, only substitute some other heuristics for partial search of a large join-order space), doing it from planner_hook will probably require duplicating a great deal of code. A hook right at the place where we currently choose "geqo or regular" would be a lot easier to experiment with. Replacing GEQO sounds like a fine area for investigation to me; I've always been dubious about whether it's doing a good job. But I'd prefer a simple hook function pointer designed in the same style as planner_hook (ie, intended to be overridden by a loadable module). The proposed addition of a system catalog and SQL-level management commands sounds like a great way to waste a lot of effort on mere decoration, before ever getting to the point of being able to demonstrate that there's any value in it. Also, while we might accept a small hook-function patch for 8.3, there's zero chance of any of that other stuff making it into this release cycle. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Testing the async-commit patch
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> I have some ideas about testing configuration items. Doing all our
>> tests with the default config is not ideal, I think. Essentially we'd
>> put up a server that would have sets of . The client would connect to
>> the server if it could and get the set(s) of lines for the branch in
>> question, and for each set it would try another run of installcheck
>> (I'm also wondering if we should switch to doing
>> installcheck-parallel). Anyway, this would be a config option on the
>> buildfarm, so we wouldn't overburden hosts with limited run windows
>> (e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g.
>> some of the old and/or tiny hardware we have).
>> If this seems worth it I'll put it on my TODO.
>
> Sounds like a good plan, except that an extra server seems unnecessary
> mechanism (and perhaps an unnecessary security risk). We can just put a
> file into CVS src/test/regress showing what we'd like tested.

That could work. Let's say that this file looks just like a
postgresql.conf file, except that any line beginning with '[' is a
config set name for the lines that follow. So we might have:

[asynch_commit]
synchronous_commit = off

[no_fsync]
fsync = off

[csvlogs]
start_log_collector = true
log_destination = 'stderr, csvlog'

Then there would be an extra installcheck-parallel run for each set. If
the file isn't there we do nothing.

cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
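[Editorial note: the config-set file format Andrew sketches above is
simple enough to parse mechanically. Here is a hypothetical Python
sketch of such a parser; nothing like this exists in the buildfarm
client, and the function name is invented.]

```python
# Parse the proposed format: lines beginning with '[' name a config set,
# and the plain postgresql.conf-style lines that follow belong to it.

def parse_config_sets(text):
    sets = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue                      # skip blanks and comments
        if line.startswith('['):
            current = line.strip('[]')    # start a new config set
            sets[current] = []
        elif current is not None:
            sets[current].append(line)    # one GUC setting per line
    return sets

example = """
[asynch_commit]
synchronous_commit = off

[no_fsync]
fsync = off
"""
print(parse_config_sets(example))
# {'asynch_commit': ['synchronous_commit = off'], 'no_fsync': ['fsync = off']}
```

The buildfarm client could then run one extra installcheck pass per key
in the returned dict, as discussed above.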
Re: [HACKERS] Proposal: Pluggable Optimizer Interface
Julius Stroffek wrote:
> Hi All,
>
> Tomas Kovarik and I have presented at PGCon 2007 in Ottawa
> the ideas about other possible optimizer algorithms to be used
> in PostgreSQL.
>
> We are quite new to PostgreSQL project so it took us some
> time to go through the sources end explore the possibilities
> how things could be implemented.
>
> There is a proposal attached to this mail about the interface
> we would like to implement for switching between different
> optimizers. Please review it and provide a feedback to us.
> Thank You.

hmm - how is that proposal different from what got implemented with:

http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php

Stefan

---(end of broadcast)---
TIP 6: explain analyze is your friend
[HACKERS] Proposal: Pluggable Optimizer Interface
Hi All,

Tomas Kovarik and I have presented at PGCon 2007 in Ottawa
the ideas about other possible optimizer algorithms to be used
in PostgreSQL.

We are quite new to the PostgreSQL project so it took us some
time to go through the sources and explore the possibilities
of how things could be implemented.

There is a proposal attached to this mail about the interface
we would like to implement for switching between different
optimizers. Please review it and provide feedback to us.
Thank You.

Regards

Julius Stroffek


Proposal for Pluggable Optimizer Interface
==========================================

Overview
--------

We have presented at PGCon 2007 in Ottawa a couple of other approaches
and algorithms that can be used for query optimization, see
http://www.pgcon.org/2008/papers/Execution_Plan_Optimization_Techniques_Julius_Stroffek.pdf

We have focused on algorithms for searching the space of all possible
orders of joins, including bushy trees. The possible algorithms include

* Dynamic Programming (already implemented in PostgreSQL)
* Genetic Algorithm (already implemented in PostgreSQL)
* Dijkstra's Algorithm
* A* Search Algorithm
* Greedy "Nearest Neighbor" Algorithm
* Hill-Climbing
* Simulated Annealing
* Iterative Improvement
* Two Phase Optimization
* Toured Simulated Annealing

Choosing the best algorithm from the above list is difficult. We have
to consider the length of the optimizer computation vs. the quality of
the solution which we would like to achieve. We may want to do some
hybrid optimization - run a couple of the algorithms from the above and
choose the best solution found. We might run some very fast algorithm
at the beginning and, depending on the solution cost, decide whether it
is worthwhile to try to optimize the plan even more (using another
algorithm with a longer running time but producing better solutions).

Therefore we would like to propose an interface which can be used to
switch between different optimizer algorithms and/or allow a user to
write and use his own implementation. 
It would allow the community to ship more optimizer algorithms as
contrib modules and users may then decide which of those algorithms
should be used for their queries.

Creating an optimizer
=====================

We would propose to create a catalog holding the available optimizers
in the system, called pg_optimizer. Users could then use a SET command
to switch between different optimizers.

postgres=# select * from pg_optimizer;
 optname |      optproc
---------+-------------------
 geqo    | geqo_optimizer
 dynamic | dynamic_optimizer
 greedy  | greedy_optimizer
(3 rows)

postgres=# set optimizer=greedy;
SET

Optimizer Invocation Point
==========================

There is code in the function make_rel_from_joinlist which decides
whether to invoke dynamic programming or the genetic algorithm for
query optimization. We would propose to place the invocation of the
plugged-in optimizer at the same place, and with the same parameters,
as the functions geqo and make_one_rel_by_joins are currently invoked.

Creating and dropping an optimizer
==================================

The optimizer function has to be implemented as a C-Language Function
using "Version 1 Calling Conventions". The return type of the function
is RelOptInfo * and the arguments passed to the function are

1. PlannerInfo *root
2. int levels_needed
3. List *initial_rels

The proper "CREATE FUNCTION" statement has to be used to create the
optimizer function.

> CREATE FUNCTION greedyoptimizer(internal, int, internal)
> RETURNS internal
> AS 'mylib', 'greedy_optimizer'
> LANGUAGE C
> ;

Once the optimizer function is created, the user may create an
optimizer using the function with the statement

> CREATE OPTIMIZER greedy (
>   function = greedyoptimizer
>   comment = 'Greedy Nearest Neighbor Optimizer'
> );

If the user decides not to use the optimizer anymore he can invoke

> DROP OPTIMIZER greedy;

The user also has to drop the optimizer function with

> DROP FUNCTION greedyoptimizer;

Project TODO List
=================

1. Create a pg_optimizer catalog to hold available optimizers. 
2. Create wrappers above the current dynamic programming and genetic
   algorithm optimizers to be used to call those implementations.
3. Modify the parser and add the functions to handle and execute the
   CREATE/DROP OPTIMIZER statements.
4. Modify GUC so that it would be possible to switch between optimizers.
5. Change the code at the optimizer invocation point so that the
   appropriate optimizer function would be called.
6. Handle object dependencies - make an entry in pg_depend that an
   optimizer depends on its function.
7. Implement a '\dO' command that will list the available optimizers.
8. Create a contrib module and ship some other optimizer algorithms.
9. Any other suggestions, comments and changes that will come out from
   the review of this proposal.

Things to Decide
================

1. Rights. Who can create/drop optimizers? Who
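[Editorial note: for illustration, the Greedy "Nearest Neighbor"
strategy from the algorithm list in the proposal can be sketched in a
few lines of Python. This is a toy: the cost function and relation
names are invented, and a real implementation would operate on the
planner's RelOptInfo structures rather than strings.]

```python
# Greedy nearest-neighbor join ordering: starting from one relation,
# repeatedly pick the remaining relation that is cheapest to join next.
# This explores only a tiny fraction of the join-order space, trading
# plan quality for a very short optimization time.

def greedy_join_order(rels, cost):
    """rels: list of relation names; cost(a, b): estimated join cost."""
    remaining = list(rels)
    current = remaining.pop(0)          # arbitrary starting relation
    order = [current]
    while remaining:
        nxt = min(remaining, key=lambda r: cost(current, r))
        remaining.remove(nxt)
        order.append(nxt)
        current = nxt
    return order

# Toy cost model: join cost is the product of the two relation sizes.
sizes = {"a": 10, "b": 1000, "c": 50}
order = greedy_join_order(["a", "b", "c"],
                          cost=lambda x, y: sizes[x] * sizes[y])
print(order)  # ['a', 'c', 'b'] -- the small relation is joined before the big one
```

A pluggable-optimizer module along the lines of the proposal would wrap
logic like this behind the C-language function signature given above.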
Re: [HACKERS] Testing the async-commit patch
Gregory Stark <[EMAIL PROTECTED]> writes: > "Tom Lane" <[EMAIL PROTECTED]> writes: >> I propose that we actually set synchronous_commit >> off by default for the next little while --- at least up to 8.3beta1, >> maybe until we approach the RC point. That will ensure that every >> buildfarm machine is exercising the async-commit behavior, as well as >> every developer who's testing HEAD. >> >> Of course the risk is that we might forget to turn it back on before >> release :-( > I'll set a cron job to remind us. What date should I set it for? :) I thought of a low-tech solution for that: put a note in src/tools/RELEASE_CHANGES about it. We invariably look at that file while preparing releases. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Testing the async-commit patch
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> I have some ideas about testing configuration items. Doing all our tests
> with the default config is not ideal, I think. Essentially we'd put up a
> server that would have sets of . The
> client would connect to the server if it could and get the set(s) of
> lines for the branch in question, and for each set it would try another
> run of installcheck (I'm also wondering if we should switch to doing
> installcheck-parallel). Anyway, this would be a config option on the
> buildfarm, so we wouldn't overburden hosts with limited run windows
> (e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g.
> some of the old and/or tiny hardware we have).

> If this seems worth it I'll put it on my TODO.

Sounds like a good plan, except that an extra server seems unnecessary
mechanism (and perhaps an unnecessary security risk). We can just put a
file into CVS src/test/regress showing what we'd like tested.

			regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] Testing the async-commit patch
"Tom Lane" <[EMAIL PROTECTED]> writes: > But to get to the point: the urgency of testing the patch more > extensively has just moved up a full order of magnitude, IMHO anyway. > I muttered something in the other thread about providing a buildfarm > option to run the regression tests with synchronous_commit off. That > would still be a good idea in the long run, but I want to take some more > drastic measures now. I propose that we actually set synchronous_commit > off by default for the next little while --- at least up to 8.3beta1, > maybe until we approach the RC point. That will ensure that every > buildfarm machine is exercising the async-commit behavior, as well as > every developer who's testing HEAD. > > Of course the risk is that we might forget to turn it back on before > release :-( I'll set a cron job to remind us. What date should I set it for? :) Seems like a fine plan to me. It's supposed to be 100% reliable and have indistinguishable behaviour barring a system crash and nobody should be running production data on a beta or pre-beta build, so they should never see a difference. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Testing the async-commit patch
Tom Lane wrote:
> But to get to the point: the urgency of testing the patch more
> extensively has just moved up a full order of magnitude, IMHO anyway.
> I muttered something in the other thread about providing a buildfarm
> option to run the regression tests with synchronous_commit off. That
> would still be a good idea in the long run, but I want to take some more
> drastic measures now. I propose that we actually set synchronous_commit
> off by default for the next little while --- at least up to 8.3beta1,
> maybe until we approach the RC point. That will ensure that every
> buildfarm machine is exercising the async-commit behavior, as well as
> every developer who's testing HEAD.
>
> Of course the risk is that we might forget to turn it back on before
> release :-(

Turn it off, I doubt we'll forget.

I have some ideas about testing configuration items. Doing all our
tests with the default config is not ideal, I think. Essentially we'd
put up a server that would have sets of . The client would connect to
the server if it could and get the set(s) of lines for the branch in
question, and for each set it would try another run of installcheck
(I'm also wondering if we should switch to doing
installcheck-parallel). Anyway, this would be a config option on the
buildfarm, so we wouldn't overburden hosts with limited run windows
(e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g.
some of the old and/or tiny hardware we have).

If this seems worth it I'll put it on my TODO.

cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org
Re: [HACKERS] 2D partitioning of VLDB - sane or not?
Josh, I think what you are suggesting is something like this: -- begin SQL -- core=# CREATE TABLE temp_x( x_id BIGINT PRIMARY KEY, x_info VARCHAR(16) NOT NULL DEFAULT 'x_info'); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "temp_x_pkey" for table "temp_x" CREATE TABLE core=# CREATE TABLE temp_y( y_id BIGINT PRIMARY KEY, y_info VARCHAR(16) NOT NULL DEFAULT 'y_info'); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "temp_y_pkey" for table "temp_y" CREATE TABLE core=# CREATE TABLE temp_xy() INHERITS (temp_x, temp_y); CREATE TABLE core=# \d temp_xy Table "core.temp_xy" Column | Type | Modifiers +---+-- x_id | bigint| not null x_info | character varying(16) | not null default 'x_info'::character varying y_id | bigint| not null y_info | character varying(16) | not null default 'y_info'::character varying Inherits: temp_x, temp_y -- end code -- The problem with this is what I really want to do is something like this: -- begin code -- core=# CREATE TABLE temp_xx() INHERITS (temp_x, temp_x); ERROR: inherited relation "temp_x" duplicated -- end code -- The issue is that the relations are in fact reflexive and, due to the sheer size fo the data I'm trying to warehouse, I'd like not to keep them around more than once. I'm sort of thinking aloud here, but based on what you've told me, I guess I'm left having to choose which direction I want to search in since the domains and ranges are theoretically the same. On the other hand, perhaps I could take the overhead impact and just keep two copies of the parent tables around. The relation table is on the order of about 300-500x as large as the parent tables and that multiplier is expected to stay relatively constant over time...? Which brings us back to the original issue. If I decide to stick with the current implementation and not "improve our existing partitioning mechanisms to scale to 100,000 partitions", I could do something like: Maintain 2 copies of the parent table (partitioned by 256). 
Inherit from both to a relation table. Does this get me out of the woods with the query analyzer? Doesn't seem like it would, necessarily, at least.

On 8/11/07, Josh Berkus <[EMAIL PROTECTED]> wrote:
> Jason,
>
> > Aside from running into a known bug with "too many triggers" when creating
> > gratuitous indices on these tables, I feel as if it may be possible to do
> > what I want without breaking everything. But then again, am I taking too
> > many liberties with technology that maybe didn't have use cases like this
> > one in mind?
>
> Well, you're pushing PostgreSQL partitioning further than it's currently
> able to go. Right now our range exclusion becomes too costly to be useful
> somewhere around 300 to 1000 partitions (depending on CPU and other issues)
> because the constraints are checked linearly.
>
> To make your scheme work, you'd need to improve our existing partitioning
> mechanisms to scale to 100,000 partitions. It would also help you to
> implement multiple inheritance so that you could have a partition which
> belonged to two masters. I'd be very interested in seeing you do so, of
> course, but this may be more hacking than you had in mind.
>
> --
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco

--
Jason Nerothin
Programmer/Analyst IV - Database Administration
UCLA-DOE Institute for Genomics & Proteomics
Howard Hughes Medical Institute
611 C.E. Young Drive East  | Tel: (310) 206-3907
105 Boyer Hall, Box 951570 | Fax: (310) 206-3914
Los Angeles, CA 90095. USA | Mail: [EMAIL PROTECTED]
http://www.mbi.ucla.edu/~jason
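The linear constraint checking Josh describes can be seen in a minimal constraint-exclusion setup. This is a hypothetical sketch, not code from the thread: the table names, the modulo-hash CHECK constraints, and the 256-way split are invented for illustration; only the inheritance-plus-CHECK mechanism itself is the 8.x partitioning feature being discussed.

```sql
-- Hypothetical sketch of 8.x-era partitioning: children inherit from a
-- parent, and CHECK constraints let the planner exclude partitions.
-- Each child's constraint is tested one by one at plan time, which is
-- why this stops paying off around a few hundred partitions.
CREATE TABLE relation_parent (pair_id BIGINT, x_id BIGINT, y_id BIGINT);

CREATE TABLE relation_p0 (CHECK (x_id % 256 = 0)) INHERITS (relation_parent);
CREATE TABLE relation_p1 (CHECK (x_id % 256 = 1)) INHERITS (relation_parent);
-- ... one child per hash bucket, up to relation_p255

SET constraint_exclusion = on;
-- When the query's qual matches a constraint, the planner can skip
-- every other child:
EXPLAIN SELECT * FROM relation_parent WHERE x_id % 256 = 1;
```

At 100,000 children the per-plan proof work over every constraint is exactly the linear cost the reply warns about.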
Re: [HACKERS] Wrote a connect-by feature
Bertram, Just so you know, there is a site for external projects associated with PostgreSQL which are not part of the core code: www.pgfoundry.org You are quite welcome to propose your connect-by code as a project on pgFoundry. This is how many new features start out before acceptance anyway. -- Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[HACKERS] Testing the async-commit patch
So I was testing my fix for the problem noted here: http://archives.postgresql.org/pgsql-hackers/2007-08/msg00196.php and promptly found *another* bug. To wit, that repair_frag calls HeapTupleSatisfiesVacuum without bothering to acquire any buffer content lock. This results in an Assert failure inside SetBufferCommitInfoNeedsSave, if HeapTupleSatisfiesVacuum tries to update any hint bits for the tuple.

I think that is impossible in current releases, because the tuple's logical status was fully determined by the prior call in scan_heap. But it's possible as of 8.3 because the walwriter or other backends could have moved the WAL flush point, allowing a previously unhintable XMAX to become hintable.

I think the best solution for this is to acquire the buffer content lock before calling HeapTupleSatisfiesVacuum --- it's really a pretty ugly shortcut that the code didn't do that already. We could alternatively refuse to do shrinking unless both XMIN and XMAX are correctly hinted at scan_heap time; but there is not anything else in vacuum.c that seems to require XMAX_COMMITTED to be set, so I'd rather not make that restriction.

But to get to the point: the urgency of testing the patch more extensively has just moved up a full order of magnitude, IMHO anyway. I muttered something in the other thread about providing a buildfarm option to run the regression tests with synchronous_commit off. That would still be a good idea in the long run, but I want to take some more drastic measures now. I propose that we actually set synchronous_commit off by default for the next little while --- at least up to 8.3beta1, maybe until we approach the RC point. That will ensure that every buildfarm machine is exercising the async-commit behavior, as well as every developer who's testing HEAD. Of course the risk is that we might forget to turn it back on before release :-(

Comments?
regards, tom lane
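For anyone who wants to exercise the behavior locally without waiting for a default change, the setting Tom refers to is the new 8.3 synchronous_commit GUC. A sketch (the table name is a placeholder, not from the thread):

```sql
-- Per-session, for testing async commit on 8.3 HEAD:
SET synchronous_commit = off;
-- or cluster-wide, in postgresql.conf:
--   synchronous_commit = off

BEGIN;
INSERT INTO some_table VALUES (1);  -- "some_table" is a placeholder
COMMIT;  -- may return before the commit's WAL is flushed to disk
```

With the GUC off, a crash immediately after COMMIT can lose that transaction (though never corrupt the database), which is exactly the code path wider testing would cover.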
Re: [HACKERS] strange behaviour of parser - numeric domain doesn't work phantom
2007/8/13, Tom Lane <[EMAIL PROTECTED]>:
> "Pavel Stehule" <[EMAIL PROTECTED]> writes:
> > postgres=# select -11::nb;
>
> The operator precedence table at
> http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-PRECEDENCE
> says perfectly clearly that :: binds more tightly than -.
>
> regards, tom lane

My mistake. Sorry for the noise.

Regards
Pavel Stehule
Re: [HACKERS] Wrote a connect-by feature
On Sunday, 12 Aug 2007, 18:36:47 +0200, Bertram Scharpf wrote:
> Good bye!

I just forgot to refer to--of course--the German knight of the iron hand.

Bertram

--
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de
Re: [HACKERS] 92DE-7B78-4153 : CONFIRM from pgsql-hackers (subscribe)
accept 92DE-7B78-4153
Re: [HACKERS] 8.3 freeze/release
Michael Meskes <[EMAIL PROTECTED]> writes:
> it seems a lot of work is still being done on 8.3. But we are in feature
> freeze, aren't we? I'm wondering because I'm sitting on some major
> changes to ecpg (real prepare amongst them) but haven't committed these
> because of the freeze.

Well, we are in feature freeze, but as far as I'm concerned ecpg is its own little fiefdom. If you have enough confidence in these changes to apply them now, no one is going to question you.

regards, tom lane
Re: [HACKERS] strange behaviour of parser - numeric domain doesn't work phantom
"Pavel Stehule" <[EMAIL PROTECTED]> writes: > postgres=# select -11::nb; The operator precedence table at http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-PRECEDENCE says perfectly clearly that :: binds more tightly than -. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Maximum row size
Gregory Stark <[EMAIL PROTECTED]> writes:
> ... it's kind of pie-in-the-sky since it depends on every column being
> toasted (and not compressed). It's much more normal to have some columns
> be quite large and others more normal sized or different data types.

Yeah. I think those numbers should simply be removed from the discussion entirely, because they are bogus. There is no real-world situation where someone is going to put 1Gb into each and every field of a row. The limits that should be stated are 1Gb per column and 250-1600 columns per table --- let the readers draw their own conclusions from those.

As Greg notes, the 400-column figure came from presuming that every column gets toasted out-of-line; but that's not totally realistic either. Furthermore, we have some untoastable types that are wider than a toast pointer. The worst case seems to be 32 bytes for box, line, and lseg (I ignore "name", which is stated not to be intended for general-purpose use). If you made a table of only "box" columns you could get at most a bit over 250 of them on a standard-size page. So I think that's what we should quote as the lower bound.

regards, tom lane
[HACKERS] 8.3 freeze/release
Hi,

it seems a lot of work is still being done on 8.3. But we are in feature freeze, aren't we? I'm wondering because I'm sitting on some major changes to ecpg (real prepare amongst them) but haven't committed these because of the freeze. However, it looks as if new patches are still coming in. But then I might have gotten a false impression. Anyway, if the release is still far enough away, I would like to commit my changes too.

Comments?

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
Re: [HACKERS] Maximum row size
Devrim GÜNDÜZ <[EMAIL PROTECTED]> writes:
> What is the maximum row size for PostgreSQL?
>
> http://www.postgresql.org/about/ says 1.6 TB

I think this comes from the maximum of 1600 columns times the maximum of 1GB per (toasted) datum. However, 1600 toasted datums won't fit on an 8k page. Each toast pointer is 17 bytes, so only 480 toast pointers will fit on an 8k page. Which may be the rationale for this?

> http://www.postgresql.org/docs/faqs.FAQ.html says 400 GB.

Before packed varlenas we used to be able to fit only 408, minus a few for the page header. Perhaps it was 400 then, or perhaps they were just rounding down. So with packed varlenas we can raise the second number to 480GB. But it's kind of pie-in-the-sky since it depends on every column being toasted (and not compressed). It's much more normal to have some columns be quite large and others more normal sized, or of different data types.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
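The arithmetic behind these figures is easy to sanity-check from psql with plain integer math (this check is mine, not from the thread; the 17-byte toast pointer and 32-byte box sizes are taken from the messages above, and header overhead is ignored):

```sql
-- 8k page divided by a 17-byte toast pointer: roughly the 480 figure quoted
SELECT 8192 / 17 AS toast_pointers_per_page;  -- 481 before header overhead

-- so ~480 columns * 1GB per toasted datum gives the ~480GB row estimate,
-- while a row of 32-byte untoastable types like box caps out around:
SELECT 8192 / 32 AS box_columns_per_page;     -- 256, i.e. "a bit over 250"
```

Which also shows why both numbers are upper bounds: page and tuple headers eat into the 8192 bytes before any column data does.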
[HACKERS] strange behaviour of parser - numeric domain doesn't work phantom
Hello

I am testing:

postgres=# create domain nb int check(value > 10);
CREATE DOMAIN
postgres=# select 11::nb;
 nb
----
 11
(1 row)
-- ok
postgres=# select -11::nb;
 ?column?
----------
      -11
(1 row)
-- ughh
postgres=# select (-11)::nb;  -- I understand
ERROR:  value for domain nb violates check constraint "nb_check"

But nobody will write parentheses around a lone negative number.

Regards
Pavel Stehule

postgres=# select version();
                                              version
---------------------------------------------------------------------------------------------------
 PostgreSQL 8.3devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12)
(1 row)