Re: [HACKERS] Security and Data Protection Issues
Thank you, I'm also curious as to whether the data folder is already in some way encrypted and if so, what encryption/obfuscation is being used. There doesn't seem to be anything about this on the web.
Re: [HACKERS] Security and Data Protection Issues
Stuart Gundry wrote: Thank you, I'm also curious as to whether the data folder is already in some way encrypted and if so, what encryption/obfuscation is being used. There doesn't seem to be anything about this on the web. No encryption, although large text fields may be compressed (read up on TOAST), so they are not readable as plain text. -- Richard Huxton Archonet Ltd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
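For illustration, a minimal SQL sketch of the point above, using a hypothetical table named docs: oversized text values are moved into a TOAST relation and possibly compressed there, but never encrypted.

  CREATE TABLE docs (id serial PRIMARY KEY, body text);

  -- Locate the TOAST table that backs oversized body values:
  SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'docs';

  -- Compression (not encryption) can even be disabled per column:
  ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL;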
Re: [HACKERS] Auto-explain patch
On Wed, 2008-07-09 at 09:11 +, Dean Rasheed wrote: Simon, I like your proposal, and I think I can see how to code it fairly easily. There is one thing that it doesn't allow, however, which the debug_xxx parameters do, and that is for a non-superuser to trace SQL used in functions, from an interactive client session. For me, this is quite a big thing, because I find it most convenient to turn these parameters on while writing and tweaking stored procedures, and have the output go straight to my psql window, without having to connect as a superuser and trawl through log files. So I suggest grouping these parameters in their own category (e.g. sql_trace) and then having additional parameters to control where the output would go. So the sql_trace parameters would be: * sql_trace_min_planner_duration * sql_trace_min_executor_duration * sql_trace_explain_plan and they would work exactly as you describe, except they would be settable by normal users. Then the destination(s) for the statement and EXPLAIN logging would be controlled by: * client_sql_trace = on | off - settable by a normal user to allow a client session to see the sql_trace output. If this parameter is on, the sql_trace will be logged as NOTICE output. After sleeping on this, I think we should follow your idea. If it's possible to do the sql_trace_* parameters as a single one, I would prefer it, since it makes it more practical to use dynamically. Not sure how... maybe with a wrapper function? sql_trace(integer) sets just sql_trace_min_executor_duration sql_trace(integer, boolean) sets executor and explain sql_trace(integer, integer, boolean) sets all 3 I think you probably need to drop the sql_ off the front, purely because of parameter name length. No need for the other log_... parameter though. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
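A minimal sketch of the wrapper-function idea, assuming the proposed trace_* parameters existed (they do not yet); set_config() is the standard way to set a parameter from SQL:

  -- Hypothetical wrappers over the proposed, not-yet-existing GUCs.
  CREATE FUNCTION sql_trace(min_exec integer) RETURNS text AS $$
      SELECT set_config('trace_min_executor_duration', $1::text, false);
  $$ LANGUAGE sql;

  CREATE FUNCTION sql_trace(min_exec integer, explain boolean) RETURNS text AS $$
      SELECT set_config('trace_explain_plan', $2::text, false);
      SELECT set_config('trace_min_executor_duration', $1::text, false);
  $$ LANGUAGE sql;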
Re: [HACKERS] Auto-explain patch
Simon Riggs [EMAIL PROTECTED] writes: On Wed, 2008-07-09 at 09:11 +, Dean Rasheed wrote: So I suggest grouping these parameters in their own category (e.g. sql_trace) and then having additional parameters to control where the output would go. So the sql_trace parameters would be: * sql_trace_min_planner_duration * sql_trace_min_executor_duration * sql_trace_explain_plan If it's possible to do the sql_trace_* parameters as a single one, I would prefer it, since it makes it more practical to use dynamically. Not sure how... maybe with a wrapper function? sql_trace(integer) sets just sql_trace_min_executor_duration sql_trace(integer, boolean) sets executor and explain sql_trace(integer, integer, boolean) sets all 3 FWIW, it seems to me trace_sql_* would be nicer, much as we have the track_* GUC parameters. Though I also wonder if there's really any distinction here between tracing and logging, i.e. whether these should be log_explain_plan and so on. Perhaps we should keep the word trace for a more detailed facility. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Security and Data Protection Issues
Stuart Gundry wrote: Been looking into TrueCrypt but can't seem to get it to play nice with the postgres silent installer. When I try to set BASEDIR=M:\, which is where I mounted my encrypted volume, it gives the following error in the log: The Cacls command can be run only on disk drives that use the NTFS file system. Hmm, and are you sure that the encrypted partition is seen by the system as an NTFS partition? It sounded like you've done this before so I was hoping you could give me some pointers. I know it's not the rest of my install command since I've used that many times before. I've done that, but not on Windows, so I'm not really sure how that'll work. Maybe someone with more Windows experience could help here? BTW: you could try installing Postgres as usual, and just keep your WAL logs directory and all table data on the encrypted partition. Just read the documentation on CREATE TABLESPACE and about moving the pg_xlog directory. Thank you for your time You're welcome ;) Cheers, Jan -- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
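For the record, a sketch of the tablespace approach Jan describes, assuming the encrypted volume is mounted at M: and is writable by the postgres service account:

  CREATE TABLESPACE encrypted LOCATION 'M:/pgdata';
  CREATE TABLE secrets (id integer, payload text) TABLESPACE encrypted;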
Re: [HACKERS] WITH RECURSIVE updated to CVS TIP
At 2008-07-09 17:06:19 -0700, [EMAIL PROTECTED] wrote: I'm really new to this git thing, but I now have access to create git-shell accounts, etc. on git.postgresql.org. Any ideas you can offer on how better to handle this would really help me. :) The question is: what is your objective in providing this repository? I've only just cloned your repository, so I can only guess at how it is managed, but you seem to be rebasing your changes on top of the current Postgres source and responding to upstream changes by throwing away the old patches and applying the new ones. (By the way, your origin/master appears to be lagging the current HEAD by 71 commits, i.e. a month.) That has several problems: - There is no indication of how the WITH RECURSIVE patches have changed over time or in response to feedback. For example, the bugs recently fixed are indistinguishable from earlier changes. This would be very valuable information to have during review (and that's really what I was expecting when I cloned). - One has to clone a 250MB repository (over HTTP, with almost no progress indication) to see what is essentially the same as the posted patch. - Rebasing isn't appropriate for a public branch, since you're rewriting history that people have pulled already. If your objective is only to make an up-to-date patch always available, then it is unnecessary to publicise your repository. You could just use git-rebase to stay abreast of significant changes in origin/master and run git-format-patch to generate a patch... but then you still end up with essentially the same thing that Tatsuo Ishii posted to the list the other day anyway. I agree with Alvaro. If the developers aren't committing to this repository that the patches are generated from, there's really no point to using the repository for review. It's very much simpler to just read the patch as posted to the list. The only real benefit to review that I can imagine would be if full change history were available, which would be possible if a) changes were committed separately with proper comments and b) the branch were *NOT* rebased, but instead periodically merged with origin/master. That way I could pull from the repository and run e.g. git-log --stat origin/master..with-recursive or similar. Hope this helps. -- ams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
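A sketch of the merge-based workflow described above, with illustrative repository URL and branch name:

  git-clone git://git.postgresql.org/git/postgresql.git
  cd postgresql
  git-checkout -b with-recursive origin/master
  # ... commit each change separately, with a proper message ...
  git-merge origin/master     # periodically merge upstream; never rebase the published branch
  git-log --stat origin/master..with-recursive    # review the branch's full change history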
Re: [HACKERS] WITH RECURSIVE updated to CVS TIP
Abhijit Menon-Sen [EMAIL PROTECTED] writes: The only real benefit to review that I can imagine would be if full change history were available, which would be possible if a) changes were committed separately with proper comments and b) the branch were *NOT* rebased, but instead periodically merged with origin/master. That way I could pull from the repository and run e.g. git-log --stat origin/master..with-recursive or similar. Additionally, if other people could commit patches to the repository, or submit patches which upstream could apply, then git would be able to update submitters' trees with just the patches they're missing (i.e., skipping the patches they submitted upstream, or merging them cleanly). -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Follow-up on replication hooks for PostgreSQL
On 7/10/08, Robert Hodges [EMAIL PROTECTED] wrote: This is a quick update on a promise I made early in June to suggest requirements as well as ways to add replication hooks that would support logical replication, as opposed to the physical replication work currently underway based on NTT's code. Well, June was a pretty busy month, so it has taken a while to get back to this. However, we are now beginning to examine options for PostgreSQL logical replication. To make a long story short, we are willing to commit resources to this problem or fund other people to do it for us. If you are interested please contact me directly. Meanwhile, we are quite serious about this problem and intend to work on helpful additions to PostgreSQL in this area. I will post more as we make progress. Well, I'm not exactly sure what you are planning. It's OK to do a draft design privately, but before actually starting to code, the design should be discussed on -hackers. And I'm not exactly sure what you mean by logical replication? A way to log DDL statements? Do you want to log DML also? FWIW, here's a very rough draft design for functionality that would let current Slony-I/Londiste-like solutions replicate DDL as well. 1. A CREATE DDL TRIGGER statement that allows calling a function for all DDL statements. The only filtering that makes sense here is filtering by area: tables/functions/views/etc. It must be possible to do an AFTER trigger. Whether a BEFORE trigger for DDL makes sense or not, I'm not sure. 2. When the function is called, the following information is given: - Object type the event was for (table/view/function) - Array of object names. - SQL statement as text. The trigger function can filter further, based on object names, whether it wants to log the event or not. Trying to make the trigger run on only a subset of events is complex, and parsing the SQL into pieces for the trigger to understand it better is also complex, and neither is needed. Unless there are some common situations where such a simple design fails to work, I would not make the scheme more complex. Also, the design should be based on the assumption that the target side is exactly in sync. E.g. DROP CASCADE should be replicated as DROP CASCADE. We should not make the scheme more complex to survive cases where the target is not in sync. That way madness lies. The effect should be as if the same SQL statements were applied to the target by hand, no more, no less. -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
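To make the proposal concrete, a purely hypothetical sketch; neither CREATE DDL TRIGGER nor this trigger-function signature exists in PostgreSQL, and the ddl_log table is made up:

  -- Hypothetical: queue every DDL statement for replication.
  CREATE FUNCTION log_ddl(p_objtype text, p_objnames text[], p_stmt text) RETURNS void AS $$
  BEGIN
      INSERT INTO ddl_log (objtype, objnames, stmt)
          VALUES (p_objtype, p_objnames, p_stmt);
  END;
  $$ LANGUAGE plpgsql;

  -- Hypothetical syntax, per points 1 and 2 above:
  CREATE DDL TRIGGER replicate_ddl AFTER DDL ON TABLES, VIEWS, FUNCTIONS
      EXECUTE PROCEDURE log_ddl();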
[HACKERS] initdb in current cvs head broken?
I am trying to generate a patch with respect to the current CVS head. So I rsynced the tree, then did cvs up and installed the db. However, when I ran initdb on a data directory it got stuck: it hangs after printing creating template1 database in /home/postgres/data/base/1 ... I did strace: $ strace -p 9852 Process 9852 attached - interrupt to quit waitpid(9864, then I straced 9864 $ strace -p 9864 Process 9864 attached - interrupt to quit semop(8060958, 0xbff36fee, $ ps aux|grep 9864 postgres 9864 1.5 1.3 37296 6816 pts/1S+ 07:51 0:02 /usr/local/pgsql/bin/postgres --boot -x1 -F Seems like a bug to me. Is the tree stable only after commit fests and I should not use the unstable tree for generating patches? Thanks, -Sushant. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-explain patch
After sleeping on this, I think we should follow your idea. Hmm. I preferred your idea ;-) It reduces the number of new parameters back down to 3, which makes it easier to use: * trace_min_planner_duration - int, PGC_USERSET * trace_min_executor_duration - int, PGC_USERSET * trace_explain_plan - bool, PGC_USERSET (I already decided to drop the sql_) with all output going to the new level TRACE as you described. So output goes to the client and not the server log by default as soon as either of the first 2 parameters is enabled. I've attached what I've done so far, which works according to the above description. I've not done much testing or written any docs yet. It would be good to know if this is along the right lines. Changing parameter names is easy, although admittedly very important to get right. You didn't say why you changed your mind on this. I'm more concerned about any possible security holes it opens up. For SQL in SQL functions, it just gives the function name and statement number. For EXECUTEd queries, it doesn't have access to the SQL, so it just logs the other stuff and any context info. For plpgsql functions it will log values which appear as literals in any queries, but actually debug_print_parse exposes far more info in this case. *** ./src/backend/commands/copy.c.orig 2008-07-09 16:14:54.0 +0100 --- ./src/backend/commands/copy.c 2008-07-09 16:15:54.0 +0100 *** *** 1042,1047 --- 1042,1048 /* plan the query */ plan = planner(query, 0, NULL); + plan->query_string = queryString; /* * Use a snapshot with an updated command ID to ensure this query sees *** ./src/backend/commands/explain.c.orig 2008-07-08 13:55:29.0 +0100 --- ./src/backend/commands/explain.c 2008-07-09 16:14:28.0 +0100 *** *** 40,65 explain_get_index_name_hook_type explain_get_index_name_hook = NULL; - typedef struct ExplainState - { - /* options */ - bool printTList; /* print plan targetlists */ - bool printAnalyze; /* print actual times */ - /* other states */ - PlannedStmt *pstmt; /* top of plan */ - List *rtable; /* range table */ - } ExplainState; - static void ExplainOneQuery(Query *query, ExplainStmt *stmt, const char *queryString, ParamListInfo params, TupOutputState *tstate); static void report_triggers(ResultRelInfo *rInfo, bool show_relname, StringInfo buf); - static double elapsed_time(instr_time *starttime); - static void explain_outNode(StringInfo str, - Plan *plan, PlanState *planstate, - Plan *outer_plan, - int indent, ExplainState *es); static void show_plan_tlist(Plan *plan, StringInfo str, int indent, ExplainState *es); static void show_scan_qual(List *qual, const char *qlabel, --- 40,50 *** *** 170,175 --- 155,161 /* plan the query */ plan = planner(query, 0, params); + plan->query_string = queryString; /* run it (if needed) and produce output */ ExplainOnePlan(plan, params, stmt, tstate); *** *** 384,390 } /* Compute elapsed time in seconds since given timestamp */ ! static double elapsed_time(instr_time *starttime) { instr_time endtime; --- 370,376 } /* Compute elapsed time in seconds since given timestamp */ ! double elapsed_time(instr_time *starttime) { instr_time endtime; *** *** 406,412 * side of a join with the current node. This is only interesting for * deciphering runtime keys of an inner indexscan. */ ! static void explain_outNode(StringInfo str, Plan *plan, PlanState *planstate, Plan *outer_plan, --- 392,398 * side of a join with the current node.
This is only interesting for * deciphering runtime keys of an inner indexscan. */ ! void explain_outNode(StringInfo str, Plan *plan, PlanState *planstate, Plan *outer_plan, *** ./src/backend/executor/execMain.c.orig 2008-07-08 14:03:34.0 +0100 --- ./src/backend/executor/execMain.c 2008-07-10 10:34:26.0 +0100 *** *** 39,44 --- 39,45 #include "catalog/heap.h" #include "catalog/namespace.h" #include "catalog/toasting.h" + #include "commands/explain.h" #include "commands/tablespace.h" #include "commands/trigger.h" #include "executor/execdebug.h" *** *** 52,57 --- 53,59 #include "storage/lmgr.h" #include "storage/smgr.h" #include "utils/acl.h" + #include "utils/guc.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/snapmgr.h" *** *** 184,189 --- 186,199 } /* + * If we are tracing and explaining slow-running queries, + * enable instrumentation + */ + if (trace_explain_plan && + (trace_min_planner_duration > -1 || trace_min_executor_duration > -1)) +
Re: [HACKERS] initdb in current cvs head broken?
On Thu, Jul 10, 2008 at 5:36 PM, Sushant Sinha [EMAIL PROTECTED] wrote: Seems like a bug to me. Is the tree stable only after commit fests and I should not use the unstable tree for generating patches? I quickly tried on my repo and it's working fine (well, it could be a bit out of sync with the head). The tree may get a bit inconsistent during an active period, but it's not very common. I've seen committers do a good job of making sure the code works (at least initdb and the regression tests) before checking anything in. I would suggest doing a clean build at your end once again. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] initdb in current cvs head broken?
Sushant Sinha wrote: I am trying to generate a patch with respect to the current CVS head. So I rsynced the tree, then did cvs up and installed the db. However, when I ran initdb on a data directory it got stuck: [snip] Seems like a bug to me. Is the tree stable only after commit fests and I should not use the unstable tree for generating patches? If it were really broken then the buildfarm would be showing red: http://www.pgbuildfarm.org/cgi-bin/show_status.pl It isn't, so I suspect the problem is at your end. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] initdb in current cvs head broken?
You are right. I did not do make clean last time. After make clean, make all, and make install it works fine. -Sushant. On Thu, 2008-07-10 at 17:55 +0530, Pavan Deolasee wrote: On Thu, Jul 10, 2008 at 5:36 PM, Sushant Sinha [EMAIL PROTECTED] wrote: Seems like a bug to me. Is the tree stable only after commit fests and I should not use the unstable tree for generating patches? I quickly tried on my repo and it's working fine (well, it could be a bit out of sync with the head). The tree may get a bit inconsistent during an active period, but it's not very common. I've seen committers do a good job of making sure the code works (at least initdb and the regression tests) before checking anything in. I would suggest doing a clean build at your end once again. Thanks, Pavan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE CAST too strict?
On Wednesday, 9 July 2008, Peter Eisentraut wrote: I propose that we relax these two checks to test for binary-coercibility instead, which is effectively what is expected and required here anyway. Here is the corresponding patch. diff -ur ../cvs-pgsql/doc/src/sgml/ref/create_cast.sgml ./doc/src/sgml/ref/create_cast.sgml --- ../cvs-pgsql/doc/src/sgml/ref/create_cast.sgml 2007-07-10 14:57:00.0 +0200 +++ ./doc/src/sgml/ref/create_cast.sgml 2008-07-10 14:07:27.0 +0200 @@ -205,7 +205,7 @@ <para> Cast implementation functions can have one to three arguments. - The first argument type must be identical to the cast's source type. + The first argument type must be identical to or binary-compatible with the cast's source type. The second argument, if present, must be type <type>integer</type>; it receives the type modifier associated with the destination type, or <literal>-1</literal> diff -ur ../cvs-pgsql/src/backend/commands/functioncmds.c ./src/backend/commands/functioncmds.c --- ../cvs-pgsql/src/backend/commands/functioncmds.c 2008-07-03 16:53:04.0 +0200 +++ ./src/backend/commands/functioncmds.c 2008-07-10 13:42:26.0 +0200 @@ -48,6 +48,7 @@ #include "commands/defrem.h" #include "commands/proclang.h" #include "miscadmin.h" +#include "parser/parse_coerce.h" #include "parser/parse_func.h" #include "parser/parse_type.h" #include "utils/acl.h" @@ -1403,10 +1404,10 @@ ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("cast function must take one to three arguments"))); - if (procstruct->proargtypes.values[0] != sourcetypeid) + if (!IsBinaryCoercible(sourcetypeid, procstruct->proargtypes.values[0])) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("argument of cast function must match source data type"))); + errmsg("argument of cast function must match or be binary-compatible with source data type"))); if (nargs > 1 && procstruct->proargtypes.values[1] != INT4OID) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), @@ -1415,10 +1416,10 @@ ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("third argument of cast function must be type boolean"))); - if (procstruct->prorettype != targettypeid) + if (!IsBinaryCoercible(procstruct->prorettype, targettypeid)) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("return data type of cast function must match target data type"))); + errmsg("return data type of cast function must match or be binary-compatible with target data type"))); /* * Restricting the volatility of a cast function may or may not be a -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
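For reference, a small example of what the relaxed check permits and the old identical-type check rejected; the domain and function are illustrative only:

  CREATE DOMAIN product_code AS text;

  CREATE FUNCTION code_to_int(text) RETURNS integer AS $$
      SELECT length($1);   -- body is illustrative only
  $$ LANGUAGE sql IMMUTABLE;

  -- product_code is binary-coercible to text, so with the patch this is accepted:
  CREATE CAST (product_code AS integer) WITH FUNCTION code_to_int(text);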
Re: [HACKERS] CommitFest rules
Robert Treat [EMAIL PROTECTED] writes: On Monday 07 July 2008 21:56:34 Bruce Momjian wrote: Right now you advance the static link to the next commit fest once the current one starts --- I was hoping for a link that advances when the commit fest is done so I could make it a permanent tab in Firefox. This is what I had proposed/changed, but see up thread where Tom disagreed with this idea. I surely do not have an objection to having a link defined as above --- I just wanted to be clear on what we meant by current commitfest. We probably need two separate terms for the place to submit new patches and the place we are trying to commit patches from. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH RECURSIVE updated to CVS TIP
On Thu, Jul 10, 2008 at 04:12:34PM +0530, Abhijit Menon-Sen wrote: At 2008-07-09 17:06:19 -0700, [EMAIL PROTECTED] wrote: I'm really new to this git thing, but I now have access to create git-shell accounts, etc. on git.postgresql.org. Any ideas you can offer on how better to handle this would really help me. :) The question is: what is your objective in providing this repository? Here are my objectives: 1. Make a repository that keeps up with CVS HEAD. 2. Allow people who are not currently committers on CVS HEAD to make needed changes. I've only just cloned your repository, so I can only guess at how it is managed, but you seem to be rebasing your changes on top of the current Postgres source and responding to upstream changes by throwing away the old patches and applying the new ones. (By the way, your origin/master appears to be lagging the current HEAD by 71 commits, i.e. a month.) If you know a better way to do this, I'm all ears :) I'm completely new to git and pretty fuzzy on CVS. If your objective is only to make an up-to-date patch always available, then it is unnecessary to publicise your repository. You could just use git-rebase to stay abreast of significant changes in origin/master and run git-format-patch to generate a patch... but then you still end up with essentially the same thing that Tatsuo Ishii posted to the list the other day anyway. I agree with Alvaro. If the developers aren't committing to this repository that the patches are generated from, there's really no point to using the repository for review. It's very much simpler to just read the patch as posted to the list. They aren't committing, at least in part, because they did not have any way to do so. I'm fixing things so that they do by creating git-shell accounts on git.postgresql.org which will have write access to that repository. To get such an account, please send me your public key and preferred user name so I can move forward on this. The only real benefit to review that I can imagine would be if full change history were available, which would be possible if a) changes were committed separately with proper comments and b) the branch were *NOT* rebased, but instead periodically merged with origin/master. Great idea! I'd be happy to wipe this repository and start over just as you propose. It would be even nicer if we can put together a standard procedure for new patches. Would you be willing to write it up? That way I could pull from the repository and run e.g. git-log --stat origin/master..with-recursive or similar. Excellent :) Hope this helps. It does indeed. Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?
Stephen R. van den Berg [EMAIL PROTECTED] writes: Then, from a client perspective, there is no use at all, because the client can actually pause reading the results at any time it wants, when it wants to avoid storing all of the result rows. The network will perform the cursor/fetch facility for it. [ shrug... ] In principle you could write a client library that would act that way, but I think you'll find that none of the extant ones will hand back an incomplete query result to the application. A possibly more convincing argument is that with that approach, the connection is completely tied up --- you cannot issue additional database commands based on what you just read, nor pull rows from multiple portals in an interleaved fashion. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CommitFest rules
On Thu, Jul 10, 2008 at 3:16 PM, Tom Lane [EMAIL PROTECTED] wrote: I surely do not have an objection to having a link defined as above --- I just wanted to be clear on what we meant by current commitfest. We probably need two separate terms for the place to submit new patches and the place we are trying to commit patches from. Well we have two separate links now, with hints as to their usage: # Upcoming CommitFest - add new patches here # In-progress CommitFest - patch review underway here -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?
Tom Lane wrote: Stephen R. van den Berg [EMAIL PROTECTED] writes: Then, from a client perspective, there is no use at all, because the client can actually pause reading the results at any time it wants, when it wants to avoid storing all of the result rows. The network will perform the cursor/fetch facility for it. [ shrug... ] In principle you could write a client library that would act that way, but I think you'll find that none of the extant ones will hand back an incomplete query result to the application. A possibly more convincing argument is that with that approach, the connection is completely tied up --- you cannot issue additional database commands based on what you just read, nor pull rows from multiple portals in an interleaved fashion. I really think we need to get something like this into libpq. It's on my TODO list after notification payloads and libpq support for arrays and composites. We'll need to come up with an API before we do much else. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Follow-up on replication hooks for PostgreSQL
Hi Marko, No fear, we will definitely discuss this on pgsql-hackers. I just wanted to make sure that people understood we are still committed to solving this problem and will, one way or another, commit resources to help. Just to be clear, by logical replication I mean replication based on sending SQL or near-SQL (e.g., generic DML events) between servers. Physical replication, on the other hand, uses internal formats to replicate changes without intervening conversion to SQL, for example by shipping WAL records. There are advantages to each for different applications. BTW, I heard this nomenclature from Simon Riggs. It seems quite helpful. The DDL trigger proposal is interesting and would be a very useful feature addition to PostgreSQL. To execute correctly, it may also be necessary to know which database you were using at the time the SQL was issued. For our part, we are looking for ways to replicate most or all data on a server as efficiently as possible. Generic call-outs at commit time, or reading the log directly, are attractive approaches. Depending on the implementation you can avoid double writes of replicated data on the master host. Also, it avoids the management headache of ensuring that triggers are correctly installed. It seems as if one of these generic approaches could hook into WAL record transport. Cheers, Robert On 7/10/08 4:56 AM, Marko Kreen [EMAIL PROTECTED] wrote: [snip] -- Robert Hodges, CTO, Continuent, Inc. Email: [EMAIL PROTECTED] Mobile: +1-510-501-3728 Skype: hodgesrm
Re: [HACKERS] CommitFest rules
Dave Page [EMAIL PROTECTED] writes: On Thu, Jul 10, 2008 at 3:16 PM, Tom Lane [EMAIL PROTECTED] wrote: I surely do not have an objection to having a link defined as above --- I just wanted to be clear on what we meant by current commitfest. We probably need two separate terms for the place to submit new patches and the place we are trying to commit patches from. Well we have two separate links now, with hints as to their usage: # Upcoming CommitFest - add new patches here # In-progress CommitFest - patch review underway here That's fine, but what Bruce asked for was bookmarkable links defined in both ways --- right now, only the first one has a permanent alias. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] UUID - Data type inefficient
The new data type, UUID, is stored as a string, char(16): struct pg_uuid_t { unsigned char data[UUID_LEN]; }; #define UUID_LEN 16 but this is very inefficient, as you can read here [1]. The ideal would be to use bit(128), but that isn't possible today. One possible solution would be to create a structure with 2 fields, each one with bit(64). [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH RECURSIVE updated to CVS TIP
* David Fetter [EMAIL PROTECTED] [080710 10:19]: The question is: what is your objective in providing this repository? Here are my objectives: 1. Make a repository that keeps up with CVS HEAD. There are already at least 2 public ones that do: git://repo.or.cz/PostgreSQL.git git.postgresql.org (which seems to work best over http only!?!?!) 2. Allow people who are not currently committers on CVS HEAD to make needed changes. Uh, the point of git is that it's distributed, so you don't need to be involved for them to do that. If you know a better way to do this, I'm all ears :) I'm completely new to git and pretty fuzzy on CVS. Well, if you want to use git to its fullest extent, you really need to unlearn *all* of the ideas and restrictions your fuzzy CVS knowledge is handcuffing you with. Or at least be able to stuff it in a box in some dark recess of your brain, so as not to let it control the way you try to use Git. Git is *inherently* distributed. And it's *only* a content tracker. Because of this, it works spectacularly well as 2 quite different tools: 1) A developer tool to manage their ideas, developments, and code (track) 2) A content distribution tool (publish) They aren't committing, at least in part, because they did not have any way to do so. I'm fixing things so that they do by creating git-shell accounts on git.postgresql.org which will have write access to that repository. Committing in GIT has *nothing* to do with an account on git.postgresql.org. It's a local operation, and if they are using git already, they can publish any number of places. And if they aren't using git already, then a git-shell account, or some special single repo on git.postgresql.org, isn't going to change that. You seem to be trying to set up git as a centralized distribution tool (#2 above), without it being used as a developer tool (#1). I really don't see the point of that. The only difference from CVS that use will provide is that you can provide CVS+patch easily. But anybody who's going to use git to get CVS+patch already has git, and thus has CVS+patch locally available in 1 command anyway... I think Git's use as #1 *has* to come first. And, because of its inherently distributed nature, #2 just happens once people are using it... a. -- Aidan Van Dyk Create like a god, [EMAIL PROTECTED] command like a king, http://www.highrise.ca/ work like a slave. signature.asc Description: Digital signature
Re: [HACKERS] UUID - Data type inefficient
Kless wrote: The new data type, UUID, is stored as a string, char(16): struct pg_uuid_t { unsigned char data[UUID_LEN]; }; #define UUID_LEN 16 No it is not. It is stored as 16 binary bytes. As text it won't fit into 16 bytes. but this is very inefficient, as you can read here [1]. What on earth makes you assume that MySQL performance characteristics apply to PostgreSQL? If you want to show that our implementation is inefficient, you need to produce PostgreSQL performance tests to demonstrate it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
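This is easy to verify from SQL with pg_column_size(); the uuid form is exactly 16 bytes, while the text form needs 36 characters plus varlena header (exact header overhead varies by version):

  SELECT pg_column_size('a0eebc99-9c0b-4ef8-bb6e-6bb9bd380a11'::uuid);  -- 16
  SELECT pg_column_size('a0eebc99-9c0b-4ef8-bb6e-6bb9bd380a11'::text);  -- more than twice that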
Re: [HACKERS] initdb in current cvs head broken?
Sushant Sinha [EMAIL PROTECTED] writes: You are right. I did not do make clean last time. After make clean, make all, and make install it works fine. My ironclad rule for syncing with CVS is make distclean cvs update reconfigure, rebuild The cycles you save by taking shortcuts all vanish in trying to debug the occasional problem. Especially since that's human time, not machine cycles (which can easily be overlapped with, say, reading mail). Using configure --enable-depend (which is not the default) will prevent certain categories of problems, but I don't trust it to solve them all. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
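Spelled out as commands, that procedure is (configure options are whatever you normally use):

  make distclean
  cvs update
  ./configure
  make
  make install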
Re: [HACKERS] WITH RECURSIVE updated to CVS TIP
Aidan Van Dyk wrote: * David Fetter [EMAIL PROTECTED] [080710 10:19]: 2. Allow people who are not currently committers on CVS HEAD to make needed changes. Uh, the point of git is that it's distributed, so you don't need to be involved for them to do that. Yep. People can already clone the master Pg trunk, and start from there to build patches. If they use their *private* repos for this, awesome -- they have complete history. If they want other developers to chime in with further patches, they just need to publish their repos, and let others clone them. Then, they can pull from those other repos, or allow others to push. If you want to keep updating to trunk as it moves forward, I guess you'd need to propagate the changes from trunk to your RECURSIVE repo. And if upstream changes the patch to fix some bug, you really need that bugfix to show up as a separate (and probably very small) patch. Unapplying the patch and applying it back seems the worst way to proceed. Like Aidan, I think that trying to centralize the GIT repo is trying to circumvent GIT's design ideas rather than working with them. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE CAST too strict?
Peter Eisentraut [EMAIL PROTECTED] writes: Am Mittwoch, 9. Juli 2008 schrieb Peter Eisentraut: I propose that we relax these two checks to test for binary-coercibility instead, which is effectively what is expected and required here anyway. Here is the corresponding patch. Looks good, but you might want to add a sentence to the documentation pointing out that the function result type has to match the cast target type; that seems not to be explicitly stated anywhere. I also notice that the CREATE CAST page is still worded as if binary compatible were a symmetric relationship, which it is not. We probably need to rework it a bit; but that's not really related to this patch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH RECURSIVE updated to CVS TIP
On Thu, Jul 10, 2008 at 11:31:00AM -0400, Alvaro Herrera wrote: Aidan Van Dyk wrote: * David Fetter [EMAIL PROTECTED] [080710 10:19]: 2. Allow people who are not currently committers on CVS HEAD to make needed changes. Uh, the point of git is it's distributed, so you don't need to be involved for them to do that Yep. People can already clone the master Pg trunk, and start from there to build patches. If they use their *private* repos for this, awesome -- they have complete history. If they want other developers to chime in with further patches, they just need to publish their repos, Publishing those repos is easiest on git.postgresql.org. and let other clone them. Then, they can pull from those other repos, or allow others to push. Again, git.postgresql.org is good for this and other places are not for reasons I've mentioned before. If you want to keep updating to trunk as it moves forward, I guess you'd need to propagate the changes from trunk to your RECURSIVE repo. And if upstream changes the patch to fix some bug, you really need that bugfix to show as a separate (and probably very small) patch. Unapplying the patch and applying it back seems the worst way to proceed. Fine. I proceeded in ignorance and will fix. I'm more than delighted to start the whole thing over based on this. Like Aidan, I think that trying to centralize the GIT repo is trying to circumvent GIT's design ideas rather than working with them. It's not about centralizing, but about letting a bunch of people publish their changes to the same spot without being committers on the Postgres project. Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UUID - Data type inefficient
Kless wrote: The new data type, UUID, is stored as a string, char(16): struct pg_uuid_t { unsigned char data[UUID_LEN]; }; #define UUID_LEN 16 but this is very inefficient, as you can read here [1]. The ideal would be to use bit(128), but that isn't possible today. One possible solution would be to create a structure with 2 fields, each one with bit(64). [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/ That's a general page about UUID vs serial integers. What is the complaint? Do you have evidence that it would be noticeably faster as two 64-bits? Note that a UUID is broken into several non-64-bit elements, and managing it as bytes, as 64-bit integers, or as a union with the bit-lengths specified is probably efficient or inefficient depending on the operation being performed. The hope should be that the optimizer will generate similar best code for each. Cheers, mark -- Mark Mielke [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UUID - Data type inefficient
Mark Mielke wrote: Kless wrote: The new data type, UUID, is stored as a string, char(16): struct pg_uuid_t { unsigned char data[UUID_LEN]; }; #define UUID_LEN 16 What is the complaint? Do you have evidence that it would be noticeably faster as two 64-bits? Note that a UUID is broken into several non-64-bit elements, and managing it as bytes, as 64-bit integers, or as a union with the bit-lengths specified is probably efficient or inefficient depending on the operation being performed. The hope should be that the optimizer will generate similar best code for each. I didn't notice that he put 16. Now I'm looking at uuid.c in PostgreSQL 8.3.3 and I see that it does use 16, and the struct pg_uuid_t is length 16. I find myself confused now - why does PostgreSQL define UUID_LEN as 16? I will investigate if I have time tonight. There MUST be some mistake or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16. Cheers, mark -- Mark Mielke [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UUID - Data type inefficient
Mark Mielke [EMAIL PROTECTED] writes: Kless wrote: [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/ That's a general page about UUID vs serial integers. AFAICT the author of that page thinks that UUIDs are stored in ASCII form (32 hex digits), which would indeed be inefficient. I have no idea whether he knows what he's talking about with respect to mysql, but it's certainly 100% irrelevant to the Postgres datatype. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UUID - Data type inefficient
Mark Mielke wrote: I didn't notice that he put 16. Now I'm looking at uuid.c in PostgreSQL 8.3.3 and I see that it does use 16, and the struct pg_uuid_t is length 16. I find myself confused now - why does PostgreSQL define UUID_LEN as 16? I will investigate if I have time tonight. There MUST be some mistake or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16. <G> Kless, you've confused me. 32-bit numbers = 4 bytes, 64-bit numbers = 8 bytes, 128-bit numbers = 16 bytes. You are out to lunch and you dragged me with you. Did we have beer at least? :-) Cheers, mark -- Mark Mielke [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH RECURSIVE updated to CVS TIP
* David Fetter [EMAIL PROTECTED] [080710 11:34]: Yep. People can already clone the master Pg trunk, and start from there to build patches. If they use their *private* repos for this, awesome -- they have complete history. If they want other developers to chime in with further patches, they just need to publish their repos, Publishing those repos is easiest on git.postgresql.org. Obviously not, but let's hope that the current situation changes ;-) But that has nothing to do with the original point of this repo question WRT the recursive patch. and let others clone them. Then, they can pull from those other repos, or allow others to push. Again, git.postgresql.org is good for this and other places are not for reasons I've mentioned before. And I've not been convinced by them before either ;-) It's not about centralizing, but about letting a bunch of people publish their changes to the same spot without being committers on the Postgres project. But thanks to GIT, that same spot is irrelevant ;-) I do think that git.postgresql.org can have value. But I think its value is directly related to (or more correctly, *derived* from) the work that's happening by developers *using* git, and the use of git for development is a necessary prerequisite for publishing that development. They don't even have to be publishing to git.postgresql.org for git.postgresql.org to benefit from that development (because git is distributed)! If development isn't happening with git, and git.postgresql.org is only a set of mirrors of CVS+patches, then I don't see *any* value in git.postgresql.org. I'm pretty confident that git.postgresql.org won't *remain* valueless, because I'm pretty confident that git really is a tool that many developers will come to use... But if you're new to git, *I* think your time would be better spent doing actual development using git than trying to manage git repositories and just mirror CVS+patches. As you learn GIT, tracking CVS+patches will be something you pretty much just do, and forget you're actually doing. Publishing repos will also become something you just do and forget you're actually doing. But I really do think you need to worry about *using* git before you worry about *publishing* with git, especially if you're new to git and still have fuzzy ideas about CVS as your idea/framework for that publishing ;-) But all that's only my opinion, so take it with a grain of salt, or another pint ;-) a. -- Aidan Van Dyk Create like a god, [EMAIL PROTECTED] command like a king, http://www.highrise.ca/ work like a slave. signature.asc Description: Digital signature
Re: [HACKERS] UUID - Data type inefficient
On Thu, 2008-07-10 at 12:05 -0400, Mark Mielke wrote: Mark Mielke wrote: I didn't notice that he put 16. Now I'm looking at uuid.c in PostgreSQL 8.3.3 and I see that it does use 16, and the struct pg_uuid_t is length 16. I find myself confused now - why does PostgreSQL define UUID_LEN as 16? I will investigate if I have time tonight. There MUST be some mistake or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16. G Kless you've confused me. 32-bit numbers = 4 bytes, 64-bit numbers = 8 bytes, 128-bit numbers = 16 bytes. You are out to lunch and you dragged me with you. Did we have beer at least? :-) Sounds like at least 4 and a couple of chasers. Cheers, mark -- Mark Mielke [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UUID - Data type inefficient
On Jul 10, 2008, at 09:13, Joshua D. Drake wrote: You are out to lunch and you dragged me with you. Did we have beer at least? :-) Sounds like at least 4 and a couple of chasers. Next time I'd like to be invited to the party, too! :-P David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?
Tom Lane wrote: Stephen R. van den Berg [EMAIL PROTECTED] writes: Then, from a client perspective, there is no use at all, because the client can actually pause reading the results at any time it wants, when it wants to avoid storing all of the result rows. The network will perform the cursor/fetch facility for it. [ shrug... ] In principle you could write a client library that would act that way, but I think you'll find that none of the extant ones will hand back an incomplete query result to the application. True. But I have written one just now. The language is called Pike; it's a C/C++/Java lookalike. And I start returning rows as they arrive, and pause reading from the network when the application wants to pause. A possibly more convincing argument is that with that approach, the connection is completely tied up --- you cannot issue additional database commands based on what you just read, nor pull rows from multiple portals in an interleaved fashion. Interleaved retrieval using multiple portals is not what most libraries support, I'd guess. It can be supported at the application layer using multiple cursors, but that works with my approach as well. In practice, most applications that need that open multiple connections to the same database (I'd think). The only thing I could imagine is that *if*, at the server end, the notifications that arrive during the retrieval of one long-running Execute are queued *after* all the data, instead of inserted into the data stream, then it might be worth doing it differently. Incidentally, the nice thing about my library is that it automatically sends in binary those arguments which are easily processed in binary (TEXT/BYTEA/ and all those others I mentioned earlier). It does this for *both* arguments and row results; i.e. in one row I can have both text and binary columns, without the application needing to specify which is which. -- Sincerely, Stephen R. van den Berg. If you can't explain it to an 8-year-old, you don't understand it. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UUID - Data type inefficient
You are out to lunch and you dragged me with you. Did we have beer at least? :-) A bit, and you had a byte of bread. -- Med venlig hilsen Kaare Rasmussen, Jasonic Jasonic Telefon: +45 3816 2582 Nordre Fasanvej 12 2000 Frederiksberg Email: [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UUID - Data type inefficient
Tom Lane [EMAIL PROTECTED] writes: Mark Mielke [EMAIL PROTECTED] writes: Kless wrote: [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/ That's a general page about UUID vs serial integers. AFAICT the author of that page thinks that UUIDs are stored in ASCII form (32 hex digits), which would indeed be inefficient. Well, he does say "In fact if you store UUID in binary form you can bring it down to 16 bytes so size is not really the problem." Though I'm unclear why he thinks a 4x increase in space usage is not really a problem. If you have a highly relational database you can easily have half or more of your columns in large tables consisting of foreign keys. If your database is i/o bandwidth limited that would be a huge effect. I have no idea whether he knows what he's talking about with respect to mysql, but it's certainly 100% irrelevant to the Postgres datatype. The rest of it seems to be pretty mysql-specific. Some of the problems are universal, such as making index inserts especially random and making clustering impossible, but how much they hurt on different databases is going to be very different. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?
(I don't really have much to add to the discussion here; I'm just posting for the record on the question of client behaviour, since I also wrote and maintain a client library in C++.) At 2008-07-10 18:40:03 +0200, [EMAIL PROTECTED] wrote: I start returning rows as they arrive, and pause reading from the network when the application wants to pause. My library also starts returning rows as they arrive, and in fact my application makes heavy use of that feature. The data rows are read from a non-blocking socket and the caller either does something for each one, or waits until they've all arrived before proceeding. Interleaved retrieval using multiple portals is not what most libraries support, I'd guess. My code did support that mode of operation in theory, but in practice in the few situations where I have needed to use something like it, I found it more convenient to open explicit cursors and FETCH from them (but I usually needed this inside a transaction, and so did not open multiple connections). Thus my code always sets maxrows to 0 at the moment, and so... The only thing I could imagine is that *if* at the server end, the notifications that arrive during the retrieval of one long running Execute, are queued *after* all the data, instead of inserted into the datastream, then it might be worth doing it differently. ...I can't comment on this interesting observation. i.e. in one row I can have both text and binary columns, without the application needing to specify which is which. Yes, that's nice. My first attempt to define an API for bind variables set the data format to text by default and allowed it to be overriden, but that was much too troublesome. Now the code decides by itself what format is best to use for a given query. (Again, though my library certainly supports mixing text and binary format columns, my application has not needed to use this feature.) -- ams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
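The explicit-cursor alternative mentioned above, written out as plain SQL (the table name is illustrative):

  BEGIN;
  DECLARE big_scan CURSOR FOR SELECT * FROM huge_table;
  FETCH 100 FROM big_scan;
  -- other statements can be interleaved on the same connection here
  FETCH 100 FROM big_scan;
  CLOSE big_scan;
  COMMIT;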
Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?
Stephen R. van den Berg [EMAIL PROTECTED] writes: A possibly more convincing argument is that with that approach, the connection is completely tied up --- you cannot issue additional database commands based on what you just read, nor pull rows from multiple portals in an interleaved fashion. Interleaved retrieval using multiple portals is not what most libraries support, I'd guess. It can be supported at the application layer using multiple cursors, but that works with my approach as well. In practice, most applications that need that, open multiple connections to the same database (I'd think). Er? There's nothing particularly unusual about application logic like: $sth->execute('huge select'); while ($sth->fetch('foreign_key')) { ... do some processing which is hard to do in a server-side language ... $sth->execute('insert resulting data'); } Most drivers do support this kind of interface, but they may be reading the entire result set for the huge select in advance. However, if it's large enough then this is only going to really work if you can start a new portal while the outer portal is actually running on the backend. If the driver tries to cache the whole result set the programmer will be sad. Back when I was doing PHP programming and I discovered that PHP's Postgres driver didn't support this I thought it was an outrageous bug. (It didn't help that the behaviour was to misbehave randomly rather than throw a meaningful error.) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?
Gregory Stark [EMAIL PROTECTED] writes: Stephen R. van den Berg [EMAIL PROTECTED] writes: In practice, most applications that need that, open multiple connections to the same database (I'd think). Er? There's nothing particularly unusual about application logic like: $sth->execute('huge select'); while ($sth->fetch('foreign_key')) { ... do some processing which is hard to do in a server-side language ... $sth->execute('insert resulting data'); } Moreover, there are often good reasons to do it all within one transaction, which is impossible if you rely on a separate connection on which to issue the inserts. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] digest
set pgsql-hackers digest -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH RECURSIVE updated to CVS TIP
At 2008-07-10 07:18:28 -0700, [EMAIL PROTECTED] wrote: Here are my objectives: 1. Make a repository that keeps up with CVS HEAD. 2. Allow people who are not currently committers on CVS HEAD to make needed changes. OK. Then, to begin with, I think it is very important to make the repository available via the git protocol. HTTP just won't cut it. It would be even nicer if we can put together a standard procedure for new patches. Would you be willing to write it up? The standard procedure for new patches would be the standard procedure for *any* patches when you use git. You have a branch that tracks the upstream (by which I mean the Postgres source) and a branch where you work (apply individual changes), and you merge with the origin every now and then (either in your working branch or in another branch). And once you've published a branch, you try never to rebase it. The apply individual changes part could be done by hand (git-apply, git-commit), or by accepting individual patches via email (git-am) or pulling from a remote repository, or by having others push into your repository. It doesn't matter. Again, git.postgresql.org is good for this and other places are not for reasons I've mentioned before. I haven't seen your reasons, but frankly, I would be suspicious of them even if git.postgresql.org filled me with confidence, which it doesn't. It seems to lag some way behind CVS and, as Alvaro pointed out earlier, may be missing some patches. (I realise those might have been teething troubles, and it may even be fixed now, but I just use the mirror on repo.or.cz instead.) -- ams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?
Abhijit Menon-Sen [EMAIL PROTECTED] writes: Interleaved retrieval using multiple portals is not what most libraries support, I'd guess. My code did support that mode of operation in theory, but in practice in the few situations where I have needed to use something like it, I found it more convenient to open explicit cursors and FETCH from them Note that using FETCH for each record means a round trip to the server per record. If you're dealing with a lot of records, that could be a lot slower than streaming them to the client as quickly as it can consume them. Now I'm not sure anyone's actually done any experiments to optimize libpq or other drivers to stream data efficiently, so I'm not sure how much you would really lose in practice today. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
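To make the round-trip cost concrete, here is a sketch of the batched middle ground through an explicit cursor (the table name and batch size are assumed): each FETCH is one round trip, so fetching 1000 rows at a time costs one round trip per 1000 records instead of one per record.

    #include <libpq-fe.h>

    /* Stream a large result in batches via an explicit cursor. */
    void scan_in_batches(PGconn *conn)
    {
        PGresult *res;

        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn, "DECLARE c NO SCROLL CURSOR FOR "
                             "SELECT * FROM big_table"));   /* assumed table */

        for (;;)
        {
            res = PQexec(conn, "FETCH 1000 FROM c");        /* one round trip */
            if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
            {
                PQclear(res);
                break;
            }
            /* process PQntuples(res) rows here */
            PQclear(res);
        }

        PQclear(PQexec(conn, "CLOSE c"));
        PQclear(PQexec(conn, "COMMIT"));
    }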
Re: [HACKERS] CommitFest: how does handoff work for non-committer reviewers?
Tom Lane wrote: Josh Berkus [EMAIL PROTECTED] writes: Well, one thing I think we want to do by having non-committer reviewers is to not involve a committer at all if the patch is going to be sent back. So one thing I was thinking of is: 1) change status to ready for committer 2) post message to -hackers detailing the review and calling for a committer to check the patch 3) a committer picks it up Well, the key point there is just the sign-off in the review message. On the wiki, or on -hackers? --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] digest
On Jul 10, 2008, at 1:20 PM, Fabrízio de Royes Mello wrote: set pgsql-hackers digest Postgresql hackers have been successfully digested. *burp* -M -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UUID - Data type inefficient
On Jul 10, 5:05 pm, [EMAIL PROTECTED] (Mark Mielke) wrote: Mark Mielke wrote: I didn't notice that he put 16. Now I'm looking at uuid.c in PostgreSQL 8.3.3 and I see that it does use 16, and the struct pg_uuid_t is length 16. I find myself confused now - why does PostgreSQL define UUID_LEN as 16? I will investigate if I have time tonight. There MUST be some mistake or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16. Kless, you've confused me. 32-bit numbers = 4 bytes, 64-bit numbers = 8 bytes, 128-bit numbers = 16 bytes. You are out to lunch and you dragged me with you. Did we have beer at least? :-) Cheers, mark -- xDxD I see that the PostgreSQL developers have a sense of humor :) I like it. The mistake was mine. I asked about this on IRC; somebody showed me that structure but also told me to look at pgsql/src/backend/utils/adt/uuid.c:45:uuid_out, which I didn't do. But it's clear that this problem has been resolved correctly. Greetings! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CommitFest: how does handoff work for non-committer reviewers?
Josh Berkus [EMAIL PROTECTED] writes: Tom Lane wrote: Josh Berkus [EMAIL PROTECTED] writes: 1) change status to ready for committer 2) post message to -hackers detailing the review and calling for a committer to check the patch 3) a committer picks it up Well, the key point there is just the sign-off in the review message. On the wiki, or on -hackers? On -hackers. All the substantive stuff should be in the mail archives; the wiki page is only a current-status display. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Generating code coverage reports
I have a patch that I will be submitting to add to the build system the capability of reporting on test code coverage metrics for the test suite. Actually it can show coverage for any application run against PostgreSQL. Download Image:Coverage.tar.gz http://wiki.postgresql.org/wiki/Image:Coverage.tar.gz to see an example report. Gunzip and un-tar the file and click on coverage/index.html. I had to delete most of the files to decrease the file size for upload, so only some of the links work. gcov reports line, branch, and function coverage, but lcov only reports on line coverage. I've added a link in the html to the gcov output, which shows summary statistics for each file for line, branch, and function calls. The report gives a very clear and browseable view of what parts of the system might benefit from more extensive testing. It's obviously useful for planning future testing, but it can also be used in conjunction with debugging to see what lines and functions are being exercised or missed by existing tests of the functionality under investigation. It could even be helpful to give a static view of lines hit by a bug test case in lieu of using a debugger. Also, when you're writing a unit test for new functionality, it would be good to check what you're actually hitting with the test. It uses gcov together with gcc to generate the statistics, and the lcov suite to create the html report. Both of these would obviously have to be installed to get a coverage report, but this would be an optional feature of the build. It only works with gcc. To generate coverage statistics, you run configure with --enable-coverage and, after building and running tests, you do make coverage. The process generates data files in the same directories as source object files and produces a coverage directory at the top level with the html files. I've also set it up so a tar file with the html is generated. More information on gcov at http://gcc.gnu.org/onlinedocs/gcc/Gcov.html, lcov at http://ltp.sourceforge.net/documentation/how-to/ltp.php (coverage/lcov tabs). -- Michelle -- Michelle Caisse Sun Microsystems California, U.S. http://sun.com/postgresql
Re: [HACKERS] [PATCHES] WIP: executor_hook for pg_stat_statements
ITAGAKI Takahiro [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: I don't want the tag there at all, much less converted to a pointer. What would the semantics be of copying the node, and why? Please justify why you must have this and can't do what you want some other way. In my pg_stat_statements plugin, the tag is used to cache hash values of SQL strings in PlannedStmt. It is not necessarily needed because the hash value is re-computable from debug_query_string. It is just for avoiding the work. In addition, we see different SQLs in debug_query_string in PREPARE/EXECUTE and DECLARE/FETCH. Hashed SQL cache can work on those commands. Actually, that aspect of the plugin is 100% broken anyway, because it assumes that debug_query_string has got something to do with the query being executed. There are any number of scenarios where this is a bad assumption. I wonder whether we ought to change things so that the real query source text is available at the executor level. Since we are (at least usually) storing the query text in cached plans, I think this might just require some API refactoring, not extra space and copying. It would amount to a permanent decision that we're willing to pay the overhead of keeping the source text around, though. Also, after looking at the patch more closely, was there a good reason for making the hook intercept ExecutePlan rather than ExecutorRun? ExecutePlan was never intended to have a stable public API --- its argument list is just a happenstance of what ExecutorRun needs to fetch for its own purposes. I think we should keep it private and have ExecutorRun do if (hook) hook(...); else standard_ExecutorRun(...); regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
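A sketch of the arrangement Tom describes, with the hook type and variable names assumed by analogy with PostgreSQL's other hooks (an illustration of the pattern, not the patch itself):

    /* In a header: a hook matching ExecutorRun's public signature. */
    typedef void (*ExecutorRun_hook_type) (QueryDesc *queryDesc,
                                           ScanDirection direction,
                                           long count);
    extern ExecutorRun_hook_type ExecutorRun_hook;

    /* In execMain.c: ExecutorRun defers to the hook if one is installed;
     * ExecutePlan remains private. */
    ExecutorRun_hook_type ExecutorRun_hook = NULL;

    void
    ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
    {
        if (ExecutorRun_hook)
            (*ExecutorRun_hook) (queryDesc, direction, count);
        else
            standard_ExecutorRun(queryDesc, direction, count);
    }

    /* standard_ExecutorRun() would contain what is now the body of
     * ExecutorRun, including the call to ExecutePlan; a plugin such as
     * pg_stat_statements saves the previous hook value and chains to it. */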
Re: [HACKERS] gsoc, text search selectivity and dllist enhancments
Tom Lane wrote: Jan Urbański [EMAIL PROTECTED] writes: Do you think it's worthwhile to implement the LC algorithm in C and send it out, so others could try it out? Heck, maybe it's worthwhile to replace the current compute_minimal_stats() algorithm with LC and see how that compares? Very possibly. I repeat that the current implementation of compute_minimal_stats is very ad-hoc code and wasn't written with an eye to high performance. Replacing it with an algorithm that someone actually thought about might well be worth doing. Here's a patch that combines both patches included here: http://archives.postgresql.org/message-id/[EMAIL PROTECTED] and adds a C implementation of the Lossy Counting algorithm. It defines two typanalyze functions: ts_typanalyze_std and ts_typanalyze_lc, so you can see what statistics are gathered by each of them. It's meant for easy application to HEAD, updating pg_type and running ANALYZE on a few tables with tsvectors (i.e. testing, not committing). My observations are: the LC algorithm beats the previous one by a fairly large margin (20-30%) timewise. The results are almost identical; I got discrepancies of about 0.05 for some lexemes' frequencies. I intend to stick with LC for tsvectors, and that'll allow me to throw away the Dllist changes. If I want to keep my GSoC schedule I won't be able to implement LC for general statistics gathering, but it's trivial. If no one gets around to it I can do it after the Summer of Code (if only to see how it'll work). Oh, one important thing. You need to choose a bucket width for the LC algorithm, that is, decide after how many elements you will prune your data structure. I chose to prune after every twenty tsvectors. You might consider: - picking some other arbitrary value - making it depend on the largest tsvector size - making it depend on the statistics_target - pruning after each X lexemes instead of after each Y tsvectors, because now the buckets will vary in width and you can argue that the order of input makes a difference again. OTOH the situation here is a bit different: you get streams of mutually different elements (lexemes inside a tsvector are all different) and pruning in the middle of such a stream might be unfair to lexemes that are still to be processed. Hmm, dunno. Cheers, Jan -- Jan Urbanski GPG key ID: E583D7D2 ouden estin gsoc08-tss-03-with-dllist.diff.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
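For readers without the paper handy, here is a self-contained C sketch of Lossy Counting itself (after Manku and Motwani); the bucket width and the fixed-size table are toy assumptions, and a real implementation would use a hash table rather than a linear scan:

    #include <stdlib.h>
    #include <string.h>

    #define BUCKET_WIDTH 1000   /* w = ceil(1/e) with an assumed e = 0.001 */
    #define MAX_TRACKED  8192   /* toy fixed-size table */

    typedef struct
    {
        char *key;      /* the element (e.g. a lexeme) */
        long  f;        /* count since the entry was inserted */
        long  delta;    /* maximum possible undercount at insertion */
    } LCEntry;

    static LCEntry entries[MAX_TRACKED];
    static int  n_entries = 0;
    static long n_seen = 0;     /* elements consumed from the stream */

    /* Linear scan for clarity; a real implementation would hash. */
    static LCEntry *lookup(const char *key)
    {
        for (int i = 0; i < n_entries; i++)
            if (strcmp(entries[i].key, key) == 0)
                return &entries[i];
        return NULL;
    }

    void lc_add(const char *element)
    {
        long     b_current;
        LCEntry *e;

        n_seen++;
        b_current = (n_seen + BUCKET_WIDTH - 1) / BUCKET_WIDTH; /* ceil(n/w) */

        e = lookup(element);
        if (e)
            e->f++;
        else if (n_entries < MAX_TRACKED)
        {
            entries[n_entries].key = strdup(element);
            entries[n_entries].f = 1;
            entries[n_entries].delta = b_current - 1;
            n_entries++;
        }

        if (n_seen % BUCKET_WIDTH == 0)     /* prune at bucket boundaries */
        {
            int kept = 0;

            for (int i = 0; i < n_entries; i++)
            {
                if (entries[i].f + entries[i].delta > b_current)
                    entries[kept++] = entries[i];
                else
                    free(entries[i].key);
            }
            n_entries = kept;
        }
    }

Note that this sketch counts and prunes per element; whether the element should be a lexeme or a whole tsvector, and what the bucket width should be, is exactly the question taken up in the replies below.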
Re: [HACKERS] Generating code coverage reports
Michelle Caisse [EMAIL PROTECTED] writes: I have a patch that I will be submitting to add to the build system the capability of reporting on test code coverage metrics for the test suite. Cool. To generate coverage statistics, you run configure with --enable-coverage and after building and running tests, you do make coverage. The process generates data files in the same directories as source object files and produces a coverage directory at the top level with the html files. How does that work with a VPATH build? Are you trying to say that you still have to have the original build tree around in order to collect coverage data? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gsoc, text search selectivity and dllist enhancments
Jan Urbański wrote: Oh, one important thing. You need to choose a bucket width for the LC algorithm, that is decide after how many elements will you prune your data structure. I chose to prune after every twenty tsvectors. Do you prune after X tsvectors regardless of the numbers of lexemes in them? I don't think that preserves the algorithm properties; if there's a bunch of very short tsvectors and then long tsvectors, the pruning would take place too early for the initial lexemes. I think you should count lexemes, not tsvectors. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gsoc, text search selectivity and dllist enhancments
Alvaro Herrera wrote: Jan Urbański wrote: Oh, one important thing. You need to choose a bucket width for the LC algorithm, that is decide after how many elements will you prune your data structure. I chose to prune after every twenty tsvectors. Do you prune after X tsvectors regardless of the numbers of lexemes in them? I don't think that preserves the algorithm properties; if there's a bunch of very short tsvectors and then long tsvectors, the pruning would take place too early for the initial lexemes. I think you should count lexemes, not tsvectors. Yes, that's what I was afraid of. I'm not sure why I was reluctant to prune in the middle of a tsvector, maybe it's just in my head. Still, there's a decision to be made: after how many lexemes should the pruning occur? -- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gsoc, text search selectivity and dllist enhancments
Alvaro Herrera [EMAIL PROTECTED] writes: Jan Urbański wrote: Oh, one important thing. You need to choose a bucket width for the LC algorithm, that is, decide after how many elements you will prune your data structure. I chose to prune after every twenty tsvectors. Do you prune after X tsvectors regardless of the numbers of lexemes in them? I don't think that preserves the algorithm properties; if there's a bunch of very short tsvectors and then long tsvectors, the pruning would take place too early for the initial lexemes. I think you should count lexemes, not tsvectors. Yeah. I haven't read the Lossy Counting paper in detail yet, but I suspect that the mathematical proof of limited error doesn't work if the pruning is done on a variable spacing. I don't see anything very wrong with pruning intra-tsvector; the effects ought to average out, since the point where you prune is going to move around with respect to the tsvector boundaries. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gsoc, text search selectivity and dllist enhancments
Jan Urbański [EMAIL PROTECTED] writes: Still, there's a decision to be made: after how many lexemes should the pruning occur? The way I think it ought to work is that the number of lexemes stored in the final pg_statistic entry is statistics_target times a constant (perhaps 100). I don't like having it vary depending on tsvector width --- why for example should a column having a few wide tsvectors get a bigger stats entry than one with many narrow ones? (Not to mention the issue of having to estimate the average or max width before you can start the counting run.) But in any case, given a target number of lexemes to accumulate, I'd suggest pruning with that number as the bucket width (pruning distance). Or perhaps use some multiple of the target number, but the number itself seems about right. The LC paper says that the bucket width w is equal to ceil(1/e) where e is the maximum frequency estimation error, and that the maximum number of table entries needed is log(eN)/e after N lexemes have been scanned. For the values of e and N we are going to be dealing with, this is likely to work out to a few times 1/e, in other words the table size is a few times w. (They prove it's at most 7w given reasonable assumptions about data distribution, regardless of how big N gets; though I think our values for N aren't large enough for that to matter.) The existing compute_minimal_stats code uses a table size of twice the target number of values, so setting w to maybe a half or a third of the target number would reproduce the current space usage. I don't see a problem with letting it get a little bigger though, especially since we can expect that the lexemes aren't very long. (compute_minimal_stats can't assume that for arbitrary data types...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
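Plugging in concrete (assumed) numbers, and taking the paper's log as a natural log, makes the "a few times w" claim tangible:

    w = ceil(1/e), so e = 0.001 gives w = 1000.
    After N = 1,000,000 lexemes, the bound is log(eN)/e = 1000 * ln(1000),
    about 6908 entries, i.e. roughly 6.9 * w -- in line with the 7w worst case.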
[HACKERS] [WIP] collation support revisited (phase 1)
Hi, after a long discussion with Mr. Kotala, we've decided to redesign our collation support proposal. For those of you who aren't familiar with my WIP patch and the comments from other hackers, here's the original mail: http://archives.postgresql.org/pgsql-hackers/2008-07/msg00019.php In a few sentences - I'm writing collation support for PostgreSQL that is almost independent of the collating function used. I will implement POSIX locales, but a switch to ICU will be quite easy. Collations and character sets defined by the SQL standard will be hard-coded, so we avoid depending on ones that may be missing in some environments. The whole project will be divided into two phases:

phase 1 - Implement a sort of framework so that PostgreSQL has the basic guts (pg_collation and pg_charset catalogs, CREATE COLLATION, collation support added for each type that needs it) and supports collation at the database level. This phase has been accepted as a Google Summer of Code project.

phase 2 - Implement the rest - full collation at the column level. I will continue working on this after finishing phase one, and it will be my master's degree thesis.

How will the first part work?

Catalogs
- new catalogs pg_collation and pg_charset will be defined
- pg_collation and pg_charset will contain the SQL standard collations + an optional default collation (when set to something other than an SQL standard one)
- pg_type, pg_attribute, pg_namespace will be extended with references to the default records in pg_collation and pg_charset

initdb
- pg_collation and pg_charset will contain all the pre-defined records required by the SQL standard, and optionally one non-standard record set up at initdb time (the one using system locales)
- these two records will be referenced by pg_type, pg_attribute and pg_namespace in the relevant columns and will be considered the default collation, which will be inherited

CREATE DATABASE ... COLLATE ...
- after copying the new database, the collation will be the default (the same as the cluster collation) or changed by the COLLATE clause. Then we update the pg_type, pg_attribute and pg_namespace catalogs
- reindex the database

When changing databases, the database collation will be retrieved from the text type in pg_type. This part should be the only one that will be deleted when proceeding with phase 2. But that will take a while :-)

Thanks for all your comments Regards Radek Strnad
Re: [HACKERS] Generating code coverage reports
It should be possible to make it work for a VPATH build with appropriate arguments to gcov and lcov, but currently it expects the object files and generated data files to be in the build directory. You need access to the build tree to generate coverage statistics and to generate the report with make coverage after running the tests or application. -- Michelle Tom Lane wrote: Michelle Caisse [EMAIL PROTECTED] writes: I have a patch that I will be submitting to add to the build system the capability of reporting on test code coverage metrics for the test suite. Cool. To generate coverage statistics, you run configure with --enable-coverage and after building and running tests, you do make coverage. The process generates data files in the same directories as source object files and produces a coverage directory at the top level with the html files. How does that work with a VPATH build? Are you trying to say that you still have to have the original build tree around in order to collect coverage data? regards, tom lane -- Michelle Caisse Sun Microsystems California, U.S. http://sun.com/postgresql
Re: [HACKERS] gsoc, text search selectivity and dllist enhancments
Tom Lane wrote: The way I think it ought to work is that the number of lexemes stored in the final pg_statistic entry is statistics_target times a constant (perhaps 100). I don't like having it vary depending on tsvector width I think the existing code puts at most statistics_target elements in a pg_statistic tuple. In compute_minimal_stats() num_mcv starts with stats->attr->attstattarget and is adjusted only downwards. My original thought was to keep that property for tsvectors (i.e. store at most statistics_target lexemes) and advise people to set it high for their tsvector columns (e.g. 100x their default). Also, the existing code decides which elements are worth storing as most common ones by discarding those that are not frequent enough (that's where num_mcv can get adjusted downwards). I mimicked that for lexemes, but maybe it just doesn't make sense? But in any case, given a target number of lexemes to accumulate, I'd suggest pruning with that number as the bucket width (pruning distance). Or perhaps use some multiple of the target number, but the number itself seems about right. Fine with me, I'm too tired to do the math now, so I'll take your word for it :) Cheers, Jan -- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding variables for segment_size, wal_segment_size and block sizes
Abhijit Menon-Sen [EMAIL PROTECTED] writes: At 2008-07-03 16:36:02 +0200, [EMAIL PROTECTED] wrote: Here's a patch for this. I reviewed the patch, it basically looks fine. A few quibbles with the provided documentation: Applied, with ams' doc changes and some further wordsmithing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gsoc, text search selectivity and dllist enhancments
Jan Urbański [EMAIL PROTECTED] writes: Tom Lane wrote: The way I think it ought to work is that the number of lexemes stored in the final pg_statistic entry is statistics_target times a constant (perhaps 100). I don't like having it vary depending on tsvector width I think the existing code puts at most statistics_target elements in a pg_statistic tuple. In compute_minimal_stats() num_mcv starts with stats->attr->attstattarget and is adjusted only downwards. My original thought was to keep that property for tsvectors (i.e. store at most statistics_target lexemes) and advise people to set it high for their tsvector columns (e.g. 100x their default). Well, (1) the normal measure would be statistics_target *tsvectors*, and we'd have to translate that to lexemes somehow; my proposal is just to use a fixed constant instead of tsvector width as in your original patch. And (2) storing only statistics_target lexemes would be uselessly small and would guarantee that people *have to* set a custom target on tsvector columns to get useful results. Obviously broken defaults are not my bag. Also, the existing code decides which elements are worth storing as most common ones by discarding those that are not frequent enough (that's where num_mcv can get adjusted downwards). I mimicked that for lexemes but maybe it just doesn't make sense? Well, that's not unreasonable either, if you can come up with a reasonable definition of not frequent enough; but that adds another variable to the discussion. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gsoc, text search selectivity and dllist enhancments
On Wed, 9 Jul 2008, Jan Urbański wrote: Jan Urbański wrote: Do you think it's worthwhile to implement the LC algorithm in C and send it out, so others could try it out? Heck, maybe it's worthwhile to replace the current compute_minimal_stats() algorithm with LC and see how that compares? Teodor and I are using LC for phrase estimation in one application, and from our understanding of the original paper the algorithm might not be good for sampling, since all the theory behind it assumes streaming over the FULL data. As for the technique, we use a suffix tree, which should be fine for a typical sample size. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Postgres 8.1 doesn't like pg_standby's -l option
I was trying to set up warm standby for an 8.1.11 instance, and was using pg_standby's -l option so that it creates links and does not actually copy files. After struggling for a few hours, I found two problems; one small, one big. The smaller issue is that even if we do not end the restore_command's value with a ' (quote), the restore process still runs just fine! The bigger issue: Postgres seems not to recognize links as valid files, even though pg_standby was successfully able to restore them! I did not dig into the issue any deeper, so I cannot say whether it was a misbehaving stat() or whether Postgres in version 8.1 did not know how to handle symlinks. Just an FYI for all, and for the archives. Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
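For the archives, the kind of recovery.conf entry under discussion might look like this (the archive path is assumed); note the closing quote, whose accidental omission the 8.1 restore machinery apparently tolerated:

    # recovery.conf (illustrative; paths assumed)
    restore_command = 'pg_standby -l /mnt/server/archive %f %p'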