Re: [HACKERS] Fwd: patch: format function - fixed oid
Hello

2010/11/20 Jeff Janes jeff.ja...@gmail.com:
> On Thu, Nov 18, 2010 at 11:54 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
>> -- Forwarded message --
>> From: Pavel Stehule pavel.steh...@gmail.com
>> Date: 2010/11/18
>> Subject: Re: patch: format function, next generation
>> To: Jeff Janes jeff.ja...@gmail.com
>> Cc: pgsql-hackers-ow...@postgresql.org
>>
>> Hello
>>
>> somebody takes my oid :) updated patch is in attachment
>>
>> Regards
>> Pavel Stehule
>
> Dear Pavel and Hackers,
>
> I've reviewed this patch. It applies, makes, and passes make check. It has
> added regression tests that seem appropriate. I think the feature added
> matches the consensus that emerged from the very long email discussion.
> The C code seems fine (to my meager abilities to judge that). But I think
> the documentation does need some work. From func.sgml:
>
>     This functions can be used to create a formated string or message.
>     There are allowed three types of tags: %s as string, %I as SQL
>     identifiers and %L as SQL literals. Attention: result for %I and %L
>     must not be same as result of <function>quote_ident</function> and
>     <function>quote_literal</function> functions, because this function
>     doesn't try to coerce parameters to <type>text</type> type and
>     directly use a type's output functions. The placeholder can be
>     related to some explicit parameter with using a optional n$
>     specification inside format.
>
> Should we make it explicit that this is inspired by C's sprintf? Do we
> want to call them "tags"? This is introducing what seems to be a new word
> to describe what are usually (I think) called conversion specifiers.

I am not a native speaker, so I invite any correction of the documentation --
anything I wrote is more or less just a skeleton for somebody who can write
it better than me. I am not against a note that this function is similar to
the sprintf function, but then it should also be specified how this function
is different -- it is designed for the needs of the PL languages and their
environment. I have no problem with replacing "tags" by "conversion
specifiers" or "positional specifiers"; somewhere the term "placeholder" is
used too.

> "Must not be the same" should be "Might not be the same". However, it does
> not appear that quote_ident is willing to use coercion at all, and the %L
> behavior is more comparable to quote_nullable. Maybe:
>
>     This function can be used to create a formatted string suitable for
>     use as dynamic SQL or as a message. There are three types of
>     conversion specifiers: %s for literal strings, %I for SQL identifiers,
>     and %L for SQL literals. Note that the results of the %L conversion
>     might not be the same as the results of the
>     <function>quote_nullable</function> function, as the latter coerces
>     its argument to <type>text</type> while <function>format</function>
>     uses a type's output function. A conversion can reference an explicit
>     parameter position by using an optional n$ in the format specification.

I am for it.

> Does "type's output function" need to cross-reference someplace? Coercion
> is described elsewhere in this section of the docs, but output functions
> are not.
Probably output functions are described in some of the hacker parts of the
documentation: http://www.postgresql.org/docs/9.0/interactive/xtypes.html

> And for the changes to plpgsql.sgml, I would propose:
>
>     <para>
>      Building a string for a dynamic SQL statement can be simplified by
>      using the <function>format</function> function (see
>      <xref linkend="functions-string">):
>     <programlisting>
>     EXECUTE format('UPDATE tbl SET %I = %L WHERE key = %L', colname, newvalue, keyvalue);
>     </programlisting>
>      The <function>format</function> function can be used together with
>      the <literal>USING</literal> clause:
>     <programlisting>
>     EXECUTE format('UPDATE tbl SET %I = $1 WHERE key = $2', colname)
>        USING newvalue, keyvalue;
>     </programlisting>
>      This form is more efficient because the parameters
>      <literal>newvalue</literal> and <literal>keyvalue</literal> are not
>      converted to text.
>     </para>

+1

> These are mostly grammatical changes, but with the last three lines I may
> have missed the meaning of what you originally intended -- I'm not sure on
> that.

I think you are correct. Who will submit this final version? You or me?

Thank you very much

Regards
Pavel Stehule

> Thanks,
> Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
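[Editor's note: to make the specifiers under discussion concrete, here is a small example in the spirit of the proposed documentation. This is a sketch against the semantics described above, not text from the patch; the table and column names are made up. %I quotes SQL identifiers, %L quotes SQL literals, and n$ selects an argument by position so one argument can be reused.]

    -- positional references let one argument appear several times
    SELECT format('UPDATE %1$I SET %2$I = %3$L WHERE %2$I IS NULL',
                  'tbl', 'colname', 'newvalue');
    -- expected result:
    -- UPDATE tbl SET colname = 'newvalue' WHERE colname IS NULL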
Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons
On Fri, Nov 19, 2010 at 10:18 PM, Robert Haas robertmh...@gmail.com wrote:
>> Sure thing. Thanks for taking time to do this - very nice speedup.
> This part now committed, too.

It occurs to me belatedly that there might be a better way to do this.
Instead of flipping value from negative to positive, with a special case
for the smallest possible integer, we could do it the other way round.
And actually, I think we can get rid of neg, too.

    if (value < 0)
        *a++ = '-';
    else
        value = -value;
    start = a;

Then we could just adjust the calculation of the actual digit:

    *a++ = '0' + (-remainder);

Good idea? Bad idea? Seems cleaner to me, assuming it'll actually work...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
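[Editor's note: for readers following along, here is roughly what the proposed shape looks like assembled into a complete function. This is a sketch, not the committed code; the function name and the caller-supplies-enough-bytes contract are assumptions. As Tom points out downthread, the digit step relies on C99 truncating division for negative operands.]

    #include <stdint.h>

    /*
     * Work with negative values throughout, so the INT32_MIN special case
     * disappears (INT32_MIN is representable, while -INT32_MIN is not).
     */
    static void
    int32_to_text_sketch(int32_t value, char *a)
    {
        char       *start,
                   *end;

        if (value < 0)
            *a++ = '-';
        else
            value = -value;         /* now value <= 0 in all cases */

        start = a;
        do
        {
            int32_t     remainder = value % 10;     /* zero or negative */

            value /= 10;
            *a++ = (char) ('0' + (-remainder));
        } while (value != 0);
        *a = '\0';

        /* digits came out backwards; reverse them in place */
        end = a - 1;
        while (start < end)
        {
            char        swap = *start;

            *start++ = *end;
            *end-- = swap;
        }
    }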
Re: [HACKERS] Fwd: patch: format function - fixed oid
On Sat, Nov 20, 2010 at 2:59 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
> I think you are correct. Who will submit this final version? You or me?

I've got it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: patch: format function - fixed oid
2010/11/20 Robert Haas robertmh...@gmail.com:
> On Sat, Nov 20, 2010 at 2:59 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
>> I think you are correct. Who will submit this final version? You or me?
>
> I've got it.

Thank you, have a nice day

Pavel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix for seg picksplit function
On 2010-11-20 04:46, Robert Haas wrote:
> On Tue, Nov 16, 2010 at 6:07 AM, Alexander Korotkov aekorot...@gmail.com wrote:
>> On Tue, Nov 16, 2010 at 3:07 AM, Robert Haas robertmh...@gmail.com wrote:
>>> But on a broader note, I'm not very certain the sorting algorithm is
>>> sensible. For example, suppose you have 10 segments that are exactly '0'
>>> and 20 segments that are exactly '1'. Maybe I'm misunderstanding, but it
>>> seems like this will result in a 15/15 split when we almost certainly
>>> want a 10/20 split. I think there will be problems in more complex cases
>>> as well.
>> The documentation says about the less-than and greater-than operators
>> that "These operators do not make a lot of sense for any practical
>> purpose but sorting." In order to illustrate a real problem we should
>> think about gist behavior with a great enough amount of data. For
>> example, I tried to extrapolate this case to 10 of segs where 40% are
>> (0,1) segs and 60% are (1,2) segs. And this case doesn't seem to be a
>> problem for me.
> Well, the problem with just comparing on the lower bounds is that it takes
> very little account of the upper bounds. I think the cases where a simple
> split would hurt you the most are those where examining the upper bound is
> necessary to get a good split.

With the current 8K default blocksize, I put my money on the sorting
algorithm for any seg case. The r-tree algorithm's performance is probably
more influenced by large buckets -> low tree depth -> generic keys on
non-leaf nodes.

regards,
Yeb Havinga

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix for seg picksplit function
On Sat, Nov 20, 2010 at 6:46 AM, Robert Haas robertmh...@gmail.com wrote: Well, the problem with just comparing on is that it takes very little account of the upper bounds. I think the cases where a simple split would hurt you the most are those where examining the upper bound is necessary to to get a good split. Yes, also such asymmetric solution seems not beautiful enough for me :). It's easy to sort segs by their center, in this case lower and upper bound will be used equally. New patch is attached. I checked it on various data distributions. 1) Uniform distribution test=# insert into seg_test (select (a || ' .. ' || a + 0.5*b)::seg from (select random() as a, random() as b from generate_series(1,100)) x); INSERT 0 100 Time: 79121,830 ms test=# create index seg_test_idx on seg_test using gist (a); CREATE INDEX Time: 176409,434 ms test=# explain (buffers, analyze) select * from seg_test where a @ '0.5 .. 0.5'::seg; QUERY PLAN -- Bitmap Heap Scan on seg_test (cost=28.19..2500.32 rows=1000 width=12) (actual time=0.251..0.886 rows=27 loops=1) Recheck Cond: (a @ '0.5'::seg) Buffers: shared hit=3 read=27 - Bitmap Index Scan on seg_test_idx (cost=0.00..27.94 rows=1000 width=0) (actual time=0.193..0.193 rows=27 loops=1) Index Cond: (a @ '0.5'::seg) Buffers: shared hit=3 Total runtime: 1.091 ms (7 rows) Time: 41,884 ms 2) Natural distribution (Box–Muller transform was used for data generation) test=# insert into seg_test (select ( a - 0.5*abs(b) || ' .. ' || a + 0.5*abs(b))::seg from (select cos(2.0*pi()*random())*sqrt(-2.0*ln(random())) as a, cos(2.0*pi()*random())*sqrt(-2.0*ln(random())) as b from generate_series(1,100)) x); INSERT 0 100 Time: 98614,305 ms test=# create index seg_test_idx on seg_test using gist(a); CREATE INDEX Time: 212513,540 ms test=# explain (buffers, analyze) select * from seg_test where a @ '0.3 .. 0.3'::seg; QUERY PLAN -- Bitmap Heap Scan on seg_test (cost=28.18..2500.31 rows=1000 width=12) (actual time=0.132..0.428 rows=27 loops=1) Recheck Cond: (a @ '0.3'::seg) Buffers: shared hit=3 read=27 - Bitmap Index Scan on seg_test_idx (cost=0.00..27.93 rows=1000 width=0) (actual time=0.103..0.103 rows=27 loops=1) Index Cond: (a @ '0.3'::seg) Buffers: shared hit=3 Total runtime: 0.504 ms (7 rows) Time: 0,967 ms 3) Many distinct values test=# insert into seg_test (select (a||'..'||(a+1))::seg from (select (random()*13000)::integer as a from generate_series(1,100)) x); INSERT 0 100 Time: 90775,952 ms test=# create index seg_test_idx on seg_test using gist(a); CREATE INDEX Time: 200960,758 ms test=# explain (buffers, analyze) select * from seg_test where a @ '700.0 .. 700.0'::seg; QUERY PLAN --- Bitmap Heap Scan on seg_test (cost=28.19..2500.33 rows=1000 width=12) (actual time=0.358..3.531 rows=138 loops=1) Recheck Cond: (a @ '700.0'::seg) Buffers: shared hit=3 read=135 - Bitmap Index Scan on seg_test_idx (cost=0.00..27.94 rows=1000 width=0) (actual time=0.270..0.270 rows=138 loops=1) Index Cond: (a @ '700.0'::seg) Buffers: shared hit=3 Total runtime: 3.882 ms (7 rows) Time: 5,271 ms With best regards, Alexander Korotkov. *** a/contrib/seg/seg.c --- b/contrib/seg/seg.c *** *** 292,329 gseg_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result) return (result); } /* ** The GiST PickSplit method for segments ! ** We use Guttman's poly time split algorithm */ GIST_SPLITVEC * gseg_picksplit(GistEntryVector *entryvec, GIST_SPLITVEC *v) { ! OffsetNumber i, ! j; ! SEG *datum_alpha, ! *datum_beta; SEG *datum_l, ! *datum_r; ! SEG *union_d, ! *union_dl, ! *union_dr; ! 
SEG *inter_d; ! bool firsttime; ! float size_alpha, ! size_beta, ! size_union, ! size_inter; ! float size_waste, ! waste; ! float size_l, ! size_r; int nbytes; ! OffsetNumber seed_1 = 1, ! seed_2 = 2; OffsetNumber *left, *right; OffsetNumber maxoff; --- 292,340 return (result); } + /* + * Auxiliary structure for picksplit method. + */ + typedef struct + { + OffsetNumber index; + float center; + SEG *data; + } PickSplitSortItem; + /* + * Compare function for PickSplitSortItem based on seg_cmp. + */ + static int + sort_item_cmp(const void *a, const void *b) + { + PickSplitSortItem *i1 =
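[Editor's note: the hunk above is cut off mid-function. As a sketch of the comparator being described in the mail (sort segments by their center so the lower and upper bounds weigh equally), something like the following would do; the field names follow the PickSplitSortItem struct shown in the hunk, but this is not the patch's actual code. After sorting, the first half of the array goes to the left page and the second half to the right, which is the split benchmarked above.]

    #include <stdlib.h>

    typedef struct
    {
        int         index;
        float       center;     /* (lower + upper) / 2, precomputed */
        void       *data;       /* the SEG itself, opaque here */
    } SortItemSketch;

    /* qsort-style comparator ordering items by their center point */
    static int
    center_cmp(const void *a, const void *b)
    {
        float       ca = ((const SortItemSketch *) a)->center;
        float       cb = ((const SortItemSketch *) b)->center;

        if (ca < cb)
            return -1;
        if (ca > cb)
            return 1;
        return 0;
    }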
Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons
Robert Haas robertmh...@gmail.com writes:
> It occurs to me belatedly that there might be a better way to do this.
> Instead of flipping value from negative to positive, with a special case
> for the smallest possible integer, we could do it the other way round.
> And actually, I think we can get rid of neg, too.

The trouble with that approach is that you have to depend on the direction
of rounding for negative quotients. Which was unspecified before C99, and
it's precisely pre-C99 compilers that are posing a hazard to the current
coding.

FWIW, I find the code still pretty darn unsightly. I think this change is
just wrong:

         * Avoid problems with the most negative integer not being representable
         * as a positive integer.
         */
    -   if (value == INT32_MIN)
    +   if (value == INT_MIN)
        {
            memcpy(a, "-2147483648", 12);

and even with INT32_MIN it was pretty silly, because there is exactly 0
hope of the code behaving sanely for some other value of the symbolic
constant. I think it'd be much better to abandon the macros altogether and
write

        if (value == (-2147483647-1))
        {
            memcpy(a, "-2147483648", 12);

Likewise for the int64 case, which BTW is no safer for pre-C99 compilers
than it was yesterday: LL is not the portable way to write int64 constants.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
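[Editor's note: spelled out, the suggestion might look like this. A sketch only: the function names and the value parameters are invented, buffers are assumed large enough, and int64/INT64CONST() are PostgreSQL's existing typedef and macro for writing 64-bit values portably, which is presumably what replaces the LL suffix.]

    #include <string.h>
    #include "c.h"      /* PostgreSQL's int32/int64 and INT64CONST() */

    static bool
    emit_if_most_negative32(int32 value, char *a)
    {
        /* (-2147483647 - 1) is the most negative int32, written without
         * any single literal that would overflow a signed int */
        if (value == (-2147483647 - 1))
        {
            memcpy(a, "-2147483648", 12);   /* 11 characters plus NUL */
            return true;
        }
        return false;
    }

    static bool
    emit_if_most_negative64(int64 value, char *a)
    {
        /* avoid the LL suffix, which pre-C99 compilers may spell
         * differently (or not at all); wrap the literal in INT64CONST() */
        if (value == (-INT64CONST(9223372036854775807) - 1))
        {
            memcpy(a, "-9223372036854775808", 21);  /* 20 chars plus NUL */
            return true;
        }
        return false;
    }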
Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons
On Sat, Nov 20, 2010 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: The trouble with that approach is that you have to depend on the direction of rounding for negative quotients. Which was unspecified before C99, and it's precisely pre-C99 compilers that are posing a hazard to the current coding. Interesting. I wondered whether there might be compilers out there that handled that inconsistently, but then I thought I was probably being paranoid. Likewise for the int64 case, which BTW is no safer for pre-C99 compilers than it was yesterday: LL is not the portable way to write int64 constants. Gah. I wish we had some documentation of this stuff. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons
BTW, while we're thinking about marginal improvements: instead of constructing the string backwards and then reversing it in-place, what about building it working backwards from the end of the buffer and then memmove'ing it down to the start of the buffer? I haven't tested this but it seems likely to be roughly a wash speed-wise. The reason I find the idea attractive is that it will immediately expose any caller that is providing a buffer shorter than the required length, whereas now such callers will appear to work fine if they're only tested on small values. A small downside is that pg_itoa would then need its own implementation instead of just punting to pg_ltoa. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
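[Editor's note: a sketch of what that alternative would look like. The name, the buffer-size constant, and the negative-working-value trick from upthread are all assumptions, not committed code.]

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define INT32_TEXT_BUFLEN 12    /* '-' + 10 digits + NUL (assumed) */

    /*
     * Generate digits right-justified at the end of the caller's buffer,
     * then memmove the finished string down to the start.  A buffer
     * shorter than INT32_TEXT_BUFLEN now fails even for small values,
     * which is the diagnostic benefit described above.
     */
    static void
    int32_to_text_tail(int32_t value, char *buf)
    {
        char       *end = buf + INT32_TEXT_BUFLEN;
        char       *p = end;
        bool        neg = (value < 0);

        if (!neg)
            value = -value;         /* work in negatives, as upthread */

        *--p = '\0';
        do
        {
            *--p = (char) ('0' + (-(value % 10)));
            value /= 10;
        } while (value != 0);
        if (neg)
            *--p = '-';

        memmove(buf, p, end - p);   /* slide down, NUL included */
    }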
[HACKERS] match_clause_to_indexcol()
I was looking at KNNGIST some more today and found myself trying to disentangle what match_clause_to_indexcol() is actually doing. It appears to me that the opfamily passed to that function is always the same as index-opfamily[indexcol], which seems like needless notational complexity. Maybe I'm missing something, but the attached patch seems to make things simpler and clearer. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 38b0930..7b8fd97 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -83,7 +83,7 @@ static PathClauseUsage *classify_index_clause_usage(Path *path, static void find_indexpath_quals(Path *bitmapqual, List **quals, List **preds); static int find_list_position(Node *node, List **nodelist); static bool match_clause_to_indexcol(IndexOptInfo *index, - int indexcol, Oid opfamily, + int indexcol, RestrictInfo *rinfo, Relids outer_relids, SaOpControl saop_control); @@ -1053,7 +1053,6 @@ group_clauses_by_indexkey(IndexOptInfo *index, List *clausegroup_list = NIL; bool found_outer_clause = false; int indexcol = 0; - Oid *families = index-opfamily; *found_clause = false; /* default result */ @@ -1062,7 +1061,6 @@ group_clauses_by_indexkey(IndexOptInfo *index, do { - Oid curFamily = families[0]; List *clausegroup = NIL; ListCell *l; @@ -1074,7 +1072,6 @@ group_clauses_by_indexkey(IndexOptInfo *index, Assert(IsA(rinfo, RestrictInfo)); if (match_clause_to_indexcol(index, indexcol, - curFamily, rinfo, outer_relids, saop_control)) @@ -1094,7 +1091,6 @@ group_clauses_by_indexkey(IndexOptInfo *index, Assert(IsA(rinfo, RestrictInfo)); if (match_clause_to_indexcol(index, indexcol, - curFamily, rinfo, outer_relids, saop_control)) @@ -1113,9 +1109,8 @@ group_clauses_by_indexkey(IndexOptInfo *index, clausegroup_list = lappend(clausegroup_list, clausegroup); indexcol++; - families++; - } while (!DoneMatchingIndexKeys(families)); + } while (indexcol index-ncolumns); if (!*found_clause !found_outer_clause) return NIL;/* no indexable clauses anywhere */ @@ -1185,7 +1180,6 @@ group_clauses_by_indexkey(IndexOptInfo *index, static bool match_clause_to_indexcol(IndexOptInfo *index, int indexcol, - Oid opfamily, RestrictInfo *rinfo, Relids outer_relids, SaOpControl saop_control) @@ -1196,6 +1190,7 @@ match_clause_to_indexcol(IndexOptInfo *index, Relids left_relids; Relids right_relids; Oid expr_op; + Oid opfamily = index-opfamily[indexcol]; bool plain_op; /* @@ -1582,23 +1577,18 @@ matches_any_index(RestrictInfo *rinfo, RelOptInfo *rel, Relids outer_relids) { IndexOptInfo *index = (IndexOptInfo *) lfirst(l); int indexcol = 0; - Oid *families = index-opfamily; do { - Oid curFamily = families[0]; - if (match_clause_to_indexcol(index, indexcol, - curFamily, rinfo, outer_relids, SAOP_ALLOW)) return true; indexcol++; - families++; - } while (!DoneMatchingIndexKeys(families)); + } while (indexcol index-ncolumns); } return false; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] match_clause_to_indexcol()
Robert Haas robertmh...@gmail.com writes:
> I was looking at KNNGIST some more today and found myself trying to
> disentangle what match_clause_to_indexcol() is actually doing. It appears
> to me that the opfamily passed to that function is always the same as
> index->opfamily[indexcol], which seems like needless notational
> complexity. Maybe I'm missing something, but the attached patch seems to
> make things simpler and clearer. Thoughts?

+1. I think the existing coding dates from a time when we didn't have
IndexOptInfo at all, or at least didn't pass it around to all these
sub-functions, so there was no other path for getting at the info.

But if you're going to do that, get rid of DoneMatchingIndexKeys
altogether, along with the extra zero that plancat.c adds to the opfamily
array. We don't need to be using more than one way to iterate over those
arrays.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons
On Sat, Nov 20, 2010 at 12:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> BTW, while we're thinking about marginal improvements: instead of
> constructing the string backwards and then reversing it in-place, what
> about building it working backwards from the end of the buffer and then
> memmove'ing it down to the start of the buffer? I haven't tested this
> but it seems likely to be roughly a wash speed-wise. The reason I find
> the idea attractive is that it will immediately expose any caller that
> is providing a buffer shorter than the required length, whereas now such
> callers will appear to work fine if they're only tested on small values.
> A small downside is that pg_itoa would then need its own implementation
> instead of just punting to pg_ltoa.

I think that might be more clever than is really warranted. I get your
point about buffer overrun, but I don't think it's that hard for callers
to do the right thing, so I'm inclined to think that's not worth much in
this case. Of course, if memmove() can be implemented as a single
assembler instruction or something, that might be appealing from a speed
standpoint, but otherwise I think we may as well stick with this. There's
less chance of needlessly touching an extra cache line, less chance of
being confused by leftover garbage in memory after the end of the output
string, and less duplicate code.

I had given some thought to whether it might make sense to try to figure
out how long the string will be before we actually start generating it,
so that we can just start in exactly the right place and not have to
clean up afterward. But the obvious implementation seems like it could be
more expensive than just doing the copy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons
Robert Haas robertmh...@gmail.com writes:
> On Sat, Nov 20, 2010 at 12:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> what about building it working backwards from the end of the buffer and
>> then memmove'ing it down to the start of the buffer?
> I think that might be more clever than is really warranted. I get your
> point about buffer overrun, but I don't think it's that hard for callers
> to do the right thing, so I'm inclined to think that's not worth much in
> this case.

Fair enough --- it was just a passing thought.

> I had given some thought to whether it might make sense to try to figure
> out how long the string will be before we actually start generating it,
> so that we can just start in exactly the right place and not have to
> clean up afterward. But the obvious implementation seems like it could
> be more expensive than just doing the copy.

Yeah. You certainly don't want to do the division sequence twice, and a
log() call wouldn't be cheap either, and there don't seem to be many other
alternatives. If we were going to get picky about avoiding the reverse
step, I'd go with Andres' idea of changing the API to pass back an address
instead of guaranteeing that the result begins at the start of the buffer.
But I think that's much more complicated for callers than it's worth.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] match_clause_to_indexcol()
On Sat, Nov 20, 2010 at 1:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I was looking at KNNGIST some more today and found myself trying to disentangle what match_clause_to_indexcol() is actually doing. It appears to me that the opfamily passed to that function is always the same as index-opfamily[indexcol], which seems like needless notational complexity. Maybe I'm missing something, but the attached patch seems to make things simpler and clearer. Thoughts? +1. I think the existing coding dates from a time when we didn't have IndexOptInfo at all, or at least didn't pass it around to all these sub-functions, so there was no other path for getting at the info. But if you're going to do that, get rid of DoneMatchingIndexKeys altogether, Sure. That's a giant crock. along with the extra zero that plancat.c adds to the opfamily array. We don't need to be using more than one way to iterate over those arrays. I am slightly worried that this might break third-party code, or even some part of the core code that's still using the old system. I don't mind if you want to rip it out, but personally, I'd rather leave that part well enough alone. Revised patch attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 38b0930..482026d 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -37,11 +37,6 @@ #include utils/selfuncs.h -/* - * DoneMatchingIndexKeys() - MACRO - */ -#define DoneMatchingIndexKeys(families) (families[0] == InvalidOid) - #define IsBooleanOpfamily(opfamily) \ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID) @@ -83,7 +78,7 @@ static PathClauseUsage *classify_index_clause_usage(Path *path, static void find_indexpath_quals(Path *bitmapqual, List **quals, List **preds); static int find_list_position(Node *node, List **nodelist); static bool match_clause_to_indexcol(IndexOptInfo *index, - int indexcol, Oid opfamily, + int indexcol, RestrictInfo *rinfo, Relids outer_relids, SaOpControl saop_control); @@ -1053,7 +1048,6 @@ group_clauses_by_indexkey(IndexOptInfo *index, List *clausegroup_list = NIL; bool found_outer_clause = false; int indexcol = 0; - Oid *families = index-opfamily; *found_clause = false; /* default result */ @@ -1062,7 +1056,6 @@ group_clauses_by_indexkey(IndexOptInfo *index, do { - Oid curFamily = families[0]; List *clausegroup = NIL; ListCell *l; @@ -1074,7 +1067,6 @@ group_clauses_by_indexkey(IndexOptInfo *index, Assert(IsA(rinfo, RestrictInfo)); if (match_clause_to_indexcol(index, indexcol, - curFamily, rinfo, outer_relids, saop_control)) @@ -1094,7 +1086,6 @@ group_clauses_by_indexkey(IndexOptInfo *index, Assert(IsA(rinfo, RestrictInfo)); if (match_clause_to_indexcol(index, indexcol, - curFamily, rinfo, outer_relids, saop_control)) @@ -1113,9 +1104,8 @@ group_clauses_by_indexkey(IndexOptInfo *index, clausegroup_list = lappend(clausegroup_list, clausegroup); indexcol++; - families++; - } while (!DoneMatchingIndexKeys(families)); + } while (indexcol index-ncolumns); if (!*found_clause !found_outer_clause) return NIL;/* no indexable clauses anywhere */ @@ -1185,7 +1175,6 @@ group_clauses_by_indexkey(IndexOptInfo *index, static bool match_clause_to_indexcol(IndexOptInfo *index, int indexcol, - Oid opfamily, RestrictInfo *rinfo, Relids outer_relids, SaOpControl saop_control) @@ -1196,6 +1185,7 @@ match_clause_to_indexcol(IndexOptInfo 
*index, Relids left_relids; Relids right_relids; Oid expr_op; + Oid opfamily = index-opfamily[indexcol]; bool plain_op; /* @@ -1582,23 +1572,18 @@ matches_any_index(RestrictInfo *rinfo, RelOptInfo *rel, Relids outer_relids) { IndexOptInfo *index = (IndexOptInfo *) lfirst(l); int indexcol = 0; - Oid *families = index-opfamily; do { - Oid curFamily = families[0]; - if (match_clause_to_indexcol(index, indexcol, - curFamily, rinfo, outer_relids, SAOP_ALLOW)) return true; indexcol++; - families++; - } while (!DoneMatchingIndexKeys(families)); + } while (indexcol index-ncolumns); } return false; @@ -1621,11 +1606,10 @@ eclass_matches_any_index(EquivalenceClass *ec, EquivalenceMember *em, { IndexOptInfo *index = (IndexOptInfo *) lfirst(l); int indexcol = 0; - Oid *families = index-opfamily; do { - Oid curFamily = families[0]; + Oid curFamily = index-opfamily[indexcol]; /* * If it's a btree index, we can reject it if its opfamily isn't @@ -1643,8
Re: [HACKERS] match_clause_to_indexcol()
Robert Haas robertmh...@gmail.com writes: On Sat, Nov 20, 2010 at 1:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: But if you're going to do that, get rid of DoneMatchingIndexKeys altogether, Sure. That's a giant crock. along with the extra zero that plancat.c adds to the opfamily array. We don't need to be using more than one way to iterate over those arrays. I am slightly worried that this might break third-party code, or even some part of the core code that's still using the old system. I don't mind if you want to rip it out, but personally, I'd rather leave that part well enough alone. Revised patch attached. I'll take the responsibility if you don't want to ;-) I think your revised patch is incorrect, or at least not terribly safe, to just remove the last DoneMatchingIndexKeys test and not replace it with anything else. Other than that it looks okay as far as it goes. Do you want to commit it yourself, or shall I incorporate it in a more extensive cleanup? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] match_clause_to_indexcol()
On Sat, Nov 20, 2010 at 1:37 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, Nov 20, 2010 at 1:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: But if you're going to do that, get rid of DoneMatchingIndexKeys altogether, Sure. That's a giant crock. along with the extra zero that plancat.c adds to the opfamily array. We don't need to be using more than one way to iterate over those arrays. I am slightly worried that this might break third-party code, or even some part of the core code that's still using the old system. I don't mind if you want to rip it out, but personally, I'd rather leave that part well enough alone. Revised patch attached. I'll take the responsibility if you don't want to ;-) Sold! :-) I think your revised patch is incorrect, or at least not terribly safe, to just remove the last DoneMatchingIndexKeys test and not replace it with anything else. Other than that it looks okay as far as it goes. Do you want to commit it yourself, or shall I incorporate it in a more extensive cleanup? If it's all right with you, I'll go ahead and commit this and then you can break as much more stuff as you like. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] match_clause_to_indexcol()
Robert Haas robertmh...@gmail.com writes: On Sat, Nov 20, 2010 at 1:37 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think your revised patch is incorrect, or at least not terribly safe, to just remove the last DoneMatchingIndexKeys test and not replace it with anything else. Oh, now that I look at it I notice the after-the-fact Assert claiming that the clausegroup list isn't too long. That can definitely be done better. Do you want to commit it yourself, or shall I incorporate it in a more extensive cleanup? If it's all right with you, I'll go ahead and commit this and then you can break as much more stuff as you like. :-) Go for it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: What do these terms mean in the SOURCE CODE?
On Fri, Nov 19, 2010 at 10:51 PM, Vaibhav Kaushal vaibhavkaushal...@gmail.com wrote: Is no one ready to help on this? :( Apparently, no one is ready to drop what they are doing to immediately answer your email before all the other ones they need to answer, but I wouldn't infer from that that no one wants to help. It doesn't seem very realistic to me to expect a 12-hour turnaround on free support, but what do I know? With regards to your question, for each type of plan node, there is an associated plan state node. This is an important distinction because, IIUC, plans can be reused, so plan state contains the information that might need to be reset on each run. Scan state is just a plan state node for a scan node. I believe a tuple projection is what you get when you do something like SELECT generate_series(1,10) FROM tbl - the set-returning function has to be projected in multiple tuples. EState I could use some hints on myself. A qual is a filter condition, e.g. in SELECT * FROM tbl WHERE x = 1, the x = 1 part is a qual. It's helpful to grep src/include for the structures in question; the information they contain often helps to understand their purpose. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] match_clause_to_indexcol()
On Sat, Nov 20, 2010 at 1:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: If it's all right with you, I'll go ahead and commit this and then you can break as much more stuff as you like. :-) Go for it. Your turn. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: What do these terms mean in the SOURCE CODE?
Thanks a lot for the answer. I see a lot of people discussing so many things so I thought my email would have been ignored by those with a lot coming in already. Thanks for the enlightenment. -Vaibhav (*_*) On Sun, Nov 21, 2010 at 12:25 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Nov 19, 2010 at 10:51 PM, Vaibhav Kaushal vaibhavkaushal...@gmail.com wrote: Is no one ready to help on this? :( Apparently, no one is ready to drop what they are doing to immediately answer your email before all the other ones they need to answer, but I wouldn't infer from that that no one wants to help. It doesn't seem very realistic to me to expect a 12-hour turnaround on free support, but what do I know? With regards to your question, for each type of plan node, there is an associated plan state node. This is an important distinction because, IIUC, plans can be reused, so plan state contains the information that might need to be reset on each run. Scan state is just a plan state node for a scan node. I believe a tuple projection is what you get when you do something like SELECT generate_series(1,10) FROM tbl - the set-returning function has to be projected in multiple tuples. EState I could use some hints on myself. A qual is a filter condition, e.g. in SELECT * FROM tbl WHERE x = 1, the x = 1 part is a qual. It's helpful to grep src/include for the structures in question; the information they contain often helps to understand their purpose. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Fwd: What do these terms mean in the SOURCE CODE?
On Sat, Nov 20, 2010 at 1:59 PM, Vaibhav Kaushal vaibhavkaushal...@gmail.com wrote: Thanks a lot for the answer. I see a lot of people discussing so many things so I thought my email would have been ignored by those with a lot coming in already. Thanks for the enlightenment. Sure. Just FYI, with these kinds of things, I do try to keep track of them because I like to try to make sure that everyone gets an answer back. But, I don't always have time to do it immediately, and sometimes I wait even if I do, if I think that someone more knowledgeable on that particular topic might chime in. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 5:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> But what about timings vs. random other stuff? Like in this case
>> there's a problem if the signal arrives before the memory update to
>> latch->is_set becomes visible. I don't know what we need to do to
>> guarantee that.
>
> I don't believe there's an issue there. A context swap into the kernel
> is certainly going to include msync. If you're afraid otherwise, you
> could put an msync before the kill() call, but I think it's a waste of
> effort.

So what DO we need to guard against here?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: What do these terms mean in the SOURCE CODE?
I sure do respect that. Appreciated. :) -Vaibhav (*_*) On Sun, Nov 21, 2010 at 12:33 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Nov 20, 2010 at 1:59 PM, Vaibhav Kaushal vaibhavkaushal...@gmail.com wrote: Thanks a lot for the answer. I see a lot of people discussing so many things so I thought my email would have been ignored by those with a lot coming in already. Thanks for the enlightenment. Sure. Just FYI, with these kinds of things, I do try to keep track of them because I like to try to make sure that everyone gets an answer back. But, I don't always have time to do it immediately, and sometimes I wait even if I do, if I think that someone more knowledgeable on that particular topic might chime in. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons
On Saturday 20 November 2010 18:34:04 Tom Lane wrote: BTW, while we're thinking about marginal improvements: instead of constructing the string backwards and then reversing it in-place, what about building it working backwards from the end of the buffer and then memmove'ing it down to the start of the buffer? I haven't tested this but it seems likely to be roughly a wash speed-wise. The reason I find the idea attractive is that it will immediately expose any caller that is providing a buffer shorter than the required length, whereas now such callers will appear to work fine if they're only tested on small values. Tried that, the cost was measurable although not big (~3-5%)... Greetings, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons
On Saturday 20 November 2010 18:18:32 Robert Haas wrote:
>> Likewise for the int64 case, which BTW is no safer for pre-C99 compilers
>> than it was yesterday: LL is not the portable way to write int64
>> constants.
> Gah. I wish we had some documentation of this stuff.

Ditto. I started doing Cish stuff quite a bit *after* C99 was mostly
available in gcc... Sorry, btw, for not realizing those points (and the
regression-expectation file) myself...

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
On Sat, Nov 20, 2010 at 6:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Sat, Nov 20, 2010 at 12:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> what about building it working backwards from the end of the buffer
>>> and then memmove'ing it down to the start of the buffer?
>> I think that might be more clever than is really warranted. I get your
>> point about buffer overrun, but I don't think it's that hard for
>> callers to do the right thing, so I'm inclined to think that's not
>> worth much in this case.

It also seems wrong that a caller might happen to know that their argument
will never be more than n digits but still has to allocate a buffer large
enough to hold 2^64.

> Fair enough --- it was just a passing thought.
>
>> I had given some thought to whether it might make sense to try to
>> figure out how long the string will be before we actually start
>> generating it, so that we can just start in exactly the right place and
>> not have to clean up afterward. But the obvious implementation seems
>> like it could be more expensive than just doing the copy.
>
> Yeah. You certainly don't want to do the division sequence twice, and a
> log() call wouldn't be cheap either, and there don't seem to be many
> other alternatives.

There are bit-twiddling hacks for computing log base 2. I'm not sure it's
worth worrying about to this degree though.

-- greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons
Greg Stark gsst...@mit.edu writes:
> On Sat, Nov 20, 2010 at 6:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> I had given some thought to whether it might make sense to try to
>>> figure out how long the string will be before we actually start
>>> generating it, so that we can just start in exactly the right place
>>> and not have to clean up afterward. But the obvious implementation
>>> seems like it could be more expensive than just doing the copy.
>> Yeah. You certainly don't want to do the division sequence twice, and
>> a log() call wouldn't be cheap either, and there don't seem to be many
>> other alternatives.
> There are bit-twiddling hacks for computing log base 2. I'm not sure
> it's worth worrying about to this degree though.

I think converting log2 to log10 *exactly* would end up being not so
cheap, anyhow.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
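[Editor's note: for reference, the usual bit-twiddling recipe combines both steps: take the position of the highest set bit, scale by an integer approximation of log10(2), and correct with a single table lookup, since the scaled estimate can be off by one. A sketch follows; the GCC __builtin_clz intrinsic is assumed here, and the backend of this era would need its own highest-bit routine.]

    #include <stdint.h>

    /* Number of decimal digits in v, via an approximate log10 plus a
     * one-entry correction -- the "exact" step Tom refers to. */
    static int
    decimal_width_uint32(uint32_t v)
    {
        static const uint32_t powers10[] = {
            1, 10, 100, 1000, 10000, 100000,
            1000000, 10000000, 100000000, 1000000000
        };
        int         bits,
                    t;

        if (v == 0)
            return 1;

        bits = 31 - __builtin_clz(v);       /* floor(log2(v)) */
        t = ((bits + 1) * 1233) >> 12;      /* 1233/4096 ~= log10(2) */
        return t + 1 - (v < powers10[t]);
    }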
Re: [HACKERS] duplicate connection failure messages
Tom Lane wrote:
> Bruce Momjian br...@momjian.us writes:
>> I was wondering that. I am unclear if we need it though --- can we not
>> assume inet_ntop() exists on all systems? We assumed inet_ntoa() did.
>
> The Single Unix Spec includes inet_ntoa but not inet_ntop.
>
>> Of course, the buildfarm will tell us.
>
> The buildfarm unfortunately contains only a subset of the platforms we
> care about. I don't think this problem is large enough to justify taking
> a portability risk by depending on non-SUS library functions. If you
> want to do this, please do it as suggested previously, ie depend on the
> copy of the code we have internally.

I assume you are suggesting to use our inet_net_ntop() even if the system
has inet_ntop().

--
Bruce Momjian br...@momjian.us  http://momjian.us
EnterpriseDB    http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: What do these terms mean in the SOURCE CODE?
Some additional comments ...

Robert Haas robertmh...@gmail.com writes:
> With regards to your question, for each type of plan node, there is an
> associated plan state node. This is an important distinction because,
> IIUC, plans can be reused, so plan state contains the information that
> might need to be reset on each run.

Yeah. The plan tree is supposed to be read-only so far as the executor is
concerned, so it needs a parallel plan state tree to hold its runtime
state.

> I believe a tuple projection is what you get when you do something like
> SELECT generate_series(1,10) FROM tbl - the set-returning function has
> to be projected in multiple tuples.

I think "projection" is a term out of relational theory, referring to any
sort of tuple-by-tuple manipulation of the data. For instance, if you have
a row <a, b, c> and you compute <a, c+1> based on that, that's a
projection. ExecProject() does this in the general case. The business with
SRFs in a targetlist possibly producing multiple rows is a PostQUEL-ism
that doesn't have any standard technical name that I know of.

> A qual is a filter condition, e.g. in SELECT * FROM tbl WHERE x = 1, the
> x = 1 part is a qual.

qual = qualifier. We use that term with various shades of meaning; notably
an indexqual is a qualifier that is useful for searching an index. Large
parts of the planner also use the term "clause" to mean about the same
thing.

> It's helpful to grep src/include for the structures in question;

Here's my single biggest tip for newcomers to the Postgres source: if you
don't use ctags, glimpse, or some other tool that can quickly show you all
references to a given identifier, go out and get one. It's one of the
easiest ways to learn about things.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
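[Editor's note: both terms are visible in ordinary EXPLAIN output. A sketch with a hypothetical table: the target list is the projection, and the qual shows up as the Filter line; the output shown is approximate.]

    -- "a, c + 1" is the projection; "x = 1" is the qual
    EXPLAIN (VERBOSE, COSTS OFF) SELECT a, c + 1 FROM tbl WHERE x = 1;
    --  Seq Scan on public.tbl
    --    Output: a, (c + 1)
    --    Filter: (tbl.x = 1)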
Re: [HACKERS] Fix for seg picksplit function
On 2010-11-20 13:36, Yeb Havinga wrote: On 2010-11-20 04:46, Robert Haas wrote: Well, the problem with just comparing on is that it takes very little account of the upper bounds. I think the cases where a simple split would hurt you the most are those where examining the upper bound is necessary to to get a good split. With the current 8K default blocksize, I put my money on the sorting algorithm for any seg case. The r-tree algorithm's performance is probably more influenced by large buckets-low tree depth-generic keys on non leaf nodes. To test this conjecture I compared a default 9.0.1 postgres (with debugging) to exactly the same postgres but with an 1K blocksize, with the test that Alexander posted upthread. 8K blocksize: postgres=# create index seg_test_idx on seg_test using gist (a); CREATE INDEX Time: 99613.308 ms SELECT Total runtime: 81.482 ms 1K blocksize: CREATE INDEX Time: 40113.252 ms SELECT Total runtime: 3.363 ms Details of explain analyze are below. The rowcount results are not exactly the same because I forgot to backup the first test, so created new random data. Though I didn't compare the sorting picksplit this way, I suspect that that algorithm won't be effected so much by the difference in blocksize. regards, Yeb Havinga ** 8K test ostgres=# \timing Timing is on. postgres=# create index seg_test_idx on seg_test using gist (a); CREATE INDEX Time: 99613.308 ms postgres=# show block_size ; block_size 8192 (1 row) Time: 0.313 ms postgres=# explain (buffers, analyze) select * from seg_test where a @ '0.5 .. 0.5'::seg; QUERY PLAN --- - Bitmap Heap Scan on seg_test (cost=44.32..2589.66 rows=1000 width=12) (actual time=91.061..91.304 rows=27 loops=1) Recheck Cond: (a @ '0.5'::seg) Buffers: shared hit=581 read=1729 written=298 - Bitmap Index Scan on seg_test_idx (cost=0.00..44.07 rows=1000 width=0) (actual time=91.029..91.029 rows=27 loop s=1) Index Cond: (a @ '0.5'::seg) Buffers: shared hit=581 read=1702 written=297 Total runtime: 91.792 ms (7 rows) Time: 309.687 ms postgres=# explain (buffers, analyze) select * from seg_test where a @ '0.5 .. 0.5'::seg; QUERY PLAN --- - Bitmap Heap Scan on seg_test (cost=44.32..2589.66 rows=1000 width=12) (actual time=81.357..81.405 rows=27 loops=1) Recheck Cond: (a @ '0.5'::seg) Buffers: shared hit=1231 read=1079 - Bitmap Index Scan on seg_test_idx (cost=0.00..44.07 rows=1000 width=0) (actual time=81.337..81.337 rows=27 loop s=1) Index Cond: (a @ '0.5'::seg) Buffers: shared hit=1204 read=1079 Total runtime: 81.482 ms (7 rows) Time: 82.291 ms ** 1K test postgres=# \timing Timing is on. postgres=# create index seg_test_idx on seg_test using gist (a); CREATE INDEX Time: 40113.252 ms postgres=# explain (buffers, analyze) select * from seg_test where a @ '0.5 .. 0.5'::seg; QUERY PLAN --- Bitmap Heap Scan on seg_test (cost=278.66..3812.85 rows=1000 width=12) (actual time=4.649..4.839 rows=34 loops=1) Recheck Cond: (a @ '0.5'::seg) Buffers: shared hit=221 read=385 - Bitmap Index Scan on seg_test_idx (cost=0.00..278.41 rows=1000 width=0) (actual time=4.620..4.620 rows=34 loops =1) Index Cond: (a @ '0.5'::seg) Buffers: shared hit=221 read=351 Total runtime: 4.979 ms (7 rows) Time: 6.217 ms postgres=# explain (buffers, analyze) select * from seg_test where a @ '0.5 .. 
0.5'::seg; QUERY PLAN --- Bitmap Heap Scan on seg_test (cost=278.66..3812.85 rows=1000 width=12) (actual time=3.239..3.310 rows=34 loops=1) Recheck Cond: (a @ '0.5'::seg) Buffers: shared hit=606 - Bitmap Index Scan on seg_test_idx (cost=0.00..278.41 rows=1000 width=0) (actual time=3.219..3.219 rows=34 loops =1) Index Cond: (a @ '0.5'::seg) Buffers: shared hit=572 Total runtime: 3.363 ms (7 rows) Time: 4.063 ms postgres=# show block_size; block_size 1024 (1 row) Time: 0.300 ms -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Robert Haas robertmh...@gmail.com writes:
> So what DO we need to guard against here?

I think the general problem can be stated as "process A changes two or
more values in shared memory in a fairly short span of time, and process
B, which is concurrently examining the same variables, sees those changes
occur in a different order than A thought it made them in".

In practice we do not need to worry about changes made with a kernel call
in between, as any sort of context swap will cause the kernel to force
cache synchronization. Also, the intention is that the locking primitives
will take care of this for any shared structures that are protected by a
lock. (There were some comments upthread suggesting that maybe our lock
code is not bulletproof; but if so that's something to fix in the lock
code, not a logic error in code using the locks.)

So what this boils down to is being an issue for shared data structures
that we access without using locks. As, for example, the latch structures.
The other case that I can think of offhand is the signal multiplexing
flags. I think we're all right there so far as the flags themselves are
concerned, because only one atomic update is involved on each side:
there's no possibility of inconsistency due to cache visibility skew. But
we'd be at some risk if we were using any such flag as a cue to go look at
some other shared-memory state.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
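[Editor's note: in later C11 terms, the unlocked two-location hazard and its fix look like this. A sketch for illustration only: PostgreSQL at this time had no portable barrier primitives, and 'flag' stands in for something like a latch or signal-multiplexing flag guarding other shared state.]

    #include <stdatomic.h>
    #include <stdbool.h>

    static int          payload;    /* "some other shared-memory state" */
    static atomic_bool  flag;       /* the unlocked cue */

    void
    writer(void)
    {
        payload = 42;
        /* release ordering: the payload store must become visible
         * before the flag store does */
        atomic_store_explicit(&flag, true, memory_order_release);
    }

    int
    reader(void)
    {
        /* acquire ordering pairs with the release above; without the
         * pairing, the reader may see the flag set but a stale payload */
        if (atomic_load_explicit(&flag, memory_order_acquire))
            return payload;         /* guaranteed to observe 42 */
        return -1;
    }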
Re: [HACKERS] Fix for seg picksplit function
On 2010-11-20 21:57, Yeb Havinga wrote:8K blocksize: postgres=# create index seg_test_idx on seg_test using gist (a); CREATE INDEX Time: 99613.308 ms SELECT Total runtime: 81.482 ms 1K blocksize: CREATE INDEX Time: 40113.252 ms SELECT Total runtime: 3.363 ms Details of explain analyze are below. The rowcount results are not exactly the same because I forgot to backup the first test, so created new random data. Though I didn't compare the sorting picksplit this way, I suspect that that algorithm won't be effected so much by the difference in blocksize. Here are the results for a 1K blocksize (debug enabled) and Alexanders latest (0.5) patch. postgres=# create index seg_test_idx on seg_test using gist (a); CREATE INDEX Time: 37373.398 ms postgres=# explain (buffers, analyze) select * from seg_test where a @ '0.5 .. 0.5'::seg; QUERY PLAN --- Bitmap Heap Scan on seg_test (cost=209.97..3744.16 rows=1000 width=12) (actual time=0.091..0.283 rows=34 loops=1) Recheck Cond: (a @ '0.5'::seg) Buffers: shared hit=6 read=35 - Bitmap Index Scan on seg_test_idx (cost=0.00..209.72 rows=1000 width=0) (actual time=0.071..0.071 rows=34 loops=1) Index Cond: (a @ '0.5'::seg) Buffers: shared hit=6 read=1 Total runtime: 0.392 ms (7 rows) Time: 1.798 ms postgres=# explain (buffers, analyze) select * from seg_test where a @ '0.5 .. 0.5'::seg; QUERY PLAN --- Bitmap Heap Scan on seg_test (cost=209.97..3744.16 rows=1000 width=12) (actual time=0.087..0.160 rows=34 loops=1) Recheck Cond: (a @ '0.5'::seg) Buffers: shared hit=41 - Bitmap Index Scan on seg_test_idx (cost=0.00..209.72 rows=1000 width=0) (actual time=0.068..0.068 rows=34 loops=1) Index Cond: (a @ '0.5'::seg) Buffers: shared hit=7 Total runtime: 0.213 ms (7 rows) Time: 0.827 ms -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] duplicate connection failure messages
Bruce Momjian br...@momjian.us writes: I assume you are suggesting to use our inet_net_ntop() even if the system has inet_ntop(). If you're going to have code to do the former, it doesn't seem to be worth the trouble to also have code that does the latter ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: What do these terms mean in the SOURCE CODE?
Tom Lane t...@sss.pgh.pa.us writes:
> Here's my single biggest tip for newcomers to the Postgres source: if
> you don't use ctags, glimpse, or some other tool that can quickly show
> you all references to a given identifier, go out and get one. It's one
> of the easiest ways to learn about things.

I've been using cscope (on advice from Joshua Tolley), and even better its
integration into Emacs, which is called xcscope.el --- I wouldn't have
been able to come up with the extension patch series without that.

http://cscope.sourceforge.net/

And as a quick teaser, the keys I mostly use:

  C-c s d   cscope-find-global-definition
  C-c s s   cscope-find-this-symbol
  C-c s f   cscope-find-this-file
  C-c s I   cscope-index-files

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] duplicate connection failure messages
Tom Lane wrote:

Bruce Momjian br...@momjian.us writes:

I assume you are suggesting to use our inet_net_ntop() even if the system has inet_ntop().

If you're going to have code to do the former, it doesn't seem to be worth the trouble to also have code that does the latter ...

OK, we will not call inet_ntop() at all. I moved the CIDR part of adt/inet_net_ntop.c into adt/inet_cidr_ntop.c, and moved the remaining net part to port/inet_net_ntop.c. I then changed all uses of inet_ntoa() to use inet_net_ntop(). While this churn would perhaps not be warranted just to allow for better error messages, I found pg_getaddrinfo_all() being called from libpq::connectDBStart(), which makes it not thread-safe. I am not excited about backpatching it, but it is a threading bug.

The output is as expected:

    $ psql -h localhost test
    psql: could not connect to server: Connection refused
        Is the server running on host localhost (127.0.0.1) and accepting
        TCP/IP connections on port 5432?

    $ psql -h 127.0.0.1 test
    psql: could not connect to server: Connection refused
        Is the server running on host 127.0.0.1 and accepting
        TCP/IP connections on port 5432?

--
Bruce Momjian br...@momjian.us        http://momjian.us
EnterpriseDB                          http://enterprisedb.com

+ It's impossible for everything to be true. +

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index a911c50..814c27a 100644
*** /tmp/pgrevert.8208/Z2NPcb_libpq.sgml  Sat Nov 20 18:06:46 2010
--- doc/src/sgml/libpq.sgml  Sat Nov 20 17:11:04 2010
*** PGconn *PQconnectdbParams(const char **k
*** 170,180 ****
  If literalhost/ is specified without literalhostaddr/, a host name
  lookup occurs. If literalhostaddr/ is specified without literalhost/,
! the value for literalhostaddr/ gives the server address.
  The connection attempt will fail in any of the cases where a host
  name is required. If both literalhost/ and literalhostaddr/ are specified,
! the value for literalhostaddr/ gives the server address.
  The value for literalhost/ is ignored unless needed for
  authentication or verification purposes, in which case it will be used
  as the host name. Note that authentication is likely to fail
--- 170,180 ----
  If literalhost/ is specified without literalhostaddr/, a host name
  lookup occurs. If literalhostaddr/ is specified without literalhost/,
! the value for literalhostaddr/ gives the server network address.
  The connection attempt will fail in any of the cases where a host
  name is required. If both literalhost/ and literalhostaddr/ are specified,
! the value for literalhostaddr/ gives the server network address.
  The value for literalhost/ is ignored unless needed for
  authentication or verification purposes, in which case it will be used
  as the host name. Note that authentication is likely to fail

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index be272b5..ce28abd 100644
*** /tmp/pgrevert.8208/zxOcga_Makefile  Sat Nov 20 18:06:46 2010
--- src/backend/utils/adt/Makefile  Sat Nov 20 17:11:04 2010
*** OBJS = acl.o arrayfuncs.o array_userfunc
*** 23,29 ****
  	oid.o oracle_compat.o pseudotypes.o rowtypes.o \
  	regexp.o regproc.o ruleutils.o selfuncs.o \
  	tid.o timestamp.o varbit.o varchar.o varlena.o version.o xid.o \
! 	network.o mac.o inet_net_ntop.o inet_net_pton.o \
  	ri_triggers.o pg_lzcompress.o pg_locale.o formatting.o \
  	ascii.o quote.o pgstatfuncs.o encode.o dbsize.o genfile.o trigfuncs.o \
  	tsginidx.o tsgistidx.o tsquery.o tsquery_cleanup.o tsquery_gist.o \
--- 23,29 ----
  	oid.o oracle_compat.o pseudotypes.o rowtypes.o \
  	regexp.o regproc.o ruleutils.o selfuncs.o \
  	tid.o timestamp.o varbit.o varchar.o varlena.o version.o xid.o \
! 	network.o mac.o inet_cidr_ntop.o inet_net_pton.o \
  	ri_triggers.o pg_lzcompress.o pg_locale.o formatting.o \
  	ascii.o quote.o pgstatfuncs.o encode.o dbsize.o genfile.o trigfuncs.o \
  	tsginidx.o tsgistidx.o tsquery.o tsquery_cleanup.o tsquery_gist.o \

diff --git a/src/include/port.h b/src/include/port.h
index 2048768..0cfbed8 100644
*** /tmp/pgrevert.8208/DZNlKd_port.h  Sat Nov 20 18:06:46 2010
--- src/include/port.h  Sat Nov 20 17:11:42 2010
*** extern void qsort_arg(void *base, size_t
*** 437,440 ****
--- 437,444 ----
  /* port/chklocale.c */
  extern int pg_get_encoding_from_locale(const char *ctype);

+ /* port/inet_net_ntop.c */
+ extern char *inet_net_ntop(int af, const void *src, int bits,
+ 			char *dst, size_t size);
+
  #endif	/* PG_PORT_H */

diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index ae267ab..88eac70
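To make the port.h prototype above concrete, here is a minimal sketch, assuming nothing beyond that prototype, of how a client-side error path might render a resolved IPv4 address with the ported function. format_addr() is an invented helper for illustration, not code from the patch:

    #include <stdio.h>
    #include <stddef.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    /* prototype as added to src/include/port.h by the patch */
    extern char *inet_net_ntop(int af, const void *src, int bits,
                               char *dst, size_t size);

    /* Hypothetical helper: render a resolved address for an error message. */
    static void
    format_addr(const struct sockaddr *sa, char *buf, size_t buflen)
    {
        if (sa->sa_family == AF_INET)
        {
            const struct sockaddr_in *sin = (const struct sockaddr_in *) sa;

            /* all 32 bits are significant, i.e. a host address, not a net */
            inet_net_ntop(AF_INET, &sin->sin_addr, 32, buf, buflen);
        }
        else
            snprintf(buf, buflen, "[unknown address family]");
    }

Unlike inet_ntoa(), which returns a pointer to a static buffer, here the caller supplies the buffer, which is what makes the replacement thread-safe.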
Re: [HACKERS] Spread checkpoint sync
On Mon, Nov 15, 2010 at 6:15 PM, Robert Haas robertmh...@gmail.com wrote:

On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith g...@2ndquadrant.com wrote:

The second issue is that the delay between sync calls is currently hard-coded, at 3 seconds. I believe the right path here is to consider the current checkpoint_completion_target to still be valid, then work back from there. That raises the question of what percentage of the time writes should now be compressed into relative to that, to leave some time to spread the sync calls. If we're willing to say "writes finish in the first 1/2 of the target, syncs execute in the second 1/2", then I could implement that here. Maybe that ratio needs to be another tunable. Still thinking about that part, and it's certainly open to community debate.

I would speculate that the answer is likely to be nearly binary. The best option would either be to do the writes as fast as possible and spread out the fsyncs, or to spread out the writes and do the fsyncs as fast as possible, depending on how the system is set up.

The thing that complicates the design is that the actual sync execution may take a considerable period of time. That is much more likely to happen for a sync than for an individual write (which the current spread checkpoint spreads out), because writes are usually cached. In the spread sync case, it's easy for one slow sync to make the rest turn into ones that fire in quick succession, to make up for lost time.

I think the behavior of file systems and operating systems is highly relevant here. We seem to have a theory that allowing a delay between the write and the fsync should give the OS a chance to start writing the data out,

I thought the theory was that doing too many fsyncs in short order can lead to some kind of starvation of other I/O. If the theory is that we want to wait between writes and fsyncs, then the current behavior is probably the best: spreading out the writes and then doing all the syncs at the end gives the best delay time between an average write and the sync of the file it was written to. Or, spread the writes out over 150 seconds, sleep for 140 seconds, then do the fsyncs. But I don't think that that is the theory.

but do we have any evidence indicating whether and under what circumstances that actually occurs? For example, if we knew that it's important to wait at least 30 s but waiting 60 s is no better, that would be useful information.

Another question I have is about how we're actually going to know when any given fsync can be performed. For any given segment, there are a certain number of pages A that are already dirty at the start of the checkpoint.

Dirty in the shared pool, or dirty in the OS cache?

Then there are a certain number of additional pages B that are going to be written out during the checkpoint. If it so happens that B = 0, we can call fsync() at the beginning of the checkpoint without losing anything (in fact, we gain something: any pages dirtied by cleaning scans or backend writes during the checkpoint won't need to hit the disk;

Aren't those pages written out by cleaning scans and backend writes while the checkpoint is occurring exactly what you defined to be page set B, and then assumed to be zero?

and if the filesystem dumps more of its cache than necessary on fsync, we may as well take that hit before dirtying a bunch more stuff). But if B > 0, then we should not attempt the fsync() until we've written them all; otherwise we'll end up having to fsync() that segment twice.
Doing all the writes and then all the fsyncs meets this requirement trivially, but I'm not so sure that's a good idea. For example, given files F1 ... Fn with dirty pages needing checkpoint writes, we could do the following: first, do any pending fsyncs for files not among F1 .. Fn; then, write all pages for F1 and fsync, write all pages for F2 and fsync, write all pages for F3 and fsync, etc. This might seem dumb because we're not really giving the OS a chance to write anything out before we fsync, but think about the ext3 case where the whole filesystem cache gets flushed anyway. It's much better to dump the cache at the beginning of the checkpoint and then again after every file than it is to spew many GB of dirty stuff into the cache and then drop the hammer.

But the kernel has knobs to prevent that from happening: dirty_background_ratio, dirty_ratio, dirty_background_bytes (on newer kernels), dirty_expire_centisecs. Don't these knobs work? Also, ext3 is supposed to do a journal commit every 5 seconds under default mount conditions.

Cheers,

Jeff
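For concreteness, a rough C sketch of the per-file "write everything for this file, then fsync it, then move to the next file" ordering described above. CkptFile and write_page() are invented names, not the actual checkpointer code:

    #include <unistd.h>

    typedef struct CkptFile
    {
        int  fd;       /* open descriptor for this segment */
        int  npages;   /* number of dirty pages needing checkpoint writes */
        int *pages;    /* page numbers to write */
    } CkptFile;

    extern void write_page(int fd, int pageno);  /* invented helper */

    static void
    checkpoint_files(CkptFile *files, int nfiles)
    {
        for (int i = 0; i < nfiles; i++)
        {
            /* write all dirty pages for this file ... */
            for (int j = 0; j < files[i].npages; j++)
                write_page(files[i].fd, files[i].pages[j]);

            /* ... and sync it before dirtying the cache with the next
             * file, so an ext3-style whole-filesystem flush stays small */
            fsync(files[i].fd);
        }
    }

The point of placing the fsync inside the loop is that a filesystem which flushes everything on fsync is only ever asked to flush about one file's worth of newly dirtied checkpoint data at a time.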
Re: [HACKERS] knngist - 0.8
2010/11/12 Teodor Sigaev teo...@sigaev.ru:

Slightly-more-fleshed-out proof-of-concept patch attached, with actual syntax, documentation, and pg_dump support added. This might be thought of as a subset of the builtin_knngist_core patch, without the parts that make it actually do something useful (which is mostly match_pathkey_to_index - which I'm still rather hoping to abstract in some way via the access method interface, though I'm currently unsure what the best way to do that is).

I don't see in your patch how to propagate knowledge of the kind of operation (AMOP_SEARCH or AMOP_ORDER) to GiST and the consistent method; to them, the two cases aren't distinguishable. That isn't acceptable for either one, because GiST needs to choose the right traversal algorithm, and consistentFn needs that information to decide whether to return infinity or a false (-1) value.

My version of the patch is suffering from a bad case of not being done. It sort of started out as a proof of concept to show what the syntax might look like and seems to be gradually growing into something more real. I'm not sure what we'll end up with.

My variant informs GiST via SK_ORDER flags, and consistentFn looks at the strategy number (strategy numbers are different for the different purposes).

Yeah. At ten thousand feet, I think the open design question here is to what extent it's OK to rely on the fact that the ORDER BY clauses we wish to optimize happen to look a lot like the WHERE clauses we already know how to optimize: namely, they're both binary opclauses of the form indexed-column op constant. Your patch manages to reuse a LOT of existing machinery by shoving ordering expressions through the same code paths that quals take. Code reuse is generally a good thing, but here we're forming RestrictInfo and ScanKey objects out of things that are neither restrictions nor keys, which might lead to maintainability problems down the road. I'd like to get some input from Tom on how he feels about that, and any alternatives he sees.

It seems to me that our concept of ScanDirection is really woefully under-expressive. For example, given:

    CREATE TABLE foo (
        id integer NOT NULL,
        name character varying NOT NULL,
        PRIMARY KEY (id)
    );

We use the index for the first of these but not the second:

    select * from foo order by id nulls last;
    select * from foo order by id nulls first;

In an ideal world, we'd like to handle the second one by finding the first non-NULL entry in the index, scanning away from the NULLs, and then, when we run off the end, looping back around to spit out the NULL entries. Or, similarly, if we have an index on (id, name), we can handle the first two of the following with the index, but not the last two:

    select * from foo order by id, name;
    select * from foo order by id desc, name desc;
    select * from foo order by id, name desc;
    select * from foo order by id, name nulls last;

(We can't handle this last one even if name is marked NOT NULL!)

It seems like it might even be possible to handle these (albeit maybe not real efficiently) via a sufficiently complicated scanning order, but even if we had the code to do the scan, we have no way of telling the index what scan direction we want: forward, backward, and no-movement don't really cut it. The idea that forward and backward mean anything at all is really pretty btree-centric. The trick, of course, is to come up with something better. I have a vague notion of using something like an array or list of objects of the form (index column, asc/desc, nulls first/last, value), with value null except for KNNGIST.
That could easily be extended with collation or whatever as needed. The trick is, you need to create this object and then ask the index: hey, is this something you can support? And the index needs to then respond by saying whether it can do that (and maybe doing some sort of translation into what instructions should be provided at execution time). Trouble is, that sounds like a lot of work on code I'm not altogether comfortable with, and I'd really like to get KNNGIST in shape for 9.1 if possible. I'm not really sure where to draw the line between, on the one hand, not wanting to commit something that is excessively crufty and, on the other hand, wanting to finish in finite time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
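As a sketch only, the per-column ordering descriptor floated above might look something like the struct below. Every name here is invented, and the real design question, how the index signals what it can support, is exactly the one left open above:

    #include <stdbool.h>

    /* One entry per ORDER BY column; an ordering request would be an
     * array of these.  All names are hypothetical. */
    typedef struct IndexOrderItem
    {
        int   indexcol;     /* which index column */
        bool  descending;   /* asc/desc */
        bool  nulls_first;  /* nulls first/last */
        void *value;        /* comparison value: NULL except for KNNGIST */
    } IndexOrderItem;

An access method would be handed such an array and would answer whether it can produce that ordering, per the "ask the index" step described above.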
Re: [HACKERS] Latches with weak memory ordering (Re: max_wal_senders must die)
On Sat, Nov 20, 2010 at 9:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:

In practice we do not need to worry about changes made with a kernel call in between, as any sort of context swap will cause the kernel to force cache synchronization.

Note that not all kernel calls are equal these days. Some (gettimeofday on Linux) are implemented as very lightweight calls that don't do memory map changes and might not trigger the kinds of synchronization you're talking about. Any IPC kernel call like kill() certainly will, though, I'm sure.

--
greg
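To illustrate the concern, a minimal sketch, with invented names, of the pattern under discussion: a shared flag is set before signalling another process, with an explicit barrier in case the kernel entry itself doesn't imply one:

    #include <signal.h>
    #include <sys/types.h>

    extern volatile int latch_is_set;  /* hypothetical shared-memory flag */

    static void
    set_latch_and_signal(pid_t owner)
    {
        latch_is_set = 1;

        /* GCC builtin full memory barrier: make the store above visible
         * to other processors before the signal is sent */
        __sync_synchronize();

        kill(owner, SIGUSR1);
    }

If kill() already implies a full barrier, as suggested above for IPC calls, the explicit barrier is redundant but harmless.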
Re: [HACKERS] Spread checkpoint sync
On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes jeff.ja...@gmail.com wrote:

The thing that complicates the design is that the actual sync execution may take a considerable period of time. That is much more likely to happen for a sync than for an individual write (which the current spread checkpoint spreads out), because writes are usually cached. In the spread sync case, it's easy for one slow sync to make the rest turn into ones that fire in quick succession, to make up for lost time.

I think the behavior of file systems and operating systems is highly relevant here. We seem to have a theory that allowing a delay between the write and the fsync should give the OS a chance to start writing the data out,

I thought the theory was that doing too many fsyncs in short order can lead to some kind of starvation of other I/O. If the theory is that we want to wait between writes and fsyncs, then the current behavior is probably the best: spreading out the writes and then doing all the syncs at the end gives the best delay time between an average write and the sync of the file it was written to. Or, spread the writes out over 150 seconds, sleep for 140 seconds, then do the fsyncs. But I don't think that that is the theory.

Well, I've heard Bruce and, I think, possibly also Greg talk about wanting to wait after doing the writes in the hopes that the kernel will start to flush the dirty pages, but I'm wondering whether it wouldn't be better to just give up on that and do: a small batch of writes -> fsync those writes -> another small batch of writes -> fsync that batch -> etc.

but do we have any evidence indicating whether and under what circumstances that actually occurs? For example, if we knew that it's important to wait at least 30 s but waiting 60 s is no better, that would be useful information.

Another question I have is about how we're actually going to know when any given fsync can be performed. For any given segment, there are a certain number of pages A that are already dirty at the start of the checkpoint.

Dirty in the shared pool, or dirty in the OS cache?

OS cache, sorry.

Then there are a certain number of additional pages B that are going to be written out during the checkpoint. If it so happens that B = 0, we can call fsync() at the beginning of the checkpoint without losing anything (in fact, we gain something: any pages dirtied by cleaning scans or backend writes during the checkpoint won't need to hit the disk;

Aren't those pages written out by cleaning scans and backend writes while the checkpoint is occurring exactly what you defined to be page set B, and then assumed to be zero?

No, sorry, I'm referring to cases where all the dirty pages in a segment have been written out to the OS but we have not yet issued the necessary fsync.

and if the filesystem dumps more of its cache than necessary on fsync, we may as well take that hit before dirtying a bunch more stuff). But if B > 0, then we should not attempt the fsync() until we've written them all; otherwise we'll end up having to fsync() that segment twice.

Doing all the writes and then all the fsyncs meets this requirement trivially, but I'm not so sure that's a good idea. For example, given files F1 ... Fn with dirty pages needing checkpoint writes, we could do the following: first, do any pending fsyncs for files not among F1 .. Fn; then, write all pages for F1 and fsync, write all pages for F2 and fsync, write all pages for F3 and fsync, etc.
This might seem dumb because we're not really giving the OS a chance to write anything out before we fsync, but think about the ext3 case where the whole filesystem cache gets flushed anyway. It's much better to dump the cache at the beginning of the checkpoint and then again after every file than it is to spew many GB of dirty stuff into the cache and then drop the hammer.

But the kernel has knobs to prevent that from happening: dirty_background_ratio, dirty_ratio, dirty_background_bytes (on newer kernels), dirty_expire_centisecs. Don't these knobs work? Also, ext3 is supposed to do a journal commit every 5 seconds under default mount conditions.

I don't know in detail. dirty_expire_centisecs sounds useful; I think the problem with dirty_background_ratio and dirty_ratio is that the default ratios are large enough that, on systems with a huge pile of memory, they allow more dirty data to accumulate than can be flushed without causing an I/O storm. I believe Greg Smith made a comment along the lines of: memory sizes are growing faster than I/O speeds; therefore a ratio that is OK for a low-end system with a modest amount of memory causes problems on a high-end system that has faster I/O but MUCH more memory. For a kernel developer, I suspect the tendency is to try to set the ratio so that you keep enough free memory around to service future allocation requests. Optimizing for possible future fsyncs is probably not the top priority...

--
Re: [HACKERS] Spread checkpoint sync
On Sat, Nov 20, 2010 at 5:17 PM, Robert Haas robertmh...@gmail.com wrote:

On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes jeff.ja...@gmail.com wrote:

Doing all the writes and then all the fsyncs meets this requirement trivially, but I'm not so sure that's a good idea. For example, given files F1 ... Fn with dirty pages needing checkpoint writes, we could do the following: first, do any pending fsyncs for files not among F1 .. Fn; then, write all pages for F1 and fsync, write all pages for F2 and fsync, write all pages for F3 and fsync, etc. This might seem dumb because we're not really giving the OS a chance to write anything out before we fsync, but think about the ext3 case where the whole filesystem cache gets flushed anyway. It's much better to dump the cache at the beginning of the checkpoint and then again after every file than it is to spew many GB of dirty stuff into the cache and then drop the hammer.

But the kernel has knobs to prevent that from happening: dirty_background_ratio, dirty_ratio, dirty_background_bytes (on newer kernels), dirty_expire_centisecs. Don't these knobs work? Also, ext3 is supposed to do a journal commit every 5 seconds under default mount conditions.

I don't know in detail. dirty_expire_centisecs sounds useful; I think the problem with dirty_background_ratio and dirty_ratio is that the default ratios are large enough that on systems with a huge pile of memory, they allow more dirty data to accumulate than can be flushed without causing an I/O storm.

True, but I think that changing these from their defaults is not considered to be a dark art reserved for kernel hackers; i.e., they are something that sysadmins are expected to tweak to suit their workload, just like shmmax and such. And for very large memory systems, even 1% may be too much to cache (dirty*_ratio can only be set in integer percentage points), so recent kernels introduced the dirty*_bytes parameters. I like these better because they do what they say. With dirty*_ratio, I could never figure out what it was a ratio of, and the results were unpredictable without extensive experimentation.

I believe Greg Smith made a comment along the lines of: memory sizes are growing faster than I/O speeds; therefore a ratio that is OK for a low-end system with a modest amount of memory causes problems on a high-end system that has faster I/O but MUCH more memory.

Yes, but how much work do we want to put into redoing the checkpoint logic so that the sysadmin on a particular OS, configuration, and filesystem can avoid having to change the kernel parameters away from their defaults? (Assuming of course that I am correctly understanding the problem, always a dangerous assumption.)

Some experiments I have just done show that dirty_expire_centisecs does not seem reliable on ext3, but the dirty*_ratio and dirty*_bytes settings seem reliable on ext2, ext3, and ext4. That may not apply to RAID; I don't have one I can test.

Cheers,

Jeff
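A rough sketch of the kind of experiment described above: dirty a pile of data, wait past dirty_expire_centisecs, then time the fsync(). If background writeback did its job, the fsync should return almost immediately. The path and sizes here are arbitrary:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int
    main(void)
    {
        const size_t chunk = 1024 * 1024;
        char        *buf = malloc(chunk);
        int          fd = open("dirty_test.dat",
                               O_CREAT | O_WRONLY | O_TRUNC, 0600);
        struct timespec t0, t1;

        memset(buf, 'x', chunk);
        for (int i = 0; i < 256; i++)       /* dirty 256 MB */
            (void) write(fd, buf, chunk);

        sleep(60);  /* comfortably past the default 30 s dirty_expire */

        clock_gettime(CLOCK_MONOTONIC, &t0);
        fsync(fd);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("fsync took %.3f s\n",
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

        close(fd);
        free(buf);
        return 0;
    }

A multi-second fsync after the sleep suggests the expired dirty data was never flushed in the background, which is the unreliability being reported.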
Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)
On Sat, Nov 20, 2010 at 01:54:32PM +0900, Itagaki Takahiro wrote:

On Sat, Nov 20, 2010 at 03:48, caleb.wel...@emc.com wrote:

Note the standard also supports unnesting multiple arrays concurrently; the rule for handling arrays with different lengths is to use null padding of the shorter array.

    UNNEST( ARRAY[5,2,3,4], ARRAY['hello', 'world'] ) WITH ORDINALITY AS t(a,b,i);

Hmmm, that means we cannot support multi-array unnest() with our generic aggregate functions. The function prototype might be like below, but we don't support such a definition:

    unnest(anyarray1, anyarray2, ..., OUT anyelement1, OUT anyelement2, ...) RETURNS SETOF record

So, we would need a special representation for multi-array unnest().

Using bits we already have, I came up with a way to do the things UNNEST(multiple, arrays, here) WITH ORDINALITY does. At least in theory, this is a matter of silently engaging the rewrite rule system:

    \set foo ARRAY[1,2,4,8]
    \set bar ARRAY['Here','is','some','odd','text','of','a','different','length']
    \set baz ARRAY['Here','is','yet','more','text']

    WITH x AS (
        SELECT row_number() OVER () i, foo FROM UNNEST(:foo) foo
    ), y AS (
        SELECT row_number() OVER () i, bar FROM UNNEST(:bar) bar
    ), z AS (
        SELECT row_number() OVER () i, baz FROM UNNEST(:baz) baz
    )
    SELECT * FROM x FULL JOIN y USING(i) FULL JOIN z USING(i);

     i | foo |    bar    | baz
    ---+-----+-----------+------
     1 |   1 | Here      | Here
     2 |   2 | is        | is
     3 |   4 | some      | yet
     4 |   8 | odd       | more
     5 |     | text      | text
     6 |     | of        |
     7 |     | a         |
     8 |     | different |
     9 |     | length    |
    (9 rows)

Cheers,
David.
--
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Fwd: patch: format function - fixed oid
On Sat, Nov 20, 2010 at 7:29 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

2010/11/20 Robert Haas robertmh...@gmail.com:

On Sat, Nov 20, 2010 at 2:59 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

I think you are correct. Who will submit this final version? You or me?

I've got it.

Thank you, have a nice day.

OK, I've committed this, after a fairly heavy rewrite.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] ALTER OBJECT any_name SET SCHEMA name
On Thu, Nov 4, 2010 at 3:39 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

Tom Lane t...@sss.pgh.pa.us writes:

Not having read the patch, but ... the idea that was in the back of my mind was to have a generic AlterObjectNamespace function that would take parameters approximately like the following:

Please find attached what I came up with; that's the set_schema patch, version 6.

Looks good overall, but:

In AlterObjectNamespace(), you reference ACL_KIND_CONVERSION, which I suspect actually needs to be yet another parameter to that function. I've had the thought before that we have a deplorable number of different ways of referring to object types in the backend: ACL_KIND_*, OCLASS_*, OBJECT_*, and class IDs. We should maybe look at unifying some or all of those.

getObjectDescriptionOids() needs a prototype somewhere.

And probably most significantly, you need to add docs and regression tests.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] ALTER OBJECT any_name SET SCHEMA name
On Sat, Nov 20, 2010 at 11:22 PM, Robert Haas robertmh...@gmail.com wrote:

On Thu, Nov 4, 2010 at 3:39 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

Tom Lane t...@sss.pgh.pa.us writes:

Not having read the patch, but ... the idea that was in the back of my mind was to have a generic AlterObjectNamespace function that would take parameters approximately like the following:

Please find attached what I came up with; that's the set_schema patch, version 6.

Looks good overall, but:

In AlterObjectNamespace(), you reference ACL_KIND_CONVERSION, which I suspect actually needs to be yet another parameter to that function. I've had the thought before that we have a deplorable number of different ways of referring to object types in the backend: ACL_KIND_*, OCLASS_*, OBJECT_*, and class IDs. We should maybe look at unifying some or all of those.

getObjectDescriptionOids() needs a prototype somewhere.

And probably most significantly, you need to add docs and regression tests.

Ah, nuts. I see now there's a v7. Never mind...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Fwd: What do these terms mean in the SOURCE CODE?
Thanks for your concern about my question. I am really happy to get the answers.

@Mr. Tom Lane: Thanks for your explanation, but I am already using Eclipse, which gets me to the definitions really easily. I do not post my questions on the Hackers forum with any intention to disturb anyone. I just ask for help when I am not able to understand it all. PG is my first attempt at OSS development, so it is particularly difficult. Of course PG has great support for us, because almost everything has got its comment elaborating its use and importance. However, since I don't know everything yet about PG or even about OSS development, I come here asking. I have never been in so much code with so many data structures and typedefs. It gets confusing very easily. This is the reason I came here asking. I ask here after I have done my homework, and after I get messed up with the code. Moreover, the comments are those of the developers. So when someone says that "expression does not match qual" in the code, it is not as clear to me as it might be to you people, because you know the terms and I don't. Thanks for all the support I have been getting here, to all of you. I will try to ask as little as possible from now on.

@Mr. Dimitri Fontaine: Thanks for telling me the name of the tool. But I like GUI interfaces more, and Eclipse has been serving me well for browsing (only) the source code. I do not fear the command line / shell, but prefer a GUI more. Thanks for your answer.

-Vaibhav (*_*)

On Sun, Nov 21, 2010 at 2:49 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

Tom Lane t...@sss.pgh.pa.us writes:

Here's my single biggest tip for newcomers to the Postgres source: if you don't use ctags, glimpse, or some other tool that can quickly show you all references to a given identifier, go out and get one. It's one of the easiest ways to learn about things.

I've been using cscope (on the advice of Joshua Tolley), and even better its integration into Emacs, called xcscope.el --- I wouldn't have been able to come up with the extension patch series without it.

  http://cscope.sourceforge.net/

And as a quick teaser, the keys I use most:

  C-c s d    cscope-find-global-definition
  C-c s s    cscope-find-this-symbol
  C-c s f    cscope-find-this-file
  C-c s I    cscope-index-files

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
I've got a new stripped-down version of the binary json plugin on github: https://github.com/tlaurenzo/pgjson

With all due warning of contrived benchmarks, I wrote some tests to see where things stand. The test script is here: https://github.com/tlaurenzo/pgjson/blob/master/testdocs/runbenchmark.sh

The results from my laptop (first-gen MacBook Pro, 32-bit, with Snow Leopard and PostgreSQL 9.0) are here and copied into this email at the end: https://github.com/tlaurenzo/pgjson/blob/master/testdocs/benchmark-results-2010-11-20-mbp32.txt

And for some commentary... I copied the 5 sample documents from json.org's example section for these tests. These are loaded into a table with a varchar column 1000 times each (so the test table has 5000 rows in it). In all situations, the binary encoding was smaller than the normalized text form (between 9 and 23% smaller). I think there are cases where the binary form will be larger than the corresponding text form, but I don't think they would be very common.

For the timings, various operations are performed on the 5000-row test table for 100 iterations. In all situations, the query returns the length of the transformed document instead of the document itself, so as to factor out variable client IO between the test runs. The Null Parse is essentially just a select from the table and therefore represents the baseline. The times varied a little bit between runs but did not change materially.

What we see from this is that parsing JSON text and generating a binary representation is cheap, representing approximately 10% of the base-case time. Conversely, anything that involves generating JSON text is expensive, accounting for 30-40% of the base-case time. Some incidental profiling shows that while the entire operation is expensive, the process of generating string literals dominates this time. There is likely room for optimization in this method, but it should be noted that most of these documents are lightly escaped (if escaped at all), which represents the happy path through the string literal output function.

While I have not profiled any advanced manipulation of the binary structure within the server, it stands to reason that manipulating the binary structure should be significantly faster than an approach that requires (perhaps multiple) transcodings between text representations in order to complete a sequence of operations. Assuming that the JSON datatype (at a minimum) normalizes text for storage, the text storage option represents about the most expensive path, but with none of the benefits of an internal binary form (smaller size, ability to cheaply perform non-trivial manipulation within the database server). Of course, just having a JSON datatype that blindly stores text will beat everything, but I'm getting closer to thinking that the binary option is worth the tradeoff.

Comments?

Terry

Running benchmark with 100 iterations per step
Loading data...
Data loaded.
=== DOCUMENT SIZE STATISTICS ===

    Document Name     | Original Size | Binary Size | Normalized Size | Percentage Savings
----------------------+---------------+-------------+-----------------+--------------------
 jsonorg_sample1.json |           582 |         311 |             360 |               13.6
 jsonorg_sample2.json |           241 |         146 |             183 |   20.2185792349727
 jsonorg_sample3.json |           601 |         326 |             389 |   16.1953727506427
 jsonorg_sample4.json |          3467 |        2466 |            2710 |   9.00369003690037
 jsonorg_sample5.json |           872 |         469 |             613 |   23.4910277324633
(5 rows)

=== TEST PARSE AND SERIALIZATION ===

Null Parse:
       12.12 real         2.51 user         0.13 sys
Parse to Binary:
       13.38 real         2.51 user         0.14 sys
Serialize from Binary:
       16.65 real         2.51 user         0.13 sys
Normalize Text:
       18.99 real         2.51 user         0.13 sys
Roundtrip (parse to binary and serialize):
       18.58 real         2.51 user         0.14 sys
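As a purely illustrative aside (this is not pgjson's actual node layout), a length-prefixed binary string node shows why serializing to binary avoids the escaping work that dominates text output: the writer emits a tag, a length, and raw bytes, with no quote-and-escape scan:

    #include <stdint.h>

    /* Hypothetical tag values for a binary JSON tree; invented for
     * illustration only. */
    typedef enum BJsonTag
    {
        BJSON_NULL,
        BJSON_BOOL,
        BJSON_NUMBER,
        BJSON_STRING,
        BJSON_ARRAY,
        BJSON_OBJECT
    } BJsonTag;

    /* A string node: length-prefixed raw bytes.  Emitting this is a
     * plain copy; emitting JSON text requires scanning every byte for
     * characters that need escaping, plus adding the quotes. */
    typedef struct BJsonString
    {
        uint8_t  tag;   /* BJSON_STRING */
        uint32_t len;   /* byte length; no terminator, no escapes */
        /* followed by len bytes of UTF-8 data */
    } BJsonString;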
Re: [HACKERS] Fwd: patch: format function - fixed oid
2010/11/21 Robert Haas robertmh...@gmail.com:

On Sat, Nov 20, 2010 at 7:29 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

2010/11/20 Robert Haas robertmh...@gmail.com:

On Sat, Nov 20, 2010 at 2:59 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

I think you are correct. Who will submit this final version? You or me?

I've got it.

Thank you, have a nice day.

OK, I've committed this, after a fairly heavy rewrite.

Thank you very much.

Regards,

Pavel Stehule