Re: [HACKERS] Bit data type header reduction in some cases
On 02/25/2014 08:23 AM, Haribabu Kommi wrote:
> It's regarding a TODO item, "Bit data type header reduction in some cases". The header contains two parts: 1) the varlena header, which is automatically converted from a 4-byte to a 1-byte header for small data, and 2) the bit-length header, called bit_len, which stores the actual bit length and is 4 bytes in size. I want to reduce this bit_len to 1 byte in some cases, similar to the varlena header. With this change the size of the column is reduced by 3 bytes, which shows a very good decrease in disk usage. I am planning to choose the header based on the total length of the bit data: if the number of bits is less than 0x7F, a one-byte header can be chosen instead of the 4-byte header. When reading the data, the length can be determined the same way.

Since we're designing a new format, how about using bit padding instead of an explicit length field? Add a single 1-bit to the end of the bit data, followed by zeros. That's even more compact. See https://en.wikipedia.org/wiki/Padding_%28cryptography%29#Bit_padding

> The problem I am thinking of is that this can cause problems for old databases having bit columns when they upgrade to a newer version without using pg_dumpall. Is there any way to handle this problem? Or is there any better way to handle the problem itself? Please let me know your suggestions.

On a little-endian system, you could easily use the most-significant bit (the sign bit) of the bit_len field to indicate that it's a new-format datum:

    /*
     * Little-endian
     */
    typedef struct
    {
        int32   vl_len_;
        int32   bit_len;
        bits8   bit_dat[1];
    } VarBit_Old;

    typedef struct
    {
        int32   vl_len_;
        uint8   bit_len;
        bits8   bit_dat[1];     /* var len */
    } VarBit_New;

    #define IS_NEW_FORMAT(x) (((VarBit_New *) (x))->bit_len & 0x80)

On a big-endian system that's more difficult, because the MSB would not be in the first byte of the field. You could still make it work if the new format had the length byte in the fourth byte of the datum. Something like this:

    /*
     * Big-endian
     */
    typedef struct
    {
        int32   vl_len_;
        int32   bit_len;
        bits8   bit_dat[1];
    } VarBit_Old;

    typedef struct
    {
        int32   vl_len_;
        bits8   bit_dat_first_three[3];
        uint8   bit_len;
        bits8   bit_dat_rest[1];    /* var len */
    } VarBit_New;

    #define IS_NEW_FORMAT(x) (((VarBit_New *) (x))->bit_len & 0x80)

That's not a complete solution yet: you also need a special case for a new-format value with less than 4 bytes of bits. Such a field would presumably not have the bit_len field at the fourth byte, because it would be shorter than that, but you could distinguish it by looking at the length in the varsize header. So in big-endian, you'd have one more format for very small varbits:

    /* Big-endian, less than 4 bytes of bits data */
    typedef struct
    {
        int32   vl_len_;
        uint8   bit_len;
        bits8   bit_dat[1];     /* var len */
    } VarBit_New_Tiny;

The same basic idea also works with bit padding.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
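For what it's worth, a minimal standalone sketch of how the bit-padding scheme mentioned above could recover the length without storing it (illustrative only; the helper names and the MSB-first byte layout are assumptions, not taken from any patch):

    #include <stdint.h>
    #include <stdio.h>

    /* Set the terminating 1-bit after nbits of payload; buf must have room
     * for at least (nbits / 8) + 1 bytes and be zero-filled past the payload. */
    static void
    pad_bits(uint8_t *buf, int nbits)
    {
        buf[nbits / 8] |= (uint8_t) (0x80 >> (nbits % 8));
    }

    /* Recover the payload length in bits from nbytes of padded data: the
     * marker is the last set bit, so skip trailing zero bytes and locate the
     * least-significant set bit of the last non-zero byte. */
    static int
    unpad_bits(const uint8_t *buf, int nbytes)
    {
        int     byte = nbytes - 1;
        int     low = 0;

        while (byte >= 0 && buf[byte] == 0)
            byte--;
        if (byte < 0)
            return -1;              /* malformed: no marker bit found */
        while ((buf[byte] & (1 << low)) == 0)
            low++;
        return byte * 8 + (7 - low);
    }

    int
    main(void)
    {
        uint8_t buf[3] = {0xA5, 0x40, 0x00};    /* 10 payload bits: 10100101 01 */

        pad_bits(buf, 10);
        printf("recovered length: %d bits\n", unpad_bits(buf, sizeof(buf)));   /* prints 10 */
        return 0;
    }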
Re: [HACKERS] Get more from indices.
Hello. May I mark this patch as Ready for Committer myself, since it was already marked so in the last CF3 and has had no objection or new suggestion in this CF4 for more than a month?

At Tue, 14 Jan 2014 18:08:10 +0900, Kyotaro HORIGUCHI wrote:
> Hello, since CF4 is already closed but this patch remains marked 'Ready for Committer' in CF3, please let me re-post the latest version for CF4 so it doesn't vanish :-p

regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dump pageinspect to 1.2: page_header with pg_lsn datatype
On 02/25/2014 06:52 AM, Michael Paquier wrote: On Tue, Feb 25, 2014 at 10:02 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: (Not that it's this patch' responsibility to do that.) Definitely a different patch. Don't you think that this would break existing applications? I hope there are no applications relying on pageinspect. It's a very low-level tool. Or if someone has written a nice GUI around it, I'd like to hear about it so that I can start using it :-). I see more flexibility to keep them as they are now, as integers, users can always tune their queries to do this post-processing with to_hex for them as they've (always?) been doing. Agreed. With an int4, you can more easily check for a particular bit etc. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
On Sun, Feb 23, 2014 at 6:54 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
> Folks,
>
> Let me remind you of the custom-scan patches; this is a base feature for remote join in postgres_fdw, cache-only scan, the (upcoming) GPU acceleration feature, and various other alternative ways to scan/join relations. Unfortunately, we could have only a small amount of discussion in this commitfest, even though Hanada-san volunteered to move the patches into "ready for committer" state at the CF-Nov.

Sorry for jumping into this late.

Instead of a custom node, it might be a better idea to improve the FDW infrastructure to push down joins. For starters, is it possible for the custom scan node hooks to create a ForeignScan node? In general, I think it might be better for the custom scan hooks to create existing nodes if they serve the purpose.

> Before time is up, I'd like to ask the hackers' opinion about its potentially arguable points (from my standpoint) and whether they need to be fixed up. One is the hook definition to add an alternative join path, and the other is a special varno used when a custom scan replaces a join node. I'd like to see your opinion about them while we can still change the design if needed.
>
> (1) Interface to add alternative paths in addition to built-in join paths
>
> This patch adds add_join_path_hook in add_paths_to_joinrel to allow extensions to provide alternative scan paths in addition to the built-in join paths. The custom-scan path being added is assumed to perform a scan on a (virtual) relation that is the result set of the joined relations. My concern is which arguments to pass it. This hook is declared as follows:
>
>     /* Hook for plugins to add custom join path, in addition to default ones */
>     typedef void (*add_join_path_hook_type)(PlannerInfo *root,
>                                             RelOptInfo *joinrel,
>                                             RelOptInfo *outerrel,
>                                             RelOptInfo *innerrel,
>                                             JoinType jointype,
>                                             SpecialJoinInfo *sjinfo,
>                                             List *restrictlist,
>                                             List *mergeclause_list,
>                                             SemiAntiJoinFactors *semifactors,
>                                             Relids param_source_rels,
>                                             Relids extra_lateral_rels);
>     extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
>
> Most likely, the arguments up through restrictlist should be passed to extensions, because these are also arguments of add_paths_to_joinrel(). However, I'm not 100% certain how the other arguments should be handled. It probably makes sense to pass param_source_rels and extra_lateral_rels, to check whether the path is sensible for parameterized paths. On the other hand, I doubt whether mergeclause_list is useful to deliver. (It may make sense if someone tries to implement their own merge-join implementation??) I'd like to see ideas to improve the current interface specification.

Since a custom node is an open implementation, it will be important to pass as much information down to the hooks as possible, lest the hooks be constrained. Since the function signatures within the planner and optimizer will change from time to time, the custom node hook signatures will need to change from time to time as well. That might turn out to be maintenance overhead.

BTW, is it a good idea for custom nodes to also affect other paths like append, group etc.? Will it need separate hooks for each of those?

> (2) CUSTOM_VAR for special Var reference
>
>     @@ -134,6 +134,7 @@ typedef struct Expr
>      #define    INNER_VAR    65000    /* reference to inner subplan */
>      #define    OUTER_VAR    65001    /* reference to outer subplan */
>      #define    INDEX_VAR    65002    /* reference to index column */
>     +#define    CUSTOM_VAR   65003    /* reference to custom column */
>
> I newly added CUSTOM_VAR to handle the case when a custom scan overrides join relations.
> Var nodes within a join plan are adjusted to refer to either ecxt_innertuple or ecxt_outertuple of the ExprContext. That causes trouble if a custom scan runs instead of a built-in join, because the tuples it fetches are usually stored in ecxt_scantuple, so the Var nodes also need a suitable varno that is neither inner nor outer. The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows extensions to rewrite Var nodes within the custom-scan node to point at ecxt_scantuple using CUSTOM_VAR instead of inner or outer. For example, a Var node with varno=CUSTOM_VAR and varattno=3 means this node references the third attribute of the tuple in ecxt_scantuple. I think it is a reasonable solution; however, I'm not 100% certain whether people have a more graceful idea for implementing it. If you have comments on the above two topics, or others, please give your ideas.
>
> Thanks,
>
> 2014-01-28 9:14 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
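For illustration, a rough sketch of how an extension might install and use the proposed hook. Only the hook signature is taken from the patch quoted above; the function names, the header the hook would live in, and the cost logic are assumptions:

    #include "postgres.h"
    #include "fmgr.h"
    #include "optimizer/paths.h"        /* assumed home of add_join_path_hook */

    PG_MODULE_MAGIC;

    static add_join_path_hook_type prev_add_join_path_hook = NULL;

    static void
    my_add_join_path(PlannerInfo *root,
                     RelOptInfo *joinrel,
                     RelOptInfo *outerrel,
                     RelOptInfo *innerrel,
                     JoinType jointype,
                     SpecialJoinInfo *sjinfo,
                     List *restrictlist,
                     List *mergeclause_list,
                     SemiAntiJoinFactors *semifactors,
                     Relids param_source_rels,
                     Relids extra_lateral_rels)
    {
        /* let any previously installed hook add its paths first */
        if (prev_add_join_path_hook)
            prev_add_join_path_hook(root, joinrel, outerrel, innerrel,
                                    jointype, sjinfo, restrictlist,
                                    mergeclause_list, semifactors,
                                    param_source_rels, extra_lateral_rels);

        /* this sketch only considers plain inner joins */
        if (jointype != JOIN_INNER)
            return;

        /*
         * Build a custom path representing a scan over the join result and
         * hand it to add_path(); the planner keeps it only if it wins on
         * cost.  (Construction of the custom path itself is omitted here.)
         */
    }

    void
    _PG_init(void)
    {
        prev_add_join_path_hook = add_join_path_hook;
        add_join_path_hook = my_add_join_path;
    }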
Re: [HACKERS] Trigger information for auto_explain.
Hi,

I saw this patch has been moved into "committed patches", but only the first part (0001_..) for the core was committed as of 32001ab; the rest, for the extension side, seems not to have been committed. Would you mind taking a look at that, Álvaro?

regards,

At Wed, 22 Jan 2014 17:28:27 +0900, Kyotaro HORIGUCHI wrote:
> Hello, I came back with the doc patch and a revised 0002 patch.
>
> > > I think documentation is the only thing that stops this patch from being committable... can you add it?
> >
> > Agreed. I have pushed patch 0001 for now.
>
> Thank you, I'll post it soon.
>
> I found the default setting for log_triggers was true in the last patch while writing the doc, but it had better be false regarding backward compatibility. The attached 0002 patch has been changed there.
>
> - 0002_auto_explain_triggers_v2_20140122.patch
>
>   default value for log_triggers changed from 'true' to 'false', as the added documentation says.
>
> - 0003_auto_explain_triggers_doc_v1_20140122.patch
>
>   documentation.

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dump pageinspect to 1.2: page_header with pg_lsn datatype
On 25 Feb 2014 08:54, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> I hope there are no applications relying on pageinspect. It's a very low-level tool. Or if someone has written a nice GUI around it, I'd like to hear about it so that I can start using it :-).
>
> > I see more flexibility in keeping them as they are now, as integers; users can always tune their queries to do this post-processing with to_hex, as they've (always?) been doing.
>
> Agreed. With an int4, you can more easily check for a particular bit etc.

What about making it a bit() column?

What I would love to see is a function added to pageinspect that takes these two fields and returns a list of symbolic names, or perhaps a tuple of booleans.
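As a rough illustration of the kind of helper being asked for (purely a sketch, not an existing pageinspect function, and it picks page_header()'s pd_flags bits as the example field; the struct and function names are made up, the PD_* constants are from storage/bufpage.h):

    #include "postgres.h"
    #include "lib/stringinfo.h"
    #include "storage/bufpage.h"

    typedef struct
    {
        uint16      mask;
        const char *name;
    } page_flag_name;

    static const page_flag_name page_flag_names[] = {
        {PD_HAS_FREE_LINES, "PD_HAS_FREE_LINES"},
        {PD_PAGE_FULL, "PD_PAGE_FULL"},
        {PD_ALL_VISIBLE, "PD_ALL_VISIBLE"},
    };

    /* Append comma-separated symbolic names of the bits set in "flags". */
    static void
    append_page_flag_names(StringInfo buf, uint16 flags)
    {
        int     i;

        for (i = 0; i < lengthof(page_flag_names); i++)
        {
            if (flags & page_flag_names[i].mask)
                appendStringInfo(buf, "%s%s",
                                 buf->len > 0 ? ", " : "",
                                 page_flag_names[i].name);
        }
    }

A real implementation would presumably wrap this in a SQL-callable function returning text[] or a row of booleans, as suggested above.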
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Sorry for jumping into this late. Instead of custom node, it might be better idea to improve FDW infrastructure to push join. For the starters, is it possible for the custom scan node hooks to create a ForeignScan node? In general, I think, it might be better for the custom scan hooks to create existing nodes if they serve the purpose. It does not work well because existing FDW infrastructure is designed to perform on foreign tables, not regular tables. Probably, it needs to revise much our assumption around the background code, if we re-define the purpose of FDW infrastructure. For example, ForeignScan is expected to return a tuple according to the TupleDesc that is exactly same with table definition. It does not fit the requirement if we replace a join-node by ForeignScan because its TupleDesc of joined relations is not predefined. I'd like to define these features are designed for individual purpose. FDW is designed to intermediate an external data source and internal heap representation according to foreign table definition. In other words, its role is to generate contents of predefined database object on the fly. On the other hands, custom-scan is designed to implement alternative ways to scan / join relations in addition to the methods supported by built-in feature. I'm motivated to implement GPU acceleration feature that works transparently for application. Thus, it has to be capable on regular tables, because most of application stores data on regular tables, not foreign ones. Since a custom node is open implementation, it will be important to pass as much information down to the hooks as possible; lest the hooks will be constrained. Since the functions signatures within the planner, optimizer will change from time to time, so the custom node hook signatures will need to change from time to time. That might turn out to be maintenance overhead. Yes. You are also right. But it also makes maintenance overhead if hook has many arguments nobody uses. Probably, it makes sense to list up the arguments that cannot be reproduced from other information, can be reproduced but complicated steps, and can be reproduced easily. Below is the information we cannot reproduce: - PlannerInfo *root - RelOptInfo *joinrel - RelOptInfo *outerrel - RelOptInfo *innerrel - JoinType jointype - SpecialJoinInfo *sjinfo - List *restrictlist Below is the information we can reproduce but complicated steps: - List *mergeclause_list - bool mergejoin_allow - Relids param_source_rels - Relids extra_lateral_rels Below is the information we can reproduce easily: - SemiAntiJoinFactors *semifactors I think, the first two categories or the first category (if functions to reproduce the second group is exposed) should be informed to extension, however, priority of the third group is not high. BTW, is it a good idea for custom nodes to also affect other paths like append, group etc.? Will it need separate hooks for each of those? Yes. I plan to support above plan node, in addition to scan / join only. The custom-scan node is thin abstraction towards general executor behavior, so I believe it is not hard to enhance this node, without new plan node for each of them. Of course, it will need separate hook to add alternative path on the planner stage, but no individual plan nodes. (Sorry, it was unclear for me what does the hook mean.) 
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com
[HACKERS] [ISSUE] pg_dump: schema with OID 0 does not exist
Hi,

Recently we observed the error below while taking a dump after upgrading from 9.0.13 to 9.1.9:

    pg_dump: schema with OID 0 does not exist

I referred to similar issues reported previously (http://www.postgresql.org/message-id/CAGWYGjXRJj=zugejv0ckvn4zf9hb92q+7e3aqfcvbgbmb9z...@mail.gmail.com) and got the issue resolved after identifying and inserting some of the missing rows into the pg_opclass table. Below are the rows that are missing after the upgrade (from pg_opclass):

     405 | aclitem_ops  | 11 | 10 | 2235 | 1033 | t |   0
     783 | box_ops      | 11 | 10 | 2593 |  603 | t |   0
     783 | point_ops    | 11 | 10 | 1029 |  600 | t | 603
     783 | poly_ops     | 11 | 10 | 2594 |  604 | t | 603
     783 | circle_ops   | 11 | 10 | 2595 |  718 | t | 603
    2742 | _int4_ops    | 11 | 10 | 2745 | 1007 | t |  23
    2742 | _text_ops    | 11 | 10 | 2745 | 1009 | t |  25
    2742 | _abstime_ops | 11 | 10 | 2745 | 1023 | t | 702

Can someone help me understand whether this is an issue? If not, how and why did these rows disappear after the upgrade? Since we have a fully automated environment, we do not encourage manual intervention, so I am trying to understand how we can handle these issues.

--
Cheers,
Prakash
Re: [HACKERS] often PREPARE can generate high load (and sometimes minutes long unavailability)
Hello

2014-02-24 21:31 GMT+01:00 Jeff Janes jeff.ja...@gmail.com:
> On Mon, Feb 24, 2014 at 7:02 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
> > 2014-02-23 21:32 GMT+01:00 Andres Freund and...@2ndquadrant.com:
> > > Hi,
> > >
> > > On 2014-02-23 20:04:39 +0100, Pavel Stehule wrote:
> > > > There are relatively few very long ProcArrayLock lwlocks. This issue is very pathologic on fast computers with more than 8 CPUs. The issue was detected after migration from 8.4 to 9.2 (but tested with the same result on 9.0); I see it on devel 9.4 updated today. When I moved PREPARE out of the cycle, the described issue is gone. But when I use EXECUTE IMMEDIATELY, the issue is back. So it looks like it is related to the planner, ...
> > >
> > > In addition to the issue Jeff mentioned, I'd suggest trying the same workload with repeatable read. That can do *wonders* because of the reduced number of snapshots.
> >
> > I tested it, and it doesn't help. Is there some patch that I can test related to this issue?
>
> This is the one that I was referring to:
>
> http://www.postgresql.org/message-id/11927.1384199...@sss.pgh.pa.us

I tested this patch with great success. Waiting on ProcArrayLock is gone. Throughput is as expected. For the described use case it is seriously interesting.

Regards

Pavel

>     light weight locks
>     lockname          mode        count    avg
>     DynamicLocks      Exclusive    8055   5003
>     DynamicLocks      Shared       1666     50
>     LockMgrLocks      Exclusive     129     36
>     IndividualLock    Exclusive      34     48
>     IndividualLock    Shared         21      7
>     BufFreelistLock   Exclusive      12      8
>     WALWriteLock      Exclusive       1  38194
>     ProcArrayLock     Shared          1      8
>
> Cheers,
>
> Jeff
Re: [HACKERS] inherit support for foreign tables
In addition to an issue pointed out recently by Horiguchi-san, I've found there is another issue we have to discuss. That is, we can't build any parameterized Append paths for an inheritance tree that contains at least one foreign table, in set_append_rel_pathlist(). This is because the patch doesn't take into consideration reparameterization for paths for a foreign table, which attempts to modify these paths to have greater parameterization so that each child path in the inheritance tree can have the exact same parameterization. To do so, I think we would need to add code for the foreign-table case to reparameterize_path(). And I think we should introduce a new FDW routine, say ReparameterizeForeignPath() because the processing would be performed best by the FDW itself. Comments are welcome! Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
On Tue, Feb 25, 2014 at 3:39 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Sorry for jumping into this late. Instead of custom node, it might be better idea to improve FDW infrastructure to push join. For the starters, is it possible for the custom scan node hooks to create a ForeignScan node? In general, I think, it might be better for the custom scan hooks to create existing nodes if they serve the purpose. It does not work well because existing FDW infrastructure is designed to perform on foreign tables, not regular tables. Probably, it needs to revise much our assumption around the background code, if we re-define the purpose of FDW infrastructure. For example, ForeignScan is expected to return a tuple according to the TupleDesc that is exactly same with table definition. It does not fit the requirement if we replace a join-node by ForeignScan because its TupleDesc of joined relations is not predefined. If one wants to push joins, aggregates, grouping across to other data sources capable of handling them, that will need to change. But, at the same time, letting custom scan node being able to decide that doesn't seem to be a very good idea. Although, through custom scan nodes, we can see the potential in adding these features. I'd like to define these features are designed for individual purpose. FDW is designed to intermediate an external data source and internal heap representation according to foreign table definition. In other words, its role is to generate contents of predefined database object on the fly. On the other hands, custom-scan is designed to implement alternative ways to scan / join relations in addition to the methods supported by built-in feature. I'm motivated to implement GPU acceleration feature that works transparently for application. Thus, it has to be capable on regular tables, because most of application stores data on regular tables, not foreign ones. It looks like my description was misleading. In some cases, it might be possible that the ultimate functionality that a particular instantiation of custom node scan is already available as a Plan node in PG, but PG optimizer is not able to optimize the operation that way. In such case, custom scan node infrastructure should produce the corresponding Path node and not implement that functionality itself. Since a custom node is open implementation, it will be important to pass as much information down to the hooks as possible; lest the hooks will be constrained. Since the functions signatures within the planner, optimizer will change from time to time, so the custom node hook signatures will need to change from time to time. That might turn out to be maintenance overhead. Yes. You are also right. But it also makes maintenance overhead if hook has many arguments nobody uses. Probably, it makes sense to list up the arguments that cannot be reproduced from other information, can be reproduced but complicated steps, and can be reproduced easily. Below is the information we cannot reproduce: - PlannerInfo *root - RelOptInfo *joinrel - RelOptInfo *outerrel - RelOptInfo *innerrel - JoinType jointype - SpecialJoinInfo *sjinfo - List *restrictlist Most of this information is available through corresponding RelOptInfo, or we should make RelOptInfo contain all the information related to every relation required to be computed during the query. So, any function which creates paths can just take that RelOptInfo as an argument and produce the path/s. That way there is lesser chance that the function signatures change. 
Below is the information we can reproduce but complicated steps: - List *mergeclause_list - bool mergejoin_allow - Relids param_source_rels - Relids extra_lateral_rels Below is the information we can reproduce easily: - SemiAntiJoinFactors *semifactors I think, the first two categories or the first category (if functions to reproduce the second group is exposed) should be informed to extension, however, priority of the third group is not high. BTW, is it a good idea for custom nodes to also affect other paths like append, group etc.? Will it need separate hooks for each of those? Yes. I plan to support above plan node, in addition to scan / join only. The custom-scan node is thin abstraction towards general executor behavior, so I believe it is not hard to enhance this node, without new plan node for each of them. Of course, it will need separate hook to add alternative path on the planner stage, but no individual plan nodes. (Sorry, it was unclear for me what does the hook mean.) If we represent all the operation like grouping, sorting, aggregation, as some sort of relation, we can create paths for each of the relation like we do (I am heavily borrowing from Tom's idea of pathifying those operations). We will need much lesser hooks in custom scan node. BTW, from the patch, I do not see this change to be light
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Feb24, 2014, at 17:50 , Dean Rasheed dean.a.rash...@gmail.com wrote: On 20 February 2014 01:48, Florian Pflug f...@phlo.org wrote: On Jan29, 2014, at 13:45 , Florian Pflug f...@phlo.org wrote: In fact, I'm currently leaning towards just forbidding non-strict forward transition function with strict inverses, and adding non-NULL counters to the aggregates that then require them. It's really only the SUM() aggregates that are affected by this, I think. I finally got around to doing that, and the results aren't too bad. The attached patches required that the strictness settings of the forward and reverse transition functions agree, and employ exactly the same NULL-skipping logic we always had. The only aggregates seriously affected by that change were SUM(int2) and SUM(int4). I haven't looked at this in any detail yet, but that seems much neater to me. It seems perfectly sensible that the forward and inverse transition functions should have the same strictness settings, and enforcing that keeps the logic simple, as well as hopefully making it easier to document. Good to hear that you agree! I'll try to find some time to update the docs. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
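To make the strictness pairing concrete, here is a small illustrative sketch (not from the patch; the function names and the int8 running state are assumptions) of a forward/inverse transition pair that would both be declared STRICT, so NULL inputs are skipped identically in both directions:

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(my_sum_trans);
    PG_FUNCTION_INFO_V1(my_sum_inv);

    /* forward transition: add the value of the row entering the frame */
    Datum
    my_sum_trans(PG_FUNCTION_ARGS)
    {
        int64   state = PG_GETARG_INT64(0);
        int32   val = PG_GETARG_INT32(1);

        PG_RETURN_INT64(state + val);
    }

    /* inverse transition: subtract the value of the row leaving the frame */
    Datum
    my_sum_inv(PG_FUNCTION_ARGS)
    {
        int64   state = PG_GETARG_INT64(0);
        int32   val = PG_GETARG_INT32(1);

        PG_RETURN_INT64(state - val);
    }

Both functions are registered with the aggregate as its forward and inverse transition functions (the exact CREATE AGGREGATE syntax depends on the patch); because their strictness settings agree, the existing NULL-skipping logic applies unchanged.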
Re: [HACKERS] typemode for variable types
Also keep in mind that the system doesn't always retain the typmod. So the datum should be possible to interpret without the typmod. Incidental effects such as length limits or precision displayed are ok but the meaning shouldn't be changed. -- greg On 24 Feb 2014 20:34, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Bruce Momjian escribió: Well, typmods are type-specific, so there is no official way to calculate it. I would look at how an existing type uses typmod and copy that. Our system is pretty neat. See a complex example here: https://github.com/postgis/postgis/blob/svn-trunk/postgis/gserialized_typmod.c One other point is that if you do consult the varchar functions as an example, be aware that there's an offset of 4 in their definition of the typmod (eg, for varchar(3) the stored typmod is 7). This is entirely for legacy reasons so there's no good reason to duplicate it in a new custom-made type. Except for the rule that negative values mean unspecified typmod (which you have to support), you can define the contents of the typmod value however you want. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
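For illustration, a sketch of typmod input/output support for a hypothetical type following that advice: store the value as-is (no varchar-style +4 offset) and treat negative as "unspecified". All names here are made up:

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/array.h"
    #include "utils/builtins.h"

    PG_FUNCTION_INFO_V1(mytype_typmod_in);
    PG_FUNCTION_INFO_V1(mytype_typmod_out);

    Datum
    mytype_typmod_in(PG_FUNCTION_ARGS)
    {
        ArrayType  *ta = PG_GETARG_ARRAYTYPE_P(0);
        int32      *tl;
        int         n;

        tl = ArrayGetIntegerTypmods(ta, &n);
        if (n != 1 || tl[0] < 0 || tl[0] > 1000)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("invalid type modifier")));

        PG_RETURN_INT32(tl[0]);         /* stored as-is, no offset */
    }

    Datum
    mytype_typmod_out(PG_FUNCTION_ARGS)
    {
        int32       typmod = PG_GETARG_INT32(0);
        char       *res = palloc(32);

        if (typmod < 0)
            res[0] = '\0';              /* unspecified: print nothing */
        else
            snprintf(res, 32, "(%d)", typmod);

        PG_RETURN_CSTRING(res);
    }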
Re: [HACKERS] Same double precision operations, different results
Thx Tom!! Yep, I focused on the fact that all trigonometric functions take arguments and return values of type double precision. Looking for the error, I lost track of the numeric values written directly in the select and the differing inner calculations. I think the best approach will be to keep double precision for the trigonometric calculations (for speed) and cast to numeric before the acos operation. Thank you man!

2014-02-13 18:26 GMT+01:00 Tom Lane t...@sss.pgh.pa.us:
> Daniel Vázquez daniel2d2...@gmail.com writes:
> > Please can someone tell me why this behavior? Thx!!
>
> You're confusing numeric and double precision; in particular this calculation is *not* being done in float8, but numeric:
>
> > # select (0.766238989559398 * 0.766238989559398 * 1 + 0.642555686986733 * 0.642555686986733) calc;
> > (*) Why this calculation produces 1 and not 0.999633651488135693
>
> Actually, it's not producing 1, but a smidgen more:
>
>     regression=# set extra_float_digits TO 3;
>     SET
>     regression=# select ( cast ( cos(radians(39.9826557)) * cos(radians(39.9826557)) * cos(radians(-0.0477312004383) - radians(-0.0477312004383)) + sin(radians(39.9826557)) * sin(radians(39.9826557)) as double precision ) );
>            float8
>     ---------------------
>      1.00000000000000022
>     (1 row)
>
> You've got roundoff error either way, but this way happens to be in the direction that makes acos() complain.
>
> regards, tom lane
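For reference, the usual C-level guard against this kind of round-off is to clamp the operand into acos()'s domain before the call (illustrative only; the poster's own plan above is instead to cast to numeric before acos):

    #include <math.h>

    /* acos() that tolerates operands pushed slightly outside [-1, 1]
     * by floating-point round-off */
    static double
    safe_acos(double x)
    {
        if (x > 1.0)
            x = 1.0;
        else if (x < -1.0)
            x = -1.0;
        return acos(x);
    }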
Re: [HACKERS] [pgsql-advocacy] GSoC 2014 - mentors, students and admins
Hi, On 01/28/2014 06:46 PM, Atri Sharma wrote: On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown t...@linux.com mailto:t...@linux.com wrote: Hi all, Application to Google Summer of Code 2014 can be made as of next Monday (3rd Feb), and then there will be a 12 day window in which to submit an application. I'd like to gauge interest from both mentors and students as to whether we'll want to do this. And I'd be fine with being admin again this year, unless there's anyone else who would like to take up the mantle? Who would be up for mentoring this year? And are there any project ideas folk would like to suggest? I would like to bring up the addition to MADLIB algorithms again this year. I've spoken with the MADlib team at goivotal and they are ok to support this proposal. Therefore I offer to mentor this. Regards, -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #9223: plperlu result memory leak
Hello, All!

eshkin...@gmail.com writes:
> create function perl_test(IN data text, OUT v1 text, OUT v2 integer, OUT v3 integer, OUT v4 json, OUT v5 json)
>   returns record as
> $BODY$
> use strict;
> use warnings;
> my $res->{'v1'} = 'content';
> return $res;
> $BODY$ language plperlu volatile strict;
>
> test case:
> select count(perl_test('')) from generate_series(1, 100);

It looks like I found the problem. Perl uses reference counting and something called "mortal" SVs for memory management. As I understand it, mortals are freed at FREETMPS. plperl calls FREETMPS in plperl_call_perl_func(), but after that plperl asks the perl interpreter again for new mortal SV variables, for example in hek2cstr from plperl_sv_to_datum, and this new SV is never freed.

I experimented with this patch, and it fixes the memory leak in my case, but the patch is incomplete. It does not free on error, and it fixes only plperl_func_handler; plperl_trigger_handler and maybe plperl_inline_handler must also be fixed. Patch is for REL9_2_STABLE.

without patch:
  PID    VSZ     RSS
 2503  74112    7740
 2503 152928   86860
 2503 232208  165836
 2503 310732  244508
 2503 389264  323032

with patch:
  PID    VSZ     RSS
 4322  74112    7740
 4322  74380    8340
 4322  74380    8340
 4322  74380    8340
 4322  74380    8340

diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 49d50c4..9c9874d 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -2173,6 +2173,9 @@ plperl_func_handler(PG_FUNCTION_ARGS)
 	ReturnSetInfo *rsi;
 	ErrorContextCallback pl_error_context;
 
+	ENTER;
+	SAVETMPS;
+
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "could not connect to SPI manager");
 
@@ -2271,6 +2274,9 @@ plperl_func_handler(PG_FUNCTION_ARGS)
 
 	SvREFCNT_dec(perlret);
 
+	FREETMPS;
+	LEAVE;
+
 	return retval;
 }

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
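For context, the ENTER/SAVETMPS ... FREETMPS/LEAVE pairing the patch adds is the standard bracketing documented in "perldoc perlcall"; mortal SVs created between SAVETMPS and FREETMPS are released at FREETMPS. A minimal sketch of the pattern (illustrative only; the function name is made up):

    #include <EXTERN.h>
    #include <perl.h>

    /* Any place that asks the interpreter for mortal SVs should be
     * bracketed like this so the mortals are actually released. */
    static void
    call_into_perl(PerlInterpreter *my_perl)
    {
        ENTER;
        SAVETMPS;

        /* ... create / receive mortal SVs here, e.g. sv_2mortal(newSViv(42)) ... */

        FREETMPS;               /* releases mortals created since SAVETMPS */
        LEAVE;
    }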
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
2014-02-25 20:32 GMT+09:00 Ashutosh Bapat ashutosh.ba...@enterprisedb.com: On Tue, Feb 25, 2014 at 3:39 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Sorry for jumping into this late. Instead of custom node, it might be better idea to improve FDW infrastructure to push join. For the starters, is it possible for the custom scan node hooks to create a ForeignScan node? In general, I think, it might be better for the custom scan hooks to create existing nodes if they serve the purpose. It does not work well because existing FDW infrastructure is designed to perform on foreign tables, not regular tables. Probably, it needs to revise much our assumption around the background code, if we re-define the purpose of FDW infrastructure. For example, ForeignScan is expected to return a tuple according to the TupleDesc that is exactly same with table definition. It does not fit the requirement if we replace a join-node by ForeignScan because its TupleDesc of joined relations is not predefined. If one wants to push joins, aggregates, grouping across to other data sources capable of handling them, that will need to change. But, at the same time, letting custom scan node being able to decide that doesn't seem to be a very good idea. Although, through custom scan nodes, we can see the potential in adding these features. Of course, existing form of custom-scan node is designed to support scan or join relations, as a first step. It will also need some enhancement to support other class of execution node in the future version. I'm not certain why it is problematic. I'd like to define these features are designed for individual purpose. FDW is designed to intermediate an external data source and internal heap representation according to foreign table definition. In other words, its role is to generate contents of predefined database object on the fly. On the other hands, custom-scan is designed to implement alternative ways to scan / join relations in addition to the methods supported by built-in feature. I'm motivated to implement GPU acceleration feature that works transparently for application. Thus, it has to be capable on regular tables, because most of application stores data on regular tables, not foreign ones. It looks like my description was misleading. In some cases, it might be possible that the ultimate functionality that a particular instantiation of custom node scan is already available as a Plan node in PG, but PG optimizer is not able to optimize the operation that way. In such case, custom scan node infrastructure should produce the corresponding Path node and not implement that functionality itself. You are suggesting that CustomSort, CustomAgg, CustomAppend and so on should be supported in the future version, for better integration with the plan optimizer. Right? It is probably a good idea if optimizer needs to identify Custom node using node tag, rather than something others like custom-scan provider name, Right now, custom-scan feature focuses on optimization of relation scan and join as its first scope, and does not need to identify the class of corresponding Path node. On the upthread of this discussion, I initially proposed to have separated CustomScan and CustomJoin node, however, our consensus was that CustomScan can perform as like a scan on the result set of joined relations, so I dropped multiple node types from the first version. 
Since a custom node is open implementation, it will be important to pass as much information down to the hooks as possible; lest the hooks will be constrained. Since the functions signatures within the planner, optimizer will change from time to time, so the custom node hook signatures will need to change from time to time. That might turn out to be maintenance overhead. Yes. You are also right. But it also makes maintenance overhead if hook has many arguments nobody uses. Probably, it makes sense to list up the arguments that cannot be reproduced from other information, can be reproduced but complicated steps, and can be reproduced easily. Below is the information we cannot reproduce: - PlannerInfo *root - RelOptInfo *joinrel - RelOptInfo *outerrel - RelOptInfo *innerrel - JoinType jointype - SpecialJoinInfo *sjinfo - List *restrictlist Most of this information is available through corresponding RelOptInfo, or we should make RelOptInfo contain all the information related to every relation required to be computed during the query. So, any function which creates paths can just take that RelOptInfo as an argument and produce the path/s. That way there is lesser chance that the function signatures change. Uhmm It is inconvenience to write extensions. I want the variables in the first and second groups being delivered to the hook, even though it may have minor modification in the future release. Relations join is one of the heart of RDBMS, so I'd like
Re: [HACKERS] Bit data type header reduction in some cases
Heikki Linnakangas hlinnakan...@vmware.com writes: On 02/25/2014 08:23 AM, Haribabu Kommi wrote: It's regarding a Todo item of Bit data type header reduction in some cases. The header contains two parts. 1) The varlena header is automatically converted to 1 byte header from 4 bytes in case of small data. 2) The bit length header called bit_len to store the actual bit length which is of 4 bytes in size. I want to reduce this bit_len size to 1 byte in some cases as similar to varlena header. With this change the size of the column reduced by 3 bytes, thus shows very good decrease in disk usage. [ various contorted schemes for doing that backward-compatibly ] TBH, this sounds like a huge amount of effort and a significant risk of new bugs in return for not darn much. Who uses bit values anyway? They were removed from the SQL spec more than 10 years ago. And of that population, who cares about a byte or two per value? The field demand for this is not only zero, it's probably negative. (The thread referenced by the TODO entry shows no evidence of user demand.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [pgsql-advocacy] GSoC 2014 - mentors, students and admins
Awesome. I can be an assistant mentor for this one is possible or I could mentor some other project. On Tuesday, February 25, 2014, Andreas 'ads' Scherbaum adsm...@wars-nicht.de wrote: Hi, On 01/28/2014 06:46 PM, Atri Sharma wrote: On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown t...@linux.com mailto:t...@linux.com wrote: Hi all, Application to Google Summer of Code 2014 can be made as of next Monday (3rd Feb), and then there will be a 12 day window in which to submit an application. I'd like to gauge interest from both mentors and students as to whether we'll want to do this. And I'd be fine with being admin again this year, unless there's anyone else who would like to take up the mantle? Who would be up for mentoring this year? And are there any project ideas folk would like to suggest? I would like to bring up the addition to MADLIB algorithms again this year. I've spoken with the MADlib team at goivotal and they are ok to support this proposal. Therefore I offer to mentor this. Regards, -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project -- Regards, Atri *l'apprenant*
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 24, 2014 at 1:15 PM, Andrew Dunstan and...@dunslane.net wrote:
> On 02/24/2014 11:06 AM, Merlin Moncure wrote:
> > On Mon, Feb 24, 2014 at 9:08 AM, Merlin Moncure mmonc...@gmail.com wrote:
> > > On Mon, Feb 24, 2014 at 8:46 AM, Merlin Moncure mmonc...@gmail.com wrote:
> > > > I still find the phrasing "jsonb is more efficient for most purposes" to be a bit off. Basically, the text json type is faster for the serialization/deserialization pattern (not just document preservation), and jsonb is preferred when storing json and doing repeated sub-document accesses.
> > >
> > > Hm, I'm going to withdraw that. I had done some testing of simple deserialization (cast to text and the like) and noted that jsonb was as much as 5x slower. However, I just did some checking on json[b]_populate_recordset and it's pretty much a wash.
> >
> > [sorry for noise on this]
> >
> > Here's the use-case coverage as I see it today:
> >
> >     CASE:                json     jsonb    hstore
> >     Static document:     yes      poor     poor
> >     Precise document:    yes      no       no
> >     Serialization:       yes      no       no
> >     Deserialization:     poor***  yes*     no
> >     Repeated access:     poor     yes      yes
> >     Manipulation:        no       no**     yes
> >     GIST/GIN searching:  no       no**     yes
> >
> > notes:
> > * jsonb gets 'yes' for deserialization assuming Andrew's 'two level' deserialization fix goes in (otherwise 'poor').
> > ** jsonb can't do this today, but presumably will be able to soon.
> > *** 'poor' unless the json type also gets the deserialization fix, then 'yes'. hstore can deserialize hstore format, but will rely on json/jsonb for deserializing json.
> >
> > 'Static document' represents edge cases where the json is opaque to the database but performance matters -- for example large map polygons. 'Precise document' represents cases where whitespace or key order is important.
> >
> > Peter asked upthread how to access the various features. Well, today, it basically means a bit of nimble casting to different structures depending on which particular features are important to you, which IMNSHO is not bad at all as long as we understand that most people who rely on jsonb will also need hstore for its searching and operators. Down the line, when hstore and jsonb are more fleshed out, it's going to come down to an API style choice.
>
> Frankly, a lot of the above doesn't make much sense to me. WTF is 'Manipulation'? Unless I see much more actual info on the tests being conducted it's just about impossible to comment. The performance assessment at this stage is simply anecdotal as far as I'm concerned.

Er, I wasn't making performance assessments (except in cases where it was obvious, like poor support for arbitrary access with json), but describing API coverage of use cases. Manipulation I thought obvious: the ability to manipulate the document (say, change some value to something else) through the API -- the nosql pattern. Neither json nor jsonb can do that at present; only hstore can. jsonb can't; it only covers some of what the json type currently covers (but some of the things it does cover are much faster).

On Mon, Feb 24, 2014 at 11:31 AM, Josh Berkus j...@agliodbs.com wrote:
> > Hm, I'm going to withdraw that. I had done some testing of simple deserialization (cast to text and the like) and noted that jsonb was as much as 5x slower. However, I just did some checking on json[b]_populate_recordset though and it's pretty much a wash.
>
> Aside from that, I want our docs to make a strong endorsement of using jsonb over json for most users. jsonb will continue to be developed and improved in the future; it is very unlikely that json will. Maybe that's what I should say rather than anything about efficiency.
I would hope that endorsement doesn't extend to misinforming users. Moreover, json type is handling all serialization at present and will continue to do so for some years. In fact, in this release we got a bunch of new very necessary enhancements (json_build) to serialization! You're trying to deprecate and enhance the type at the same time! The disconnect here is that your statements would be correct if the only usage for the json type would be for storing data in json. However, people (including myself) are doing lots of wonderful things storing data in the traditional way and moving into and out of json in queries and that, besides working better in the json type, is only possible in json. That might change in the future by figuring out a way to cover json serialization cases through jsonb but that's not how things work today, end of story. Look, I definitely feel the frustration and weariness here in terms of my critiquing the proposed API along with the other arguments I've made. Please understand that nobody wants this to go out the door more than me if the objective is to lock in the API 'as is' then let's be polite to our users and try to document various use cases and what's good at what. merlin -- Sent via pgsql-hackers mailing list
Re: [HACKERS] often PREPARE can generate high load (and sometimes minutes long unavailability)
2014-02-25 11:29 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com:
> Hello
>
> 2014-02-24 21:31 GMT+01:00 Jeff Janes jeff.ja...@gmail.com:
> > On Mon, Feb 24, 2014 at 7:02 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
> > > 2014-02-23 21:32 GMT+01:00 Andres Freund and...@2ndquadrant.com:
> > > > Hi,
> > > >
> > > > On 2014-02-23 20:04:39 +0100, Pavel Stehule wrote:
> > > > > There are relatively few very long ProcArrayLock lwlocks. This issue is very pathologic on fast computers with more than 8 CPUs. The issue was detected after migration from 8.4 to 9.2 (but tested with the same result on 9.0); I see it on devel 9.4 updated today. When I moved PREPARE out of the cycle, the described issue is gone. But when I use EXECUTE IMMEDIATELY, the issue is back. So it looks like it is related to the planner, ...
> > > >
> > > > In addition to the issue Jeff mentioned, I'd suggest trying the same workload with repeatable read. That can do *wonders* because of the reduced number of snapshots.
> > >
> > > I tested it, and it doesn't help. Is there some patch that I can test related to this issue?
> >
> > This is the one that I was referring to:
> >
> > http://www.postgresql.org/message-id/11927.1384199...@sss.pgh.pa.us
>
> I tested this patch with great success. Waiting on ProcArrayLock is gone. Throughput is as expected. For the described use case it is seriously interesting.

Here is an update for 9.4.

Regards

Pavel

> Regards
>
> Pavel
>
> >     light weight locks
> >     lockname          mode        count    avg
> >     DynamicLocks      Exclusive    8055   5003
> >     DynamicLocks      Shared       1666     50
> >     LockMgrLocks      Exclusive     129     36
> >     IndividualLock    Exclusive      34     48
> >     IndividualLock    Shared         21      7
> >     BufFreelistLock   Exclusive      12      8
> >     WALWriteLock      Exclusive       1  38194
> >     ProcArrayLock     Shared          1      8
> >
> > Cheers,
> >
> > Jeff

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d525ca4..261473d 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** get_actual_variable_range(PlannerInfo *r
*** 4958,4963 ****
--- 4958,4964 ----
  	HeapTuple	tup;
  	Datum		values[INDEX_MAX_KEYS];
  	bool		isnull[INDEX_MAX_KEYS];
+ 	SnapshotData SnapshotDirty;
  
  	estate = CreateExecutorState();
  	econtext = GetPerTupleExprContext(estate);
*************** get_actual_variable_range(PlannerInfo *r
*** 4980,4985 ****
--- 4981,4987 ----
  	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRel));
  	econtext->ecxt_scantuple = slot;
  	get_typlenbyval(vardata->atttype, &typLen, &typByVal);
+ 	InitDirtySnapshot(SnapshotDirty);
  
  	/* set up an IS NOT NULL scan key so that we ignore nulls */
  	ScanKeyEntryInitialize(&scankeys[0],
*************** get_actual_variable_range(PlannerInfo *r
*** 4997,5003 ****
  	if (min)
  	{
  		index_scan = index_beginscan(heapRel, indexRel,
! 									 GetActiveSnapshot(), 1, 0);
  		index_rescan(index_scan, scankeys, 1, NULL, 0);
  
  		/* Fetch first tuple in sortop's direction */
--- 4999,5005 ----
  	if (min)
  	{
  		index_scan = index_beginscan(heapRel, indexRel,
! 									 &SnapshotDirty, 1, 0);
  		index_rescan(index_scan, scankeys, 1, NULL, 0);
  
  		/* Fetch first tuple in sortop's direction */
*************** get_actual_variable_range(PlannerInfo *r
*** 5029,5035 ****
  	if (max && have_data)
  	{
  		index_scan = index_beginscan(heapRel, indexRel,
! 									 GetActiveSnapshot(), 1, 0);
  		index_rescan(index_scan, scankeys, 1, NULL, 0);
  
  		/* Fetch first tuple in reverse direction */
--- 5031,5037 ----
  	if (max && have_data)
  	{
  		index_scan = index_beginscan(heapRel, indexRel,
! 									 &SnapshotDirty, 1, 0);
  		index_rescan(index_scan, scankeys, 1, NULL, 0);
  
  		/* Fetch first tuple in reverse direction */

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On 1/30/14, 2:28 AM, Christian Kruse wrote: after I finally got documentation compilation working I updated the patch to be syntactically correct. You will find it attached. I don't think we should be explaining the basics of OS memory management in our documentation. And if we did, we shouldn't copy it verbatim from the Debian wiki without attribution. I think this patch should be cut down to the paragraphs that cover the actual configuration. On a technical note, use xref instead of link for linking. doc/src/sgml/README.links contains some information. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] walsender doesn't send keepalives when writes are pending
On 2014-02-25 10:45:55 -0500, Robert Haas wrote:
> On Fri, Feb 14, 2014 at 7:05 AM, Andres Freund and...@2ndquadrant.com wrote:
> > In WalSndLoop() we do:
> >
> >     wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
> >         WL_SOCKET_READABLE;
> >
> >     if (pq_is_send_pending())
> >         wakeEvents |= WL_SOCKET_WRITEABLE;
> >     else if (wal_sender_timeout > 0 && !ping_sent)
> >     {
> >         ...
> >         if (GetCurrentTimestamp() >= timeout)
> >             WalSndKeepalive(true);
> >         ...
> >
> > I think those two if's should simply be separate. There's no reason not to ask for a ping when we're writing. On a busy server that might be the case most of the time. The if (pq_is_send_pending()) should also be *after* sending the keepalive; after all, it might not immediately have been flushed, as unlikely as that is.
>
> I spent an inordinate amount of time looking at this patch. I'm not sure your proposed change will accomplish the desired effect. The thing is that the code you quote is within an if-block gated by (caughtup && !streamingDoneSending) || pq_is_send_pending(). Currently, therefore, the behavior is that we wait on the latch *either when* we're caught up (and thus need to wait for more WAL) or when the outbound queue is full (and thus we need to wait for it to drain), but we send a keep-alive only in the former case (namely, we're caught up). Your proposed change would cause us to send keep-alives when we're caught up, or when we're not caught up but the write queue happens to be full. That doesn't make much sense. There might be a reason to start sending keep-alives when we're not caught up, but even if we want that behavior change there's no reason to do it only when the write queue is full.

I am not sure I can follow. Why doesn't it make sense to send out the keepalive (with replyRequested = true) when we're busy sending stuff (which will be the case most of the time on a busy server)?

The point is that clients like pg_receivexlog and pg_basebackup don't send a keepalive response every time they receive something (which is perfectly fine), but just when their internal timer says it's time to do so, or when the walsender asks them to do so. So, if the server has a *lower* timeout than the one configured for pg_receivexlog, and the server is a busy one, you'll get timeouts relatively often. This is a pretty frequent situation in synchronous replication setups because the default timeout is *way* too high for that.

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
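A sketch of the restructuring Andres is arguing for (illustrative only, not committed code; the elided timeout computation is kept elided, as in the quoted snippet):

    wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
        WL_SOCKET_READABLE;

    /* decide about the keepalive regardless of pending output */
    if (wal_sender_timeout > 0 && !ping_sent)
    {
        /* ... compute timeout from the last reply, as before ... */
        if (GetCurrentTimestamp() >= timeout)
            WalSndKeepalive(true);      /* request a reply even while busy */
    }

    /* checked *after* the keepalive: it may not have been flushed yet */
    if (pq_is_send_pending())
        wakeEvents |= WL_SOCKET_WRITEABLE;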
Re: [HACKERS] walsender doesn't send keepalives when writes are pending
On Fri, Feb 14, 2014 at 7:05 AM, Andres Freund and...@2ndquadrant.com wrote:
> In WalSndLoop() we do:
>
>     wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
>         WL_SOCKET_READABLE;
>
>     if (pq_is_send_pending())
>         wakeEvents |= WL_SOCKET_WRITEABLE;
>     else if (wal_sender_timeout > 0 && !ping_sent)
>     {
>         ...
>         if (GetCurrentTimestamp() >= timeout)
>             WalSndKeepalive(true);
>         ...
>
> I think those two if's should simply be separate. There's no reason not to ask for a ping when we're writing. On a busy server that might be the case most of the time. The if (pq_is_send_pending()) should also be *after* sending the keepalive; after all, it might not immediately have been flushed, as unlikely as that is.

I spent an inordinate amount of time looking at this patch. I'm not sure your proposed change will accomplish the desired effect. The thing is that the code you quote is within an if-block gated by (caughtup && !streamingDoneSending) || pq_is_send_pending(). Currently, therefore, the behavior is that we wait on the latch *either when* we're caught up (and thus need to wait for more WAL) or when the outbound queue is full (and thus we need to wait for it to drain), but we send a keep-alive only in the former case (namely, we're caught up). Your proposed change would cause us to send keep-alives when we're caught up, or when we're not caught up but the write queue happens to be full. That doesn't make much sense. There might be a reason to start sending keep-alives when we're not caught up, but even if we want that behavior change there's no reason to do it only when the write queue is full.

At least, that's how it looks to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On 2014-02-25 10:29:32 -0500, Peter Eisentraut wrote: On 1/30/14, 2:28 AM, Christian Kruse wrote: after I finally got documentation compilation working I updated the patch to be syntactically correct. You will find it attached. I don't think we should be explaining the basics of OS memory management in our documentation. Agreed. And if we did, we shouldn't copy it verbatim from the Debian wiki without attribution. Is it actually? A quick comparison doesn't show that many similarities? Christian? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
Hi, On 25/02/14 10:29, Peter Eisentraut wrote: I don't think we should be explaining the basics of OS memory management in our documentation. Well, I'm confused. I thought that's exactly what has been asked. And if we did, we shouldn't copy it verbatim from the Debian wiki without attribution. I didn't. This is a write-up of several articles, blog posts and documentation I read about this topic. However, if you think the texts are too similar, then we should add a note, yes. Didn't mean to copy w/o referring to a source. I think this patch should be cut down to the paragraphs that cover the actual configuration. I tried to cover the issues Heikki brought up in 52861eec.2090...@vmware.com. On a technical note, use xref instead of link for linking. doc/src/sgml/README.links contains some information. OK, I will post an updated patch later this evening. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services pgpfzdmiy8T7D.pgp Description: PGP signature
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
Hi, On 25/02/14 17:01, Andres Freund wrote: And if we did, we shouldn't copy it verbatim from the Debian wiki without attribution. Is it actually? A quick comparison doesn't show that many similarities? Christian? Not as far as I know. But of course, as I wrote the text I _also_ (that's not my only source) read the Debian article and I was influenced by it. It may be that the texts are more similar then I thought, although I still don't see it. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services pgplDAVez7E7u.pgp Description: PGP signature
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 24, 2014 at 12:31 PM, Josh Berkus j...@agliodbs.com wrote: Aside from that, I want our docs to make a strong endorsement of using jsonb over json for most users. jsonb will continue to be developed and improved in the future; it is very unlikely that json will. Maybe that's what I should say rather than anything about efficiency. In other words: having an ambiguous, complex evaluation of json vs. jsonb does NOT benefit most users. The result will be some users choosing json and then pitching fit when they want jsonb in 9.5 and have to rewrite all their tables. Mind you, we'll need to fix the slow deserialization, though. I think you've got your head stuck deeply in the sand. The json data type works exactly like the xml data type has always worked. There have been occasional noises about making an xmlb data type, but nobody's minded enough to do anything about it, or at least not in this forum. So if the json data type has no future and is crap, then the same presumably holds of the xml data type. But I don't think anyone here believes that, unless they just hate xml on general principle, which I can certainly understand. You really *can't* fix the fact that jsonb takes longer to (deserialize than json. I mean, it's possible the code can be optimized. But since json is stored in the exact format in which it is to be emitted, the output function is basically just memcpy(). You're never going to get that kind of speed out of code that actually has to do something, and I suspect you're going to find that it's hard to come close. In short, I think you're viewing everything about jsonb with rose-colored glasses on, and that your enthusiasm is mostly wishful thinking. Will there be good things about jsonb? Of course. Will lots of people want to use it for those reasons? Very likely. Will it be better than json in all ways and for all purposes? No, and implying the contrary is just plain wrong. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] walsender doesn't send keepalives when writes are pending
On Tue, Feb 25, 2014 at 10:54 AM, Andres Freund and...@2ndquadrant.com wrote: I am not sure I can follow. Why doesn't it make sense to send out the keepalive (with replyRequested = true) when we're busy sending stuff (which will be the case most of the time on a busy server)? It may very well make sense, but your patch won't generally have that effect, because with the patch you proposed, the keep-alive can only be sent when the server is busy if the write queue is also full. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] walsender doesn't send keepalives when writes are pending
On 2014-02-25 11:15:46 -0500, Robert Haas wrote: On Tue, Feb 25, 2014 at 10:54 AM, Andres Freund and...@2ndquadrant.com wrote: I am not sure I can follow. Why doesn't it make sense to send out the keepalive (with replyRequested = true) when we're busy sending stuff (which will be the case most of the time on a busy server)? It may very well make sense, but your patch won't generally have that effect, because with the patch you proposed, the keep-alive can only be sent when the server is busy if the write queue is also full. Well, it either needs to be caughtup *or*/and have a busy write queue, right? Usually that state will be reached very quickly because before that we're writing data to the network as fast as it can be read from disk. Also, there's no timeout checks outside that if (caughtup || send_pending()) block, so there's not much of a window to hit problems when that loop isn't entered. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] walsender doesn't send keepalives when writes are pending
On Tue, Feb 25, 2014 at 11:23 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-25 11:15:46 -0500, Robert Haas wrote: On Tue, Feb 25, 2014 at 10:54 AM, Andres Freund and...@2ndquadrant.com wrote: I am not sure I can follow. Why doesn't it make sense to send out the keepalive (with replyRequested = true) when we're busy sending stuff (which will be the case most of the time on a busy server)? It may very well make sense, but your patch won't generally have that effect, because with the patch you proposed, the keep-alive can only be sent when the server is busy if the write queue is also full. Well, it either needs to be caughtup *or*/and have a busy write queue, right? Right. Usually that state will be reached very quickly because before that we're writing data to the network as fast as it can be read from disk. I'm unimpressed. Even if that is in practice true, making the code self-consistent is a goal of non-trivial value. The timing of sending keep-alives has no business depending on the state of the write queue, and right now it doesn't. Your patch would make it depend on that, mostly by accident AFAICS. Also, there's no timeout checks outside that if (caughtup || send_pending()) block, so there's not much of a window to hit problems when that loop isn't entered. That might be true, but waiting until the write queue is full to send a ping and then checking whether we've timed out before the queue has drained isn't a sane design pattern. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Mon, Feb 24, 2014 at 1:05 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Feb 17, 2014 at 11:14:33AM -0500, Bruce Momjian wrote: On Sun, Feb 16, 2014 at 09:26:47PM -0500, Robert Haas wrote: So, would anyone like me to create patches for any of these items before we hit 9.4 beta? We have added autovacuum_work_mem, and increasing work_mem and maintenance_work_mem by 4x is a simple operation. Not sure about the others. Or do we just keep this all for 9.5? I don't think anyone objected to increasing the defaults for work_mem and maintenance_work_mem by 4x, and a number of people were in favor, so I think we should go ahead and do that. If you'd like to do the honors, by all means! OK, patch attached. Patch applied. Thanks! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
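For reference, the committed change amounts to the following shift in the shipped defaults (worth double-checking against postgresql.conf.sample in the committed tree):

    work_mem = 4MB                  # previously 1MB
    maintenance_work_mem = 64MB     # previously 16MB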
Re: [HACKERS] 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..
On Sat, Feb 22, 2014 at 7:02 PM, Rukh Meski rukh.me...@yahoo.ca wrote: Sorry, I wanted to minimize the attention my message attracts. I mostly posted it to let people know I plan on working on this for 9.5 to avoid duplicated effort. I will post more documentation and my reasons for wanting this feature in postgre later, if that's all right. I've wanted this more than once. I suspect it's a pretty hard project, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
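For anyone who hasn't followed the earlier thread, the proposal is to allow statements of roughly this shape (hypothetical syntax, not valid in any released version), which today have to be approximated with a ctid subquery:

    -- proposed:
    UPDATE job_queue SET claimed_by = 42
     WHERE claimed_by IS NULL
     ORDER BY priority DESC
     LIMIT 10;

    -- a rough present-day workaround (table and column names are made up;
    -- not safe against concurrent claimers without additional locking):
    UPDATE job_queue SET claimed_by = 42
     WHERE ctid IN (SELECT ctid FROM job_queue
                     WHERE claimed_by IS NULL
                     ORDER BY priority DESC
                     LIMIT 10);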
Re: [HACKERS] varchar_transform
On Mon, Feb 24, 2014 at 2:11 AM, Mohsen SM mohsensoodk...@gmail.com wrote: when is the varchar_transform function used? src/backend/utils/adt/varchar.c. The main point of that machinery is that if you do something like ALTER TABLE zot ALTER COLUMN thunk SET DATA TYPE varchar(30) on a column that is currently varchar(20), it won't feel obliged to rewrite the table. That function lets it figure out that anything that's a valid varchar(20) value is also a valid varchar(30) value. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
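A quick way to see the effect, reusing Robert's example table (a sketch; the narrowing case is the interesting contrast):

    CREATE TABLE zot (thunk varchar(20));
    -- widening the typmod: varchar_transform proves the existing values still fit,
    -- so no table rewrite is needed
    ALTER TABLE zot ALTER COLUMN thunk SET DATA TYPE varchar(30);
    -- narrowing cannot be proven safe, so this one forces a full pass over the table
    ALTER TABLE zot ALTER COLUMN thunk SET DATA TYPE varchar(10);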
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On 2/25/14, 11:08 AM, Christian Kruse wrote: Hi, On 25/02/14 17:01, Andres Freund wrote: And if we did, we shouldn't copy it verbatim from the Debian wiki without attribution. Is it actually? A quick comparison doesn't show that many similarities? Christian? Not as far as I know. But of course, as I wrote the text I _also_ (that's not my only source) read the Debian article and I was influenced by it. It may be that the texts are more similar then I thought, although I still don't see it. I suspect that it was done subconsciously. But I did notice it right away, so there is something to it. As I mentioned, I would just cut those introductory parts out. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication
On Mon, Feb 24, 2014 at 2:49 PM, Christian Kruse christ...@2ndquadrant.com wrote: attached you will find a fixed version. Committed, after fixing the regression test outputs. I also changed the new fields to be NULL rather than 0 when they are invalid, because that way applying age() to that column does something useful instead of something lame. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/25/2014 08:13 AM, Robert Haas wrote: I think you've got your head stuck deeply in the sand. The json data type works exactly like the xml data type has always worked. There have been occasional noises about making an xmlb data type, but nobody's minded enough to do anything about it, or at least not in this forum. So if the json data type has no future and is crap, then the same presumably holds of the xml data type. But I don't think anyone here believes that, unless they just hate xml on general principle, which I can certainly understand. Well, if we had an XMLB, I would in fact be making the same argument. I'll point out the only reason we're keeping the original json instead of forcing an upgrade to jsonb, per earlier discussions, is backwards-compatibility. If we had never had a json-text, and Merlin was proposing adding one now alongside jsonb, I'd be arguing against doing so. In short, I think you're viewing everything about jsonb with rose-colored glasses on, and that your enthusiasm is mostly wishful thinking. Will there be good things about jsonb? Of course. Will lots of people want to use it for those reasons? Very likely. Will it be better than json in all ways and for all purposes? No, and implying the contrary is just plain wrong. It hurts our adoption substantially to confuse developers. We need to recommend one type over the other, hence Use jsonb unless you need X. Merlin is pushing the type of multivariable comparison where *I* wouldn't be able to make sense of which one I should pick, let alone some web developer who's just trying to get a site built. That sort of thing *really* doesn't help our users. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 09:38:50AM -0800, Josh Berkus wrote: In short, I think you're viewing everything about jsonb with rose-colored glasses on, and that your enthusiasm is mostly wishful thinking. Will there be good things about jsonb? Of course. Will lots of people want to use it for those reasons? Very likely. Will it be better than json in all ways and for all purposes? No, and implying the contrary is just plain wrong. It hurts our adoption substantially to confuse developers. We need to recommend one type over the other, hence Use jsonb unless you need X. Merlin is pushing the type of multivariable comparison where *I* wouldn't be able to make sense of which one I should pick, let alone some web developer who's just trying to get a site built. That sort of thing *really* doesn't help our users. I agree it would be nice to have something simple, like Use JSON if you wish to just store/retrieve entire JSON structures, and JSONB if you wish to do any kind of lookup or manipulation of JSON values on the server. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On Tue, Feb 25, 2014 at 12:18:02PM -0500, Peter Eisentraut wrote: On 2/25/14, 11:08 AM, Christian Kruse wrote: Hi, On 25/02/14 17:01, Andres Freund wrote: And if we did, we shouldn't copy it verbatim from the Debian wiki without attribution. Is it actually? A quick comparison doesn't show that many similarities? Christian? Not as far as I know. But of course, as I wrote the text I _also_ (that's not my only source) read the Debian article and I was influenced by it. It may be that the texts are more similar then I thought, although I still don't see it. I suspect that it was done subconsciously. But I did notice it right away, so there is something to it. As I mentioned, I would just cut those introductory parts out. Should we link to the Debian wiki content? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Avoiding deeply nested AND/OR trees in the parser
Over in http://www.postgresql.org/message-id/bay176-w382a9de827ebc8e602b7bbc5...@phx.gbl there's a complaint about getting stack depth limit exceeded from a query of the form select count(*) from gui_die_summary where (x_coord, y_coord) in ((25,5),(41,13),(25,7),(28,3),(25,8),(34,7),(26,6),(21,10), ...); when there's a few thousand pairs in the IN list. The reason for this is that transformAExprIn() generates a tree of nested OR expressions, and then recursion in assign_collations_walker() runs out of stack space. (If assign_collations_walker() hadn't failed, something else probably would later, so it's wrong to blame that function in particular.) I reproduced this problem in the regression database using generated queries like so: print explain select * from tenk1 where (thousand, tenthous) in (\n; for ($i = 0; $i 1; $i++) { print ,\n if $i 0; print ($i,$i); } print );\n; On my machine, HEAD fails at about 9000 pairs with this test case. While I commented in the pgsql-novice thread that there are better ways to do this, it still seems like a limitation we probably ought to fix if it's not too hard. In the case of transformAExprIn, we could generate an N-argument OR or AND node instead of a nest; this is already done for example in make_row_comparison_op(). The attached quick-hack patch does so, and I verified that the system could handle a million pairs with this in place. (It takes about 20 seconds and 20GB of memory to plan such a case, so I still say it's a bad idea, but at least we can do it.) There is similar code in make_row_distinct_op(), which perhaps ought to be fixed as well. However, this isn't exactly the end of the story, because if you were to dump out a view generated from a query like this, it would contain a long chain of OR clauses, which would mean that reloading the view would put you right back in stack overflow territory. Really if we wanted to fix this issue we'd need to hack up transformAExprAnd/transformAExprOr so that they recognized nested ANDs/ORs and flattened them on the spot. This might not be a bad idea, but it's starting to look like more than a quick hack patch. Does this seem worth pursuing, or shall we leave it alone? regards, tom lane diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 81c9338..9550bd1 100644 *** a/src/backend/parser/parse_expr.c --- b/src/backend/parser/parse_expr.c *** transformAExprOf(ParseState *pstate, A_E *** 1088,1094 static Node * transformAExprIn(ParseState *pstate, A_Expr *a) { ! Node *result = NULL; Node *lexpr; List *rexprs; List *rvars; --- 1088,1095 static Node * transformAExprIn(ParseState *pstate, A_Expr *a) { ! Node *result; ! List *cmpexprs = NIL; Node *lexpr; List *rexprs; List *rvars; *** transformAExprIn(ParseState *pstate, A_E *** 1166,1171 --- 1167,1173 */ List *aexprs; ArrayExpr *newa; + Node *cmp; aexprs = NIL; foreach(l, rnonvars) *** transformAExprIn(ParseState *pstate, A_E *** 1185,1196 newa-multidims = false; newa-location = -1; ! result = (Node *) make_scalar_array_op(pstate, ! a-name, ! useOr, ! lexpr, ! (Node *) newa, ! a-location); /* Consider only the Vars (if any) in the loop below */ rexprs = rvars; --- 1187,1202 newa-multidims = false; newa-location = -1; ! cmp = (Node *) make_scalar_array_op(pstate, ! a-name, ! useOr, ! lexpr, ! (Node *) newa, ! a-location); ! ! /* cmp certainly yields boolean, no need to check it */ ! ! 
cmpexprs = lappend(cmpexprs, cmp); /* Consider only the Vars (if any) in the loop below */ rexprs = rvars; *** transformAExprIn(ParseState *pstate, A_E *** 1198,1204 } /* ! * Must do it the hard way, ie, with a boolean expression tree. */ foreach(l, rexprs) { --- 1204,1210 } /* ! * Any remaining righthand exprs need to be compared individually. */ foreach(l, rexprs) { *** transformAExprIn(ParseState *pstate, A_E *** 1226,1239 } cmp = coerce_to_boolean(pstate, cmp, IN); ! if (result == NULL) ! result = cmp; ! else ! result = (Node *) makeBoolExpr(useOr ? OR_EXPR : AND_EXPR, ! list_make2(result, cmp), ! a-location); } return result; } --- 1232,1253 } cmp = coerce_to_boolean(pstate, cmp, IN); ! ! cmpexprs = lappend(cmpexprs, cmp); } + /* + * If we have more than one comparison expression, AND or OR them together + */ + if (cmpexprs == NIL) + result = NULL; /* can this happen? is it right if so? */ + else if (list_length(cmpexprs) == 1) + result = (Node *) linitial(cmpexprs); + else
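One of the "better ways" alluded to above is presumably to keep the pairs out of the boolean expression tree altogether, e.g. by joining against a VALUES list, which involves no deep recursion in the parser or planner (a sketch against the same regression-test table):

    SELECT t.*
      FROM tenk1 t
      JOIN (VALUES (25,5), (41,13), (25,7), (28,3)) AS v(thousand, tenthous)
        ON t.thousand = v.thousand AND t.tenthous = v.tenthous;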
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
Bruce Momjian br...@momjian.us writes: On Tue, Feb 25, 2014 at 12:18:02PM -0500, Peter Eisentraut wrote: As I mentioned, I would just cut those introductory parts out. Should we link to the Debian wiki content? -1. We generally don't link to our *own* wiki in our SGML docs, let alone things that aren't even under our project's control. Moreover, Debian is not going to be explaining these things in a way that accounts for non-Linux operating systems. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On 2014-02-25 13:21:46 -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Tue, Feb 25, 2014 at 12:18:02PM -0500, Peter Eisentraut wrote: As I mentioned, I would just cut those introductory parts out. Should we link to the Debian wiki content? -1. We generally don't link to our *own* wiki in our SGML docs, let alone things that aren't even under our project's control. Agreed. Especially as the interesting bit is the postgres specific logic, not the rest. I think all that's needed is to cut the first paragraphs that generally explain what huge pages are in some detail from the text and make sure the later paragraphs don't refer to the earlier ones. Moreover, Debian is not going to be explaining these things in a way that accounts for non-Linux operating systems. It's a linux only feature so far, so that alone wouldn't be a problem. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
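For context, the postgres-specific configuration the remaining paragraphs need to cover is essentially reserving enough huge pages for the postmaster's shared memory segment. On Linux that boils down to something like the following (the numbers are purely illustrative; this is a sketch of the procedure, not proposed documentation wording):

    $ head -1 /path/to/data/postmaster.pid     # postmaster PID
    4170
    $ grep ^VmPeak /proc/4170/status           # peak memory footprint of the postmaster
    VmPeak:  6490428 kB
    $ grep ^Hugepagesize /proc/meminfo
    Hugepagesize:    2048 kB
    $ # 6490428 kB / 2048 kB ~= 3170 huge pages needed
    $ sysctl -w vm.nr_hugepages=3170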
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 12:38 PM, Josh Berkus j...@agliodbs.com wrote: On 02/25/2014 08:13 AM, Robert Haas wrote: I think you've got your head stuck deeply in the sand. The json data type works exactly like the xml data type has always worked. There have been occasional noises about making an xmlb data type, but nobody's minded enough to do anything about it, or at least not in this forum. So if the json data type has no future and is crap, then the same presumably holds of the xml data type. But I don't think anyone here believes that, unless they just hate xml on general principle, which I can certainly understand. Well, if we had an XMLB, I would in fact be making the same argument. I'll point out the only reason we're keeping the original json instead of forcing an upgrade to jsonb, per earlier discussions, is backwards-compatibility. If we had never had a json-text, and Merlin was proposing adding one now alongside jsonb, I'd be arguing against doing so. You can argue that all you like. But the same argument was made and rejected at the time we (I) added the original json type. So I don't believe that you can claim that your argument is backed by any sort of consensus, because AFAICS it isn't. In short, I think you're viewing everything about jsonb with rose-colored glasses on, and that your enthusiasm is mostly wishful thinking. Will there be good things about jsonb? Of course. Will lots of people want to use it for those reasons? Very likely. Will it be better than json in all ways and for all purposes? No, and implying the contrary is just plain wrong. It hurts our adoption substantially to confuse developers. We need to recommend one type over the other, hence Use jsonb unless you need X. Merlin is pushing the type of multivariable comparison where *I* wouldn't be able to make sense of which one I should pick, let alone some web developer who's just trying to get a site built. That sort of thing *really* doesn't help our users. I don't have any objection to editing what Merlin wrote to be clear and concise; I don't think he meant for it to be considered for inclusion in the documentation in exactly that form anyway. I do have an objection to including your unjustified partisanship in our documentation as fact. The reality is that if you have a bunch of JSON documents indexed by some ID number and expect to usually retrieve the whole document, you probably don't want jsonb. You probably want one integer column and one json column, because it's gonna be faster that way. And if you expect to usually retrieve only part of the document, then you are probably better off using separate columns for the separate parts of the document, because I bet that extracting a portion of a large document is still going to require de-TOASTing the whole thing, or at least all the data preceding the last byte offset of interest, which is full of lose. The situation where jsonb is going to win is where either (1) you or your client are so stuck in the document database model that you can't fathom the idea that using a real database schema might improve performance or (2) you have so many different things (pseudocolumns, as it were) that you might want to extract from any given JSON blob that it's impractical to use real columns for all of those. I agree those are both real use cases. I do not agree that they are the only or most common use cases. And I definitely don't agree that our documentation should push people towards stuffing everything in a JSON blob instead of using real column definitions. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor comment improvements in tablecmds.c
On Mon, Feb 24, 2014 at 9:40 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: This is a small patch to improve comments in tablecmds.c. Please find attached a patch. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor comment improvements in tablecmds.c
Etsuro Fujita wrote: This is a small patch to improve comments in tablecmds.c. Please find attached a patch. I find both patched and unpatched to be pretty illegible. How about something like /* - * Execute ALTER TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE RENAME + * Execute ALTER relation type RENAME + * This routine supports tables, indexes, sequences, views, + * and foreign tables */ and the two other ones with something like this: /* - * Grab an exclusive lock on the target table, index, sequence or view, - * which we will NOT release until end of transaction. + * Grab an exclusive lock on the target relation, + * which we will NOT release until + * end of transaction. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/25/2014 10:31 AM, Robert Haas wrote: And I definitely don't agree that our documentation should push people towards stuffing everything in a JSON blob instead of using real column definitions. Where did you get this out of my doc patch? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.7
On Mon, Feb 24, 2014 at 6:16 PM, Andres Freund and...@2ndquadrant.com wrote: I actually thought they'd be too ugly to live and we'd remove them pre-commit. Might be getting to be about that time, then. - if (nrels 0 || nmsgs 0 || RelcacheInitFileInval || forceSyncCommit) + if (nrels 0 || nmsgs 0 || RelcacheInitFileInval || forceSyncCommit || + XLogLogicalInfoActive()) Mmph. Is this really necessary? If so, why? The comments could elucidate. We could get rid of it by (optionally) adding information about the database oid to compact commit, but that'd increase the size of the record. So why do we need the database OID? + bool fail_softly = slot-data.persistency == RS_EPHEMERAL; This should be contingent on whether we're being called in the error pathway, not the slot type. I think you should pass a bool. Why? I had it that way at first, but for persistent slots this won't be called in error pathways as we won't drop there. I was thinking more the reverse - that a non-persistent slot might be dropped in a non-error pathway. It seems to be indicating, roughly, whether the relation should participate in RecentGlobalXmin or RecentGlobalDataXmin. But is there any point at all of separating those when !XLogLogicalInfoActive()? The test expands to: IsSystemRelation() || (XLogLogicalInfoActive() RelationNeedsWAL(relation) (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation))) So basically this is all tables created in pg_catalog during initdb plus all TOAST tables in the system. If wal_level=logical, then we also include tables marked with the reloption user_catalog_table=true, unless they're unlogged. This all seems a bit complex. Why not this: IsSystemRelation() || || RelationIsUsedAsCatalogTable(relation) Because that'd possibly retain too much when !XLogLogicalInfoActive(), there's no need to look for RelationIsUsedAsCatalogTable() for those. We could decide not to care? But when !XLogLogicalInfoActive() I think we could just make this always false, right? I mean, if PROC_IN_LOGICAL_DECODING is never going to be set, the values are always going to be the same anyway. I think. + /* +* This is a bit tricky: We need to determine a safe xmin horizon to start +* decoding from, to avoid starting from a running xacts record referring +* to xids whose rows have been vacuumed or pruned +* already. GetOldestSafeDecodingTransactionId() returns such a value, but +* without further interlock it's return value might immediately be out of +* date. +* +* So we have to acquire the ProcArrayLock to prevent computation of new +* xmin horizons by other backends, get the safe decoding xid, and inform +* the slot machinery about the new limit. Once that's done the +* ProcArrayLock can be be released as the slot machinery now is +* protecting against vacuum. +* +*/ I can't claim to be very excited about this. Because of the already_locked parameters, or any wider concerns? Passing down already_locked through several layers is kind of ugly, but also, holding ProcArrayLock more is sad. That is not a lightly-contended lock. I'm assuming you've spent a lot of time thinking about ways to avoid this and utterly failed to come up with any reasonable alternative, but let me take a shot. Suppose we take ProcArrayLock in exclusive mode and compute the oldest running XID, install it as our xmin, and then release ProcArrayLock. 
At that point, nobody else can compute an oldest-xmin value that precedes that value, so we can take our time installing that value as the slot's xmin, without needing to hold a lock meanwhile. I actually had it that way for a while, but what if the backend already has a xmin set? Then we need to reason about whether the xmin is newer, restore it afterwards and such. That doesn't seem nice. It's not too far removed from the problem snapmgr.c is already designed to solve, though, is it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 1:45 PM, Josh Berkus j...@agliodbs.com wrote: On 02/25/2014 10:31 AM, Robert Haas wrote: And I definitely don't agree that our documentation should push people towards stuffing everything in a JSON blob instead of using real column definitions. Where did you get this out of my doc patch? Way to quote what I said out of context. But to make a long story short, I get that from the fact that you want to railroad everyone into using jsonb. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor comment improvements in tablecmds.c
On Tue, Feb 25, 2014 at 1:44 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Etsuro Fujita wrote: This is a small patch to improve comments in tablecmds.c. Please find attached a patch. I find both patched and unpatched to be pretty illegible. How about something like /* - * Execute ALTER TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE RENAME + * Execute ALTER relation type RENAME + * This routine supports tables, indexes, sequences, views, + * and foreign tables */ I don't find that to be an improvement. and the two other ones with something like this: /* - * Grab an exclusive lock on the target table, index, sequence or view, - * which we will NOT release until end of transaction. + * Grab an exclusive lock on the target relation, + * which we will NOT release until + * end of transaction. But I do like that better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/25/2014 10:50 AM, Robert Haas wrote: On Tue, Feb 25, 2014 at 1:45 PM, Josh Berkus j...@agliodbs.com wrote: On 02/25/2014 10:31 AM, Robert Haas wrote: And I definitely don't agree that our documentation should push people towards stuffing everything in a JSON blob instead of using real column definitions. Where did you get this out of my doc patch? Way to quote what I said out of context. Way to put words in my mouth. But to make a long story short, I get that from the fact that you want to railroad everyone into using jsonb. That's called a straw man argument, Robert. Me: We should recommend that people use jsonb unless they have a specific reason for using json. Merlin: We should present them side-by-side with a complex comparison. Robert: Josh wants to junk all relational data and use only jsonb! I mean, really, WTF? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/25/2014 09:45 AM, Bruce Momjian wrote: It hurts our adoption substantially to confuse developers. We need to recommend one type over the other, hence Use jsonb unless you need X. Merlin is pushing the type of multivariable comparison where *I* wouldn't be able to make sense of which one I should pick, let alone some web developer who's just trying to get a site built. That sort of thing *really* doesn't help our users. I agree it would be nice to have something simple, like Use JSON if you wish to just store/retrieve entire JSON structures, and JSONB if you wish to do any kind of lookup or manipulation of JSON values on the server. (to clarify below: json refers to the current varlena datatype; JSON refers to JSON serialized data). I don't think that's decisive enough, which is why I wrote the doc the way I did. The problem is that most users would prefer that we tell them which one to use, which is why I want to structure the doc as Use jsonb unless you need one of these things, or more specifically: In general, most applications will find it advantageous to store JSON data as <type>jsonb</type>, as jsonb is more efficient when using JSON manipulation functions, and will support future advanced json index, operator and search features. The <type>json</type> will primarily be useful for applications which need to preserve exact formatting of the input JSON, or users with existing <type>json</type> columns which they do not want to convert to <type>jsonb</type>. Part of my reason for wanting to recommend jsonb over json is in the context of the third storage option for JSON, namely TEXT. The only things which distinguish json from TEXT for JSON storage are validation and a set of json manipulation functions. jsonb works with the manipulation functions better/faster, causing the old json type to start looking like more of a DOMAIN over TEXT than a real type comparatively. In other words, if you ask the question Why would I want to use json instead of either jsonb or TEXT, the answer becomes quite narrow. Possibly I should expand the little chart and add a column for TEXT? It's a viable option for storing JSON data, especially if you store a lot of broken JSON or fragments. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
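The validation point is the main thing separating json from plain TEXT for this purpose; a two-line illustration:

    SELECT '{"a": 1'::json;   -- ERROR:  invalid input syntax for type json
    SELECT '{"a": 1'::text;   -- accepted; TEXT imposes no structure at all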
Re: [HACKERS] jsonb and nested hstore
On 02/25/2014 10:54 AM, Josh Berkus wrote: On 02/25/2014 10:50 AM, Robert Haas wrote: On Tue, Feb 25, 2014 at 1:45 PM, Josh Berkus j...@agliodbs.com wrote: On 02/25/2014 10:31 AM, Robert Haas wrote: And I definitely don't agree that our documentation should push people towards stuffing everything in a JSON blob instead of using real column definitions. Where did you get this out of my doc patch? Way to quote what I said out of context. Way to put words in my mouth. But to make a long story short, I get that from the fact that you want to railroad everyone into using jsonb. That's called a straw man argument, Robert. Me: We should recommend that people use jsonb unless they have a specific reason for using json. Merlin: We should present them side-by-side with a complex comparison. From the cheap seats. To me the whole hstore/json/jsonb family is a WIP and any enlightenment in the form of comparisons would be greatly appreciated by me and other end users I would suspect. Robert: Josh wants to junk all relational data and use only jsonb! I mean, really, WTF? Seems to be a hot topic all the way around. I am neck deep in learning Web development and am coming to grips with the role of JSON in that world. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.7
Hi, On 2014-02-25 13:47:49 -0500, Robert Haas wrote: On Mon, Feb 24, 2014 at 6:16 PM, Andres Freund and...@2ndquadrant.com wrote: I actually thought they'd be too ugly to live and we'd remove them pre-commit. Might be getting to be about that time, then. I want to leave them in until the slot semantics aren't going to change anymore, they are pretty useful for testing that. But I'll separate them out into a separate commit again. - if (nrels 0 || nmsgs 0 || RelcacheInitFileInval || forceSyncCommit) + if (nrels 0 || nmsgs 0 || RelcacheInitFileInval || forceSyncCommit || + XLogLogicalInfoActive()) Mmph. Is this really necessary? If so, why? The comments could elucidate. We could get rid of it by (optionally) adding information about the database oid to compact commit, but that'd increase the size of the record. So why do we need the database OID? To ignore commits from other databases. Since we don't decode changes from other databases, it's really confusing (and pointless overhead) to see transactions from there. + bool fail_softly = slot-data.persistency == RS_EPHEMERAL; This should be contingent on whether we're being called in the error pathway, not the slot type. I think you should pass a bool. Why? I had it that way at first, but for persistent slots this won't be called in error pathways as we won't drop there. I was thinking more the reverse - that a non-persistent slot might be dropped in a non-error pathway. Well, currently EPHEMERAL slots are documented to be dropped at release since that's what changeset extraction (and possibly basebackup and receivexlog) need afaics. You'd prefer DROP_ON_ERROR semantics? It seems to be indicating, roughly, whether the relation should participate in RecentGlobalXmin or RecentGlobalDataXmin. But is there any point at all of separating those when !XLogLogicalInfoActive()? The test expands to: IsSystemRelation() || (XLogLogicalInfoActive() RelationNeedsWAL(relation) (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation))) So basically this is all tables created in pg_catalog during initdb plus all TOAST tables in the system. If wal_level=logical, then we also include tables marked with the reloption user_catalog_table=true, unless they're unlogged. This all seems a bit complex. Why not this: IsSystemRelation() || || RelationIsUsedAsCatalogTable(relation) Because that'd possibly retain too much when !XLogLogicalInfoActive(), there's no need to look for RelationIsUsedAsCatalogTable() for those. We could decide not to care? But when !XLogLogicalInfoActive() I think we could just make this always false, right? I mean, if PROC_IN_LOGICAL_DECODING is never going to be set, the values are always going to be the same anyway. I think. It seems confusing and bug-prone to use the wrong horizon variable just because right now they'd be the same if wal_level logical. I can't claim to be very excited about this. Because of the already_locked parameters, or any wider concerns? Passing down already_locked through several layers is kind of ugly, but also, holding ProcArrayLock more is sad. That is not a lightly-contended lock. Absolutely true, but this is very far from a operation that will be frequent enough to matter. Creating a slot so frequently that a lock on the procarray hold while iterating the slot array matters, will be painful long before the contention on that is the problem. I'm assuming you've spent a lot of time thinking about ways to avoid this and utterly failed to come up with any reasonable alternative, but let me take a shot. 
Suppose we take ProcArrayLock in exclusive mode and compute the oldest running XID, install it as our xmin, and then release ProcArrayLock. At that point, nobody else can compute an oldest-xmin value that precedes that value, so we can take our time installing that value as the slot's xmin, without needing to hold a lock meanwhile. I actually had it that way for a while, but what if the backend already has a xmin set? Then we need to reason about whether the xmin is newer, restore it afterwards and such. That doesn't seem nice. It's not too far removed from the problem snapmgr.c is already designed to solve, though, is it? Hm, I don't immediately see how it would fit in there. PgXact-xmin is set by procarray.c, all snapmgr does is reset it. And there's no logic about resetting it back to higher values and such. I'll ponder on getting rid of this, but I am not of too high hopes. Thanks, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Unfortunate choice of short switch name in pgbench
I just wasted some time puzzling over strange results from pgbench. I eventually realized that I'd been testing against the wrong server, because rather than -p 65432 I'd typed -P 65432, thereby invoking the recently added --progress option. pgbench has no way to know that that isn't what I meant; the fact that both switches take integer arguments doesn't help. To fix this, I propose removing the -P short form and only allowing the long --progress form. I won't argue that this feature is completely useless, but for sure it's not something I'd want more often than once in a blue moon. So I think it does not need to have a short form; and for sure it doesn't need a short form that's so easily confused with a commonly used switch. If no objections, I'll go make that change. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
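The confusion is easy to reproduce: the two invocations below differ only in the case of one letter, and both are accepted without complaint (the option value and database name are just examples):

    pgbench -p 65432 -T 60 bench   # run against the server listening on port 65432
    pgbench -P 65432 -T 60 bench   # run against the default port, printing a progress line every 65432 seconds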
Re: [HACKERS] Minor performance improvement in transition to external sort
On 24/02/14 17:38, Robert Haas wrote: On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris j...@wizmail.org wrote: Run under cachegrind, it takes about N/10 last-level cache misses, all for the new item being introduced to the heap. The existing code takes none at all. Can you explain this further? This seems like an important clue that could be useful when trying to optimize this code, but I'm a little unclear which part of the operation has more cache misses with your changes and why. In the patched version, for the heapify operation the outer loop starts at the last heap-parent tuple and works backward to the start of the tuples array. A copy is taken of the SortTuple being operated on for the inner loop to use. This copy suffers cache misses. (The inner loop operates on elements between the copy source and the end of the array). In the original, the outer loop runs the array in increasing index order. Again a copy is taken of the SortTuple for the inner loop to use. This copy does not appear to take significant cache misses. (The inner loop operates on elements between the copy source and the start of the array). -- Cheers, Jeremy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
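For readers without the patch in front of them, the access pattern being described is that of a classic bottom-up (Floyd) heapify. A generic sketch follows (not the tuplesort code itself; SortTuple and cmp() stand in for the real element type and comparator), showing the backward outer loop and the copied pivot element that takes the cache miss:

    /* build a min-heap in place over a[0..n-1]; generic sketch only */
    static void
    heapify(SortTuple *a, int n)
    {
        int i;

        for (i = n / 2 - 1; i >= 0; i--)       /* last heap parent back to the root */
        {
            SortTuple pivot = a[i];            /* the copy that suffers the cache miss */
            int j = i;

            while (2 * j + 1 < n)              /* sift down, touching elements beyond i */
            {
                int child = 2 * j + 1;

                if (child + 1 < n && cmp(&a[child + 1], &a[child]) < 0)
                    child++;
                if (cmp(&pivot, &a[child]) <= 0)
                    break;
                a[j] = a[child];
                j = child;
            }
            a[j] = pivot;
        }
    }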
Re: [HACKERS] Unfortunate choice of short switch name in pgbench
2014-02-25 20:49 GMT+01:00 Tom Lane t...@sss.pgh.pa.us: I just wasted some time puzzling over strange results from pgbench. I eventually realized that I'd been testing against the wrong server, because rather than -p 65432 I'd typed -P 65432, thereby invoking the recently added --progress option. pgbench has no way to know that that isn't what I meant; the fact that both switches take integer arguments doesn't help. To fix this, I propose removing the -P short form and only allowing the long --progress form. I won't argue that this feature is completely useless, but for sure it's not something I'd want more often than once in a blue moon. So I think it does not need to have a short form; and for sure it doesn't need a short form that's so easily confused with a commonly used switch. If no objections, I'll go make that change. +1 Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unfortunate choice of short switch name in pgbench
On Tue, Feb 25, 2014 at 2:49 PM, Tom Lane t...@sss.pgh.pa.us wrote: I just wasted some time puzzling over strange results from pgbench. I eventually realized that I'd been testing against the wrong server, because rather than -p 65432 I'd typed -P 65432, thereby invoking the recently added --progress option. pgbench has no way to know that that isn't what I meant; the fact that both switches take integer arguments doesn't help. To fix this, I propose removing the -P short form and only allowing the long --progress form. I won't argue that this feature is completely useless, but for sure it's not something I'd want more often than once in a blue moon. So I think it does not need to have a short form; and for sure it doesn't need a short form that's so easily confused with a commonly used switch. If no objections, I'll go make that change. Hmm. I don't have a real specific opinion on the value of this particular --progress option, but my experience is that most --progress options get a lot of use. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 1:54 PM, Josh Berkus j...@agliodbs.com wrote: On 02/25/2014 10:50 AM, Robert Haas wrote: On Tue, Feb 25, 2014 at 1:45 PM, Josh Berkus j...@agliodbs.com wrote: On 02/25/2014 10:31 AM, Robert Haas wrote: And I definitely don't agree that our documentation should push people towards stuffing everything in a JSON blob instead of using real column definitions. Where did you get this out of my doc patch? Way to quote what I said out of context. Way to put words in my mouth. But to make a long story short, I get that from the fact that you want to railroad everyone into using jsonb. That's called a straw man argument, Robert. Me: We should recommend that people use jsonb unless they have a specific reason for using json. Merlin: We should present them side-by-side with a complex comparison. Robert: Josh wants to junk all relational data and use only jsonb! I mean, really, WTF? OK, since what I said seems to have become distorted somewhere along the line, allow me to rephrase: I don't agree that jsonb should be preferred in all but a handful of situations. Nor do I agree that partisanship belongs in our documentation. Therefore, -1 for your proposal to recommend that, and +1 for Merlin's proposal to present a comparison which fairly illustrates the situations in which each will outperform the other. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] define type_transform to new user defined type
I want to create a new type that is similar to varchar; its size is variable. I use the CREATE TYPE query. I define these functions for that type: 1-typein 2-typeout 3-typemodify_input 4-typemodify_output 5-type_transform. I can define functions 1 to 4 in CREATE TYPE, but I can't define type_transform for that type. How can I define type_transform for that type?
Re: [HACKERS] Dump pageinspect to 1.2: page_header with pg_lsn datatype
On Mon, Feb 24, 2014 at 9:34 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Yeah, erroring out seems good enough. Particularly if you add a hint saying please upgrade the extension. In past instances where this has come up, we have actually made the .so backward-compatible. See pg_stat_statements in particular. I'd prefer to keep to that precedent here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dump pageinspect to 1.2: page_header with pg_lsn datatype
Robert Haas escribió: On Mon, Feb 24, 2014 at 9:34 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Yeah, erroring out seems good enough. Particularly if you add a hint saying please upgrade the extension. In past instances where this has come up, we have actually made the .so backward-compatible. See pg_stat_statements in particular. I'd prefer to keep to that precedent here. But pg_stat_statement is a user tool which is expected to have lots of use, and backwards compatibility concerns because of people who might have written tools on top of it. Not so with pageinspect. I don't think we need to put in the same amount of effort. (Even though, really, it's probably not all that difficult to support both cases. I just don't see the point.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
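Whichever way the compatibility question is settled, the user-visible step being hinted at is just the standard extension upgrade command (assuming the new version is packaged as 1.2):

    ALTER EXTENSION pageinspect UPDATE TO '1.2';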
Re: [HACKERS] [BUGS] BUG #9223: plperlu result memory leak
On Tue, Feb 25, 2014 at 6:56 AM, Sergey Burladyan eshkin...@gmail.com wrote: It looks like I found the problem, Perl use reference count and something that is called Mortal for memory management. As I understand it, mortal is free after FREETMPS. Plperl call FREETMPS in plperl_call_perl_func() but after it, plperl ask perl interpreter again for new mortal SV variables, for example, in hek2cstr from plperl_sv_to_datum, and this new SV is newer freed. So I think hek2cstr is the only place we leak (its the only place I can see that allocates a mortal sv without being wrapped in ENTER/SAVETMPS/FREETMPS/LEAVE). Does the attached fix it for you? diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c index 930b9f0..3bc4034 100644 --- a/src/pl/plperl/plperl.c +++ b/src/pl/plperl/plperl.c @@ -304,6 +304,16 @@ static char *setlocale_perl(int category, char *locale); static char * hek2cstr(HE *he) { + char *ret; + SV *sv; + + /* + * HeSVKEY_force will return a temporary mortal SV*, so we need to make + * sure to free it with ENTER/SAVE/FREE/LEAVE + */ + ENTER; + SAVETMPS; + /*- * Unfortunately, while HeUTF8 is true for most things 256, for values * 128..255 it's not, but perl will treat them as unicode code points if @@ -328,11 +338,17 @@ hek2cstr(HE *he) * right thing *- */ - SV *sv = HeSVKEY_force(he); + sv = HeSVKEY_force(he); if (HeUTF8(he)) SvUTF8_on(sv); - return sv2cstr(sv); + ret = sv2cstr(sv); + + /* free sv */ + FREETMPS; + LEAVE; + + return ret; } /* -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster
Tom Lane t...@sss.pgh.pa.us wrote: Or we could wait to see if anybody reports this sort of behavior in a shell that won't be out of support before 9.4 gets out the door. We have a field report of this happening in the sh shell in Solaris 10. Our staff has confirmed this. In Solaris 10 they can start multiple clusters from a single shell, and if they later use Ctrl+C in that shell all of those instances are shut down. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] install libpq.dll in bin directory on Windows / Cygwin
On 2/1/14, 3:22 PM, Andrew Dunstan wrote: In the end I went with the way I had suggested, because that's what the MSVC system does - it doesn't copy any other DLLs to the bin directory. So doing that seemed sane for backpatching, to bring the two build systems into sync. If you want to propose a better arrangement for the future, to include, say, ecpg DLLs, and including changes to the MSVC system, we can discuss that separately. See attached patch. There is also the commit fest item https://commitfest.postgresql.org/action/patch_view?id=1330 that requests the MSVC builds to install the epcg libraries in the bin directory. diff --git a/src/Makefile.shlib b/src/Makefile.shlib index 029c7e9..c04e2ef 100644 --- a/src/Makefile.shlib +++ b/src/Makefile.shlib @@ -484,6 +484,9 @@ endif endif # not win32 endif # not cygwin endif # not aix +ifneq (,$(findstring $(PORTNAME),win32 cygwin)) + $(INSTALL_SHLIB) $ '$(DESTDIR)$(bindir)/$(shlib)' +endif else # no soname $(INSTALL_SHLIB) $ '$(DESTDIR)$(pkglibdir)/$(shlib)' endif @@ -491,7 +494,7 @@ endif installdirs-lib: ifdef soname - $(MKDIR_P) '$(DESTDIR)$(libdir)' '$(DESTDIR)$(pkgconfigdir)' + $(MKDIR_P) '$(DESTDIR)$(libdir)' '$(DESTDIR)$(pkgconfigdir)' $(if $(findstring $(PORTNAME),win32 cygwin),'$(DESTDIR)$(bindir)') else $(MKDIR_P) '$(DESTDIR)$(pkglibdir)' endif @@ -507,7 +510,7 @@ ifdef soname rm -f '$(DESTDIR)$(libdir)/$(stlib)' rm -f '$(DESTDIR)$(libdir)/$(shlib_bare)' \ '$(DESTDIR)$(libdir)/$(shlib_major)' \ - '$(DESTDIR)$(libdir)/$(shlib)' \ + '$(DESTDIR)$(libdir)/$(shlib)' $(if $(findstring $(PORTNAME),win32 cygwin),'$(DESTDIR)$(bindir)/$(shlib)') \ '$(DESTDIR)$(pkgconfigdir)/lib$(NAME).pc' else # no soname rm -f '$(DESTDIR)$(pkglibdir)/$(shlib)' diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile index 7f2d901..2d11816 100644 --- a/src/interfaces/libpq/Makefile +++ b/src/interfaces/libpq/Makefile @@ -121,18 +121,12 @@ install: all installdirs install-lib $(INSTALL_DATA) $(srcdir)/libpq-int.h '$(DESTDIR)$(includedir_internal)' $(INSTALL_DATA) $(srcdir)/pqexpbuffer.h '$(DESTDIR)$(includedir_internal)' $(INSTALL_DATA) $(srcdir)/pg_service.conf.sample '$(DESTDIR)$(datadir)/pg_service.conf.sample' -ifneq (,$(findstring $(PORTNAME), win32 cygwin)) - $(INSTALL_DATA) $(shlib) '$(DESTDIR)$(bindir)/$(shlib)' -endif installcheck: $(MAKE) -C test $@ installdirs: installdirs-lib $(MKDIR_P) '$(DESTDIR)$(includedir)' '$(DESTDIR)$(includedir_internal)' '$(DESTDIR)$(datadir)' -ifneq (,$(findstring $(PORTNAME), win32 cygwin)) - $(MKDIR_P) '$(DESTDIR)$(bindir)' -endif uninstall: uninstall-lib rm -f '$(DESTDIR)$(includedir)/libpq-fe.h' @@ -140,9 +134,6 @@ uninstall: uninstall-lib rm -f '$(DESTDIR)$(includedir_internal)/libpq-int.h' rm -f '$(DESTDIR)$(includedir_internal)/pqexpbuffer.h' rm -f '$(DESTDIR)$(datadir)/pg_service.conf.sample' -ifneq (,$(findstring $(PORTNAME), win32 cygwin)) - rm -f '$(DESTDIR)$(bindir)/$(shlib)' -endif clean distclean: clean-lib $(MAKE) -C test $@ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] define type_transform to new user defined type
Mohsen SM mohsensoodk...@gmail.com writes: I want to create a new type that is similar to varchar; its size is variable. I use the CREATE TYPE query. I define these functions for that type: 1-typein 2-typeout 3-typemodify_input 4-typemodify_output 5-type_transform. I can define functions 1 to 4 in CREATE TYPE, but I can't define type_transform for that type. How can I define type_transform for that type? There's no such thing as a type transform. There are transforms associated with functions ... unfortunately, there's not currently any provision for defining those at the SQL level. You could poke an entry into pg_proc.protransform if you're desperate enough. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
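Concretely, poking an entry into pg_proc.protransform would look something like the following. All object names here are hypothetical: mytype(mytype, integer, boolean) stands for the type's length-coercion function, and mytype_transform for a hand-written C function taking and returning internal, analogous to varchar_transform:

    UPDATE pg_proc
       SET protransform = 'mytype_transform(internal)'::regprocedure::oid
     WHERE oid = 'mytype(mytype, integer, boolean)'::regprocedure;

As with any direct catalog update, this is unsupported territory and worth trying on a scratch database first.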
Re: [HACKERS] Patch: show relation and tuple infos of a lock to acquire
On Mon, Feb 24, 2014 at 10:13 AM, Christian Kruse christ...@2ndquadrant.com wrote: To be honest, I don't like the idea of setting up this error context only for log_lock_wait messages. This sounds unnecessary complex to me and I think that in the few cases where this message doesn't add a value (and thus is useless) don't justify such complexity. Reading this over, I'm not sure I understand why this is a CONTEXT at all and not just a DETAIL for the particular error message that it's supposed to be decorating. Generally CONTEXT should be used for information that will be relevant to all errors in a given code path, and DETAIL for extra information specific to a particular error. If we're going to stick with CONTEXT, we could rephrase it like this: CONTEXT: while attempting to lock tuple (1,2) in relation with OID 3456 or when the relation name is known: CONTEXT: while attempting to lock tuple (1,2) in relation public.foo Displaying whole tuple doesn't seem to be the most right way to get debug information and especially in this case we are already displaying tuple offset(ctid) which is unique identity to identify a tuple. It seems to me that it is sufficient to display unique value of tuple if present. I understand that there is no clear issue here, so may be if others also share their opinion then it will be quite easy to take a call. I wouldn't be inclined to dump the whole tuple under any circumstances. That could be a lot more data than what you want dumped in your log. The PK could already be somewhat unreasonably large, but the whole tuple could be a lot more unreasonably large. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder
You should be able to do this without specifically referencing the names libpq or ecpg. Look into the Makefile: if it defines SO_MAJOR_VERSION, then it's a library you are concerned with, and you can do the copying or moving for it. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
Josh Berkus wrote: (to clarify below: json refers to the current varlena datatype; JSON refers to JSON serialized data). FWIW the term varlena json is misleading. jsonb is also varlena, only different. I think you need a different term to say that json uses the text representation. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unfortunate choice of short switch name in pgbench
Robert Haas robertmh...@gmail.com writes: On Tue, Feb 25, 2014 at 2:49 PM, Tom Lane t...@sss.pgh.pa.us wrote: To fix this, I propose removing the -P short form and only allowing the long --progress form. I won't argue that this feature is completely useless, but for sure it's not something I'd want more often than once in a blue moon. So I think it does not need to have a short form; and for sure it doesn't need a short form that's so easily confused with a commonly used switch. Hmm. I don't have a real specific opinion on the value of this particular --progress option, but my experience is that most --progress options get a lot of use. Meh. A progress-reporting feature has use when the tool is working towards completion of a clearly defined task. In the case of pgbench, if you told it to run for -T 60 seconds rather than -T 10 seconds, that's probably because you don't trust a 10-second average to be sufficiently reproducible. So I'm not real sure that reporting averages over shorter intervals is all that useful; especially not if it takes cycles out of pgbench, which itself is often a bottleneck. I could see some value in a feature that computed shorter-interval TPS averages and then did some further arithmetic, like measuring the standard deviation of the shorter-interval averages to assess how much noise there will be in the full-run average. But that's not what this does, and if it did do that, reporting progress would not be what I'd see as its main purpose. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Get more from indices.
On Tue, Feb 25, 2014 at 3:36 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: May I mark this patch as Ready for Committer by myself since this was already marked so in last CF3 and have had no objection or new suggestion in this CF4 for more than a month? Sounds fair. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/25/2014 12:12 PM, Robert Haas wrote: I don't agree that jsonb should be preferred in all but a handful of situations. Nor do I agree that partisanship belongs in our documentation. Therefore, -1 for your proposal to recommend that, and +1 for Merlin's proposal to present a comparison which fairly illustrates the situations in which each will outperform the other. Awaiting doc patch from Merlin, then. It will need to be clear enough that an ordinary user can distinguish which type they want. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
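[Editor's note: to give a concrete flavor of the kind of contrast such a doc section could draw: json preserves the input text verbatim, while jsonb normalizes it but supports containment queries and GIN indexing. A sketch, assuming the jsonb patch as posted; the table and column names are made up.]

CREATE TABLE docs (body jsonb);
CREATE INDEX docs_body_idx ON docs USING gin (body);
-- containment query that can use the GIN index; with a json column this
-- comparison would have to reparse the stored text for every row
SELECT * FROM docs WHERE body @> '{"tags": ["postgres"]}';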
Re: [HACKERS] extension_control_path
I'm massively in favor of this feature. (I had started writing it about three times myself.) The problem I see, however, is that most extensions, by recommendation, set module_pathname = '$libdir/pgfoo', and so relocating the control files will still end up pointing to a non-relocated library file. We would need to remove that and then ask users to keep their dynamic_library_path in sync with extension_control_path. That's error-prone, of course. In order to address this properly, we need a new directory structure that keeps library files and control files together, similar to how Python, Ruby, etc. install things, and then just one path for everything. Now a few technical problems. When an extension is not found, I get the error message "ERROR: could not open extension control file (null): Bad address". Something is broken there. In the documentation, order extension_control_path after dynamic_library_path. Also, the documentation states that this controls the location of the control file, but it of course controls the location of the script files also. That should be made clearer. (It becomes clearer if we just have one path for everything. ;-) ) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 4:03 PM, Josh Berkus j...@agliodbs.com wrote: On 02/25/2014 12:12 PM, Robert Haas wrote: I don't agree that jsonb should be preferred in all but a handful of situations. Nor do I agree that partisanship belongs in our documentation. Therefore, -1 for your proposal to recommend that, and +1 for Merlin's proposal to present a comparison which fairly illustrates the situations in which each will outperform the other. Awaiting doc patch from Merlin, then. It will need to be clear enough that an ordinary user can distinguish which type they want. Sure. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dumpall recommendation in release notes
Bruce, Tom: Can we change this text in the template release notes? A dump/restore using pg_dumpall (http://www.postgresql.org/docs/9.3/static/app-pg-dumpall.html), or use of pg_upgrade (http://www.postgresql.org/docs/9.3/static/pgupgrade.html), is required for those wishing to migrate data from any previous release. Again, here we're recommending pg_dumpall with its many limitations, and not mentioning pg_dump/pg_restore at all. This has caused several people to ask me if pg_dump is not supported for upgrading anymore. Fix? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster
Kevin Grittner kgri...@ymail.com writes: Tom Lane t...@sss.pgh.pa.us wrote: Or we could wait to see if anybody reports this sort of behavior in a shell that won't be out of support before 9.4 gets out the door. We have a field report of this happening in the sh shell in Solaris 10. Our staff has confirmed this. In Solaris 10 they can start multiple clusters from a single shell, and if they later use Ctrl+C in that shell all of those instances are shut down. Do you want to try the set -m hack suggested upthread? I see no point in pursuing the portability questions unless we've verified that it fixes the problem for someone. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dumpall recommendation in release notes
Josh Berkus j...@agliodbs.com writes: Can we change this text in the template release notes? A dump/restore using pg_dumpall (http://www.postgresql.org/docs/9.3/static/app-pg-dumpall.html), or use of pg_upgrade (http://www.postgresql.org/docs/9.3/static/pgupgrade.html), is required for those wishing to migrate data from any previous release. Again, here we're recommending pg_dumpall with its many limitations, and not mentioning pg_dump/pg_restore at all. This has caused several people to ask me if pg_dump is not supported for upgrading anymore. Fix? There's a very good reason for not recommending pg_dump in this context: it won't dump everything. Yeah, if you know what you're doing, you might use per-database pg_dump runs plus pg_dumpall -g to catch the roles etc, but we're not going to try to wedge all that info into one or two sentences of release-note boilerplate. If you can get that right then you don't need the release notes to remind you anyway. If you aren't likely to get it right then the release notes would do you no service by suggesting it. I'm not sure what many limitations you think pg_dumpall has that pg_dump doesn't. I do think that it might be time to reword this to recommend pg_upgrade first, though. ISTM that the current wording dates from when pg_upgrade could charitably be described as experimental. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dumpall recommendation in release notes
On 02/25/2014 03:41 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Can we change this text in the template release notes? A dump/restore using pg_dumpall (http://www.postgresql.org/docs/9.3/static/app-pg-dumpall.html), or use of pg_upgrade (http://www.postgresql.org/docs/9.3/static/pgupgrade.html), is required for those wishing to migrate data from any previous release. Again, here we're recommending pg_dumpall with its many limitations, and not mentioning pg_dump/pg_restore at all. This has caused several people to ask me if pg_dump is not supported for upgrading anymore. Fix? There's a very good reason for not recommending pg_dump in this context: it won't dump everything. Yeah, if you know what you're doing, you might use per-database pg_dump runs plus pg_dumpall -g to catch the roles etc, but we're not going to try to wedge all that info into one or two sentences of release-note boilerplate. If you can get that right then you don't need the release notes to remind you anyway. If you aren't likely to get it right then the release notes would do you no service by suggesting it. Right. But the fact that we don't mention it *at all* has caused some users to ask me if regular pg_dump doesn't work for upgrades anymore. Which reminds me, I need to get the doc patch for the upgrade page in this week. It does make me wonder if we should direct users to the upgrade page though, instead of the individual command pages. I'm not sure what many limitations you think pg_dumpall has that pg_dump doesn't. Lack of parallel dump and restore is the biggest one. I do think that it might be time to reword this to recommend pg_upgrade first, though. ISTM that the current wording dates from when pg_upgrade could charitably be described as experimental. Yah. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dumpall recommendation in release notes
Josh Berkus j...@agliodbs.com writes: On 02/25/2014 03:41 PM, Tom Lane wrote: There's a very good reason for not recommending pg_dump in this context: it won't dump everything. Yeah, if you know what you're doing, you might use per-database pg_dump runs plus pg_dumpall -g to catch the roles etc, but we're not going to try to wedge all that info into one or two sentences of release-note boilerplate. If you can get that right then you don't need the release notes to remind you anyway. If you aren't likely to get it right then the release notes would do you no service by suggesting it. Right. But the fact that we don't mention it *at all* has caused some users to ask me if regular pg_dump doesn't work for upgrades anymore. TBH, anybody who's asking that kind of question probably falls in the category of wouldn't get it right. I've heard enough bitching from novices who thought that pg_dump would be enough to get everything out of their now-gone database that I have no desire to encourage use of bare pg_dump here. (Whether we shouldn't redesign the functionality of these programs is a different discussion. The release notes have to reflect what is, though, not what might ideally be.) It does make me wonder if we should direct users to the upgrade page though, instead of the individual command pages. If we had a page discussing the pros and cons of different upgrade methods, yeah, I'd be in favor of reducing the release-note text to a pointer to that page. I don't see such a page in a quick skim of the fine manual's table of contents though? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dumpall recommendation in release notes
On 02/25/2014 03:59 PM, Tom Lane wrote: If we had a page discussing the pros and cons of different upgrade methods, yeah, I'd be in favor of reducing the release-note text to a pointer to that page. I don't see such a page in a quick skim of the fine manual's table of contents though? I owe an update of http://www.postgresql.org/docs/9.3/static/upgrading.html; I think I can easily include a discussion of the various options there. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bit data type header reduction in some cases
On Tue, Feb 25, 2014 at 09:32:55AM -0500, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 02/25/2014 08:23 AM, Haribabu Kommi wrote: It's regarding a Todo item of Bit data type header reduction in some cases. The header contains two parts. 1) The varlena header is automatically converted to 1 byte header from 4 bytes in case of small data. 2) The bit length header called bit_len to store the actual bit length which is of 4 bytes in size. I want to reduce this bit_len size to 1 byte in some cases as similar to varlena header. With this change the size of the column reduced by 3 bytes, thus shows very good decrease in disk usage. [ various contorted schemes for doing that backward-compatibly ] TBH, this sounds like a huge amount of effort and a significant risk of new bugs in return for not darn much. Who uses bit values anyway? They were removed from the SQL spec more than 10 years ago. And of that population, who cares about a byte or two per value? The field demand for this is not only zero, it's probably negative. (The thread referenced by the TODO entry shows no evidence of user demand.) OK, TODO item removed. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] In which good intentions are punished, take 2
Since I committed 49c817eab78c6f0ce8c3bf46766b73d6cf3190b7 to make pg_do_encoding_conversion not fail silently, buildfarm member magpie has been whining: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2014-02-25%2011%3A00%3A08 To wit, that if you run contrib/citext's regression tests in a LATIN2-encoded database, you get this: SELECT convert_to( name, 'ISO-8859-1' ) = convert_to( name::text, 'ISO-8859-1' ) AS t FROM srt; ! ERROR: default conversion function for encoding LATIN2 to LATIN1 does not exist Well, it's right, there is no such conversion function. A quick look in the buildfarm logs for older runs confirms that previously, all we were getting was the useless LOG bleats that the previous coding emitted: LOG: default conversion function for encoding LATIN2 to LATIN1 does not exist STATEMENT: SELECT convert_to( name, 'ISO-8859-1' ) = convert_to( name::text, 'ISO-8859-1' ) AS t FROM srt; That didn't appear in the user-visible output, allowing the regression test to appear to pass although under no circumstances could it be argued that what was happening was sane. We could possibly replace this test case with SELECT convert_from( name::bytea, 'SQL_ASCII' ) = convert_from( name::text::bytea, 'SQL_ASCII' ) AS t FROM srt; though I'm not sure I see the point: whatever you might think this statement is testing, it's not got much to do with citext. On the other hand, we evidently have got precious little other buildfarm coverage of the convert() family of functions, so maybe removing this test altogether wouldn't be the best thing either. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
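[Editor's note: if some coverage of the convert() family is wanted that doesn't depend on citext or on the database's encoding, a sketch along these lines might do, relying on every server encoding having a default conversion to and from UTF8 and on SQL_ASCII conversions being pass-through. This is only an illustration, not a worked-out regression test.]

SELECT convert_from(convert_to('abcdef', 'UTF8'), 'UTF8');   -- round-trip through UTF8
SELECT convert('abcdef'::bytea, 'SQL_ASCII', 'UTF8');        -- bytea-to-bytea form
SELECT convert_to('abcdef', 'SQL_ASCII');                    -- SQL_ASCII target is a pass-through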
Re: [HACKERS] pg_dumpall recommendation in release notes
On Tue, Feb 25, 2014 at 06:41:26PM -0500, Tom Lane wrote: I'm not sure what many limitations you think pg_dumpall has that pg_dump doesn't. I do think that it might be time to reword this to recommend pg_upgrade first, though. ISTM that the current wording dates from when pg_upgrade could charitably be described as experimental. Wow, so pg_upgrade takes the lead! And from Tom too! :-) I agree with Tom that mentioning pg_dump/restore is going to lead to global object data loss, and throwing the users to a URL with no explanation isn't going to help either. What we could do is to restructure the existing text and add a link to the upgrade URL for more details. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
* Peter Eisentraut (pete...@gmx.net) wrote: I'm massively in favor of this feature. (I had started writing it about three times myself.) Agreed. The problem I see, however, is that most extensions, by recommendation, set module_pathname = '$libdir/pgfoo', and so relocating the control files will still end up pointing to a not relocated library file. I was wondering how that was dealt with- I simply have not had time to get to looking at this in more detail. I thought the answer I got from Dimitri was that $libdir would actually end up being resolved to any of the available directories due to his other patch...? Perhaps we just need to add in to that list the alternate directory for the control files? Or, what I had been thinking at one point, was making $libdir actually be where this control file lives instead. There's risk there though, I suppose, as today only one thing means $libdir. Another thought that I had was dealing with possible overlaps and clarifying things, perhaps such as having a mapping of 'name' to 'directory' which remaps $libdir for that 'name' and then extensions would be created using the 'name' space. eg: set module_paths = 'mymodulepath:/path/to/whatever;nextmodulepath:/other'; CREATE EXTENSION mymodulepath:my_extension; We would need to remove that and then ask users to keep their dynamic_library_path in sync with extension_control_path. That's error prone, of course. I'm starting to regret that dynamic_library_path exists, tbh. Still, I don't think it'd be too bad to automatically add to that path (or to what's searched) the module_paths. In order to address this properly, we need a new directory structure that keeps library files and control files together, similar to how Python, Ruby, etc. install things, and then just one path for everything. Right, that's more-or-less what I was thinking module_path would be. Now a few technical problems. Agree with all these. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_dumpall recommendation in release notes
On 02/25/2014 04:42 PM, Bruce Momjian wrote: On Tue, Feb 25, 2014 at 06:41:26PM -0500, Tom Lane wrote: I'm not sure what many limitations you think pg_dumpall has that pg_dump doesn't. I do think that it might be time to reword this to recommend pg_upgrade first, though. ISTM that the current wording dates from when pg_upgrade could charitably be described as experimental. Wow, so pg_upgrade takes the lead! And from Tom too! :-) I agree with Tom that mentioning pg_dump/restore is going to lead to global object data loss, and throwing the users to a URL with no explaination isn't going to help either. What we could do is to restructure the existing text and add a link to the upgrade URL for more details. What I was suggesting was something like: Users upgrading from earlier versions will need to go through the entire upgrade procedure, as described on our upgrade page: link The problem is that anything we say about how to upgrade in one short sentence is going to confuse some people. BTW, the reason I got that question about pg_dump was that 9.2's release notes say pg_dump and 9.3's say pg_dumpall, causing users to think there's been some kind of change. Of course, this means I need to fix the upgrade page, and I need to write backported versions of that fix for at least 9.1 and 9.2. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In which good intentions are punished, take 2
Tom Lane wrote: On the other hand, we evidently have got precious little other buildfarm coverage of the convert() family of functions, so maybe removing this test altogether wouldn't be the best thing either. We do have precious little testing on encodings and conversions, yes. The problem is how to test these things without having the tests fail when any particular encoding is not installed in the test system. Maybe we can use the Perl test rig for this too: Peter said that if a test requires something not installed, the test is skipped without causing a failure. It seems to me that we could take advantage of that so that each member tests whatever involves only the encodings it has installed; while each individual member would skip a large percentage of tests, the buildfarm as a whole would be testing a sizable portion, if not all of it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Hi Kaigai-san, 2014-02-25 13:28 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com: The reason why I asked the question above is, I haven't been 100% certain about its usage. Indeed, semifactors is used only in limited cases, but it is quite easy for an extension to reproduce later (using clauselist_selectivity) if the extension wants this factor. So, I agree with removing the semifactors here. Agreed. It would be nice to mention how to obtain semifactors for people who want to implement advanced join overriding. mergeclause_list and param_source_rels seem a little easier to use, but anyway it should be documented how to use those parameters. The mergeclause_list might not be sufficient for extensions to determine whether their own mergejoin is applicable here. See the comment below; that is on the head of select_mergejoin_clauses. | * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if | * this is a right/full join and there are nonmergejoinable join clauses. | * The executor's mergejoin machinery cannot handle such cases, so we have | * to avoid generating a mergejoin plan. (Note that this flag does NOT | * consider whether there are actually any mergejoinable clauses. This is | * correct because in some cases we need to build a clauseless mergejoin. | * Simply returning NIL is therefore not enough to distinguish safe from | * unsafe cases.) | It says, mergejoin_clause == NIL is not a sufficient check to determine whether the mergejoin logic is applicable on the target join. So, either of them is probably an option for an extension that tries to implement its own mergejoin logic [Perhaps you mean both of them?]; (1) putting both of mergejoin_allowed and mergeclause_list as arguments of the hook, or (2) re-definition of select_mergejoin_clauses() as an extern function to reproduce the variables on demand. Which one is more preferable? I prefer (1), because exposing the inside of the planner might block changing those internal functions. If (at the moment) that information is enough for overriding merge join for CSP, let's provide it as parameters. -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In which good intentions are punished, take 2
On 02/25/2014 08:53 PM, Alvaro Herrera wrote: Tom Lane wrote: On the other hand, we evidently have got precious little other buildfarm coverage of the convert() family of functions, so maybe removing this test altogether wouldn't be the best thing either. We do have precious little testing on encodings and conversions, yes. The problem is how to test these things without having the tests fail when any particular encoding is not installed in the test system. Maybe we can use the Perl test rig for this too: Peter said that if a test requires something not installed, the test is skipped without causing a failure. It seems to me that we could take advantage that so that each member tests whatever involves only the encodings it has installed; while each individual member would skip a large percentage of tests, the buildfarm as a whole would be testing a sizable portion, if not all of it. It should be easy (at least on *nix) for the buildfarm client to check what encodings are installed on the machine and run tests accordingly. I haven't been following closely, but if someone provides me with a simple spec I'll try to code it up. We're about due for a release anyway. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor performance improvement in transition to external sort
On Tue, Feb 25, 2014 at 2:55 PM, Jeremy Harris j...@wizmail.org wrote: On 24/02/14 17:38, Robert Haas wrote: On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris j...@wizmail.org wrote: Run under cachegrind, it takes about N/10 last-level cache misses, all for the new item being introduced to the heap. The existing code takes none at all. Can you explain this further? This seems like an important clue that could be useful when trying to optimize this code, but I'm a little unclear which part of the operation has more cache misses with your changes and why. In the patched version, for the heapify operation the outer loop starts at the last heap-parent tuple and works backward to the start of the tuples array. A copy is taken of the SortTuple being operated on for the inner loop to use. This copy suffers cache misses. (The inner loop operates on elements between the copy source and the end of the array). In the original, the outer loop runs the array in increasing index order. Again a copy is taken of the SortTuple for the inner loop to use. This copy does not appear to take significant cache misses. (The inner loop operates on elements between the copy source and the start of the array). Do you have any theory as to why this happens in one case but not the other? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Hi Kaigai-san, 2014-02-23 22:24 GMT+09:00 Kohei KaiGai kai...@kaigai.gr.jp: (1) Interface to add alternative paths in addition to built-in join paths I found that create_custom_path is not used at all in your patch. I revised postgresql_fdw.c to use it like this. ... /* Create join information which is stored as private information. */ memset(&jinfo, 0, sizeof(PgRemoteJoinInfo)); jinfo.fdw_server_oid = o_server_oid; jinfo.fdw_user_oid = o_user_oid; jinfo.relids = joinrel->relids; jinfo.jointype = jointype; jinfo.outer_rel = o_relinfo; jinfo.inner_rel = i_relinfo; jinfo.remote_conds = j_remote_conds; jinfo.local_conds = j_local_conds; /* OK, make a CustomScan node to run remote join */ cpath = create_customscan_path(root, joinrel, 0, 0, 0, /* estimate later */ NIL, required_outer, "postgres-fdw", 0, packPgRemoteJoinInfo(jinfo)); estimate_remote_join_cost(root, cpath, jinfo, sjinfo); add_path(joinrel, &cpath->path); ... This seems to work fine. Is this the right approach? If so, this portion would be a good example of replacing a local join with a custom scan for authors of custom scan providers. One thing I worry about is whether you've intentionally avoided calling create_customscan_path. -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
Hello, I tried a minimal implementation to do that. At Tue, 25 Feb 2014 19:45:56 +0900, Etsuro Fujita wrote In addition to an issue pointed out recently by Horiguchi-san, I've found there is another issue we have to discuss. That is, we can't build any parameterized Append paths for an inheritance tree that contains at least one foreign table, in set_append_rel_pathlist(). This is because the patch doesn't take into consideration reparameterization for paths for a foreign table, which attempts to modify these paths to have greater parameterization so that each child path in the inheritance tree can have the exact same parameterization. To do so, I think we would need to add code for the foreign-table case to reparameterize_path(). And I think we should introduce a new FDW routine, say ReparameterizeForeignPath() because the processing would be performed best by the FDW itself. Comments are welcome! I think the problem is that foreign children in inheritance tables prevent all members of the inheritance relation from using parameterized paths, correct? |=# explain select * from p join (select uname from c1 limit 1) s on s.uname = p.uname; | QUERY PLAN |--- | Hash Join (cost=0.04..244.10 rows=50 width=58) | Hash Cond: (p.uname = c1_1.uname) | -> Append (cost=0.00..206.01 rows=10012 width=50) | -> Seq Scan on p (cost=0.00..0.00 rows=1 width=168) | -> Seq Scan on c1 (cost=0.00..204.01 rows=10001 width=50) | -> Foreign Scan on c2 (cost=0.00..2.00 rows=10 width=168) | Foreign File: /etc/passwd | Foreign File Size: 1851 | -> Hash (cost=0.03..0.03 rows=1 width=8) | -> Limit (cost=0.00..0.02 rows=1 width=8) | -> Seq Scan on c1 c1_1 (cost=0.00..204.01 rows=10001 width=8) | Planning time: 1.095 ms Hmm. I tried a minimal implementation to do that. This omits cost recalculation but seems to work as expected. This seems enough if cost recalc is trivial here. diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index b79af7a..18ced04 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -2062,6 +2062,16 @@ reparameterize_path(PlannerInfo *root, Path *path, case T_SubqueryScan: return create_subqueryscan_path(root, rel, path->pathkeys, required_outer); + case T_ForeignScan: + { + ForeignPath *fpath = (ForeignPath *) path; + ForeignPath *newpath = makeNode(ForeignPath); + memcpy(newpath, fpath, sizeof(ForeignPath)); + newpath->path.param_info = + get_baserel_parampathinfo(root, rel, required_outer); + /* cost recalc omitted */ + return (Path *) newpath; + } default: break; } |=# explain select * from p join (select uname from c1 limit 1) s on s.uname = p.uname; | QUERY PLAN | | Nested Loop (cost=0.00..10.46 rows=50 width=58) | -> Limit (cost=0.00..0.02 rows=1 width=8) | -> Seq Scan on c1 c1_1 (cost=0.00..204.01 rows=10001 width=8) | -> Append (cost=0.00..10.30 rows=12 width=158) | -> Seq Scan on p (cost=0.00..0.00 rows=1 width=168) | Filter: (c1_1.uname = uname) | -> Index Scan using i_c1 on c1 (cost=0.29..8.30 rows=1 width=50) | Index Cond: (uname = c1_1.uname) | -> Foreign Scan on c2 (cost=0.00..2.00 rows=10 width=168) | Filter: (c1_1.uname = uname) | Foreign File: /etc/passwd | Foreign File Size: 1851 | Planning time: 2.044 ms regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Tue, Feb 25, 2014 at 11:13 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: Thanks for the information, I will apply the other patches also and start testing. When I try to run the pgbench test, by default the cache-scan plan is not chosen because of its higher cost. So I increased cpu_index_tuple_cost to a maximum value or turned off index scans, so that the plan can choose the cache_scan as the least cost. The configuration parameters changed during the test are, shared_buffers - 2GB, cache_scan.num_blocks - 1024 wal_buffers - 16MB, checkpoint_segments - 255 checkpoint_timeout - 15 min, cpu_index_tuple_cost - 10 or enable_indexscan=off Test procedure: 1. Initialize the database with pgbench with 75 scale factor. 2. Create the triggers on pgbench_accounts 3. Use a select query to load all the data into cache. 4. Run a simple update pgbench test. Plan details of pgbench simple update queries: postgres=# explain update pgbench_accounts set abalance = abalance where aid = 10; QUERY PLAN --- Update on pgbench_accounts (cost=0.43..18.44 rows=1 width=103) -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.43..18.44 rows=1 width=103) Index Cond: (aid = 10) Planning time: 0.045 ms (4 rows) postgres=# explain select abalance from pgbench_accounts where aid = 10; QUERY PLAN Custom Scan (cache scan) on pgbench_accounts (cost=0.00..99899.99 rows=1 width=4) Filter: (aid = 10) Planning time: 0.042 ms (3 rows) I am observing too much delay in the performance results. The performance test script is attached in the mail. Please let me know if you find any problem in the test. Regards, Hari Babu Fujitsu Australia run_reading.sh Description: Bourne shell script -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Get more from indices.
Hello, At Tue, 25 Feb 2014 16:35:55 -0500, Robert Haas wrote On Tue, Feb 25, 2014 at 3:36 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: May I mark this patch as Ready for Committer by myself since this was already marked so in last CF3 and have had no objection or new suggestion in this CF4 for more than a month? Sounds fair. Thank you, I will send this patch to committers after waiting another few days for more suggestions. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/26/2014 06:21 AM, Merlin Moncure wrote: On Tue, Feb 25, 2014 at 4:03 PM, Josh Berkus j...@agliodbs.com wrote: On 02/25/2014 12:12 PM, Robert Haas wrote: I don't agree that jsonb should be preferred in all but a handful of situations. Nor do I agree that partisanship belongs in our documentation. Therefore, -1 for your proposal to recommend that, and +1 for Merlin's proposal to present a comparison which fairly illustrates the situations in which each will outperform the other. Awaiting doc patch from Merlin, then. It will need to be clear enough that an ordinary user can distinguish which type they want. Sure. Please also highlight that any change will require a full table rewrite with an exclusive lock, so data type choices on larger tables may be hard to change later. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
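[Editor's note: worth showing what that change looks like, since the statement is simple but the cost is not. A sketch with made-up table and column names, converting an existing json (or text) column to jsonb by going through the text representation.]

-- Takes an ACCESS EXCLUSIVE lock and rewrites the whole table.
ALTER TABLE events
  ALTER COLUMN payload TYPE jsonb
  USING payload::text::jsonb;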
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote: Instead of custom node, it might be better idea to improve FDW infrastructure to push join. For the starters, is it possible for the custom scan node hooks to create a ForeignScan node? In general, I think, it might be better for the custom scan hooks to create existing nodes if they serve the purpose. It does not work well because existing FDW infrastructure is designed to perform on foreign tables, not regular tables. Probably, it needs to revise much our assumption around the background code, if we re-define the purpose of FDW infrastructure. For example, ForeignScan is expected to return a tuple according to the TupleDesc that is exactly same with table definition. It does not fit the requirement if we replace a join-node by ForeignScan because its TupleDesc of joined relations is not predefined. I'm not following this logic at all- how are you defining foreign from regular? Certainly, in-memory-only tables which are sitting out in some non-persistent GPU memory aren't regular by any PG definition. Perhaps you can't make ForeignScan suddenly work as a join-node replacement, but I've not seen where anyone has proposed that (directly- I've implied it on occation where a remote view can be used, but that's not the same thing as having proper push-down support for joins). I'd like to define these features are designed for individual purpose. My previous complaint about this patch set has been precisely that each piece seems to be custom-built and every patch needs more and more backend changes. If every time someone wants to do something with this CustomScan API, they need changes made to the backend code, then it's not a generally useful external API. We really don't want to define such an external API as then we have to deal with backwards compatibility, particularly when it's all specialized to specific use cases which are all different. FDW is designed to intermediate an external data source and internal heap representation according to foreign table definition. In other words, its role is to generate contents of predefined database object on the fly. There's certainly nothing in the FDW API which requires that the remote side have an internal heap representation, as evidenced by the various FDWs which already exist and certainly are not any kind of 'normal' heap. Every query against the foriegn relation goes through the FDW API and can end up returning whatever the FDW author decides is appropriate to return at that time, as long as it matches the tuple description- which is absolutely necessary for any kind of sanity, imv. On the other hands, custom-scan is designed to implement alternative ways to scan / join relations in addition to the methods supported by built-in feature. I can see the usefulness in being able to push down aggregates or other function-type calls to the remote side of an FDW and would love to see work done along those lines, along with the ability to push down joins to remote systems- but I'm not convinced that the claimed flexibility with the CustomScan API is there, given the need to continue modifying the backend code for each use-case, nor that there are particularly new and inventive ways of saying find me all the cases where set X overlaps with set Y. I'm certainly open to the idea that we could have an FDW API which allows us to ask exactly that question and let the remote side cost it out and give us an answer for a pair of relations but that isn't what this is. 
Note also that in any kind of aggregation push-down we must be sure that the function is well-defined and that the FDW is on the hook to ensure that the returned data is the same as if we ran the same aggregate function locally, otherwise the results of a query might differ based on if the aggregate was fired locally or remotely (which could be influenced by costing- eg: the size of the relation or its statistics). I'm motivated to implement GPU acceleration feature that works transparently for application. Thus, it has to be capable on regular tables, because most of application stores data on regular tables, not foreign ones. You want to persist that data in the GPU across multiple calls though, which makes it unlike any kind of regular PG table and much more like some foreign table. Perhaps the data is initially loaded from a local table and then updated on the GPU card in some way when the 'real' table is updated, but neither of those makes it a regular PG table. Since a custom node is open implementation, it will be important to pass as much information down to the hooks as possible; lest the hooks will be constrained. Since the functions signatures within the planner, optimizer will change from time to time, so the custom node hook signatures will need to change from time to time. That might turn out to be maintenance overhead. It's more than from time-to-time, it was for each use case in the given
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 8:07 PM, Craig Ringer cr...@2ndquadrant.com wrote: Please also highlight that any change will require a full table rewrite with an exclusive lock, so data type choices on larger tables may be hard to change later. It sure looks like they're binary-coercible to me: + CREATE CAST (hstore AS jsonb) + WITHOUT FUNCTION AS IMPLICIT; + + CREATE CAST (jsonb AS hstore) + WITHOUT FUNCTION AS IMPLICIT; Is this okay? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
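[Editor's note: for anyone wanting to check what the patch registers, the cast entries can be inspected directly once the extension is installed; castmethod = 'b' means WITHOUT FUNCTION, i.e. binary coercion.]

SELECT castsource::regtype, casttarget::regtype, castcontext, castmethod
  FROM pg_cast
 WHERE castsource = 'hstore'::regtype OR casttarget = 'hstore'::regtype;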
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote: Yes, the part-1 patch provides a set of interface portion to interact between the backend code and extension code. Rest of part-2 and part-3 portions are contrib modules that implements its feature on top of custom-scan API. Just to come back to this- the other two contrib module patches, at least as I read over their initial submission, were *also* patching portions of backend code which it was apparently discovered that they needed. That's a good bit of my complaint regarding this approach. FDW's join pushing down is one of the valuable use-cases of this interface, but not all. As you might know, my motivation is to implement GPU acceleration feature on top of this interface, that offers alternative way to scan or join relations or potentially sort or aggregate. If you're looking to just use GPU acceleration for improving individual queries, I would think that Robert's work around backend workers would be a more appropriate way to go, with the ability to move a working set of data from shared buffers and on-disk representation of a relation over to the GPU's memory, perform the operation, and then copy the results back. If that's not possible or effective wrt performance, then I think we need to look at managing the external GPU memory as a foreign system through an FDW which happens to be updated through triggers or similar. The same could potentially be done for memcached systems, etc. regular PG tables, just to point out one issue, can be locked on a row-by-row basis, and we know exactly where in shared buffers to go hunt down the rows. How is that going to work here, if this is both a regular table and stored off in a GPU's memory across subsequent queries or even transactions? Right now, I put all the logic to interact CSI and FDW driver on postgres_fdw side, it might be an idea to have common code (like a logic to check whether the both relations to be joined belongs to same foreign server) on the backend side as something like a gateway of them. Yes, that's what I was suggesting above- we should be asking the FDWs on a case-by-case basis how to cost out the join between foreign tables which they are responsible for. Asking two different FDWs servers to cost out a join between their tables doesn't make any sense to me. As an aside, what should be the scope of FDW interface? In my understanding, it allows extension to implement something on behalf of a particular data structure being declared with CREATE FOREIGN TABLE. That's where it is today, but certainly not our end goal. In other words, extension's responsibility is to generate a view of something according to PostgreSQL' internal data structure, instead of the object itself. The result of the FDW call needs to be something which PG understands and can work with, otherwise we wouldn't be able to, say, run PL/pgsql code on the result, or pass it into some other aggregate which we decided was cheaper to run locally. Being able to push down aggregates to the remote side of an FDW certainly fits in quite well with that. On the other hands, custom-scan interface allows extensions to implement alternative methods to scan or join particular relations, but it is not a role to perform as a target being referenced in queries. In other words, it is methods to access objects. The custom-scan interface still needs to produce something according to PG's internal data structures, so it's not clear to me where you're going with this. 
It is natural both features are similar because both of them intends extensions to hook the planner and executor, however, its purpose is different. I disagree as I don't really view FDWs as hooks. A hook is more like a trigger- sure, you can modify the data in transit, or throw an error if you see an issue, but you don't get to redefine the world and throw out what the planner or optimizer knows about the rest of what is going on in the query. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] jsonb and nested hstore
* Peter Geoghegan (p...@heroku.com) wrote: On Tue, Feb 25, 2014 at 8:07 PM, Craig Ringer cr...@2ndquadrant.com wrote: Please also highlight that any change will require a full table rewrite with an exclusive lock, so data type choices on larger tables may be hard to change later. It sure looks like they're binary-coercible to me: + CREATE CAST (hstore AS jsonb) + WITHOUT FUNCTION AS IMPLICIT; + + CREATE CAST (jsonb AS hstore) + WITHOUT FUNCTION AS IMPLICIT; Is this okay? Err, I'm not following this thread all *that* closely, but I was pretty sure the issue was json vs. jsonb, and I'd be mighty confused as to wtf was going on if those were binary-coercible... Thanks, Stephen signature.asc Description: Digital signature
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote: Instead of custom node, it might be better idea to improve FDW infrastructure to push join. For the starters, is it possible for the custom scan node hooks to create a ForeignScan node? In general, I think, it might be better for the custom scan hooks to create existing nodes if they serve the purpose. It does not work well because existing FDW infrastructure is designed to perform on foreign tables, not regular tables. Probably, it needs to revise much our assumption around the background code, if we re-define the purpose of FDW infrastructure. For example, ForeignScan is expected to return a tuple according to the TupleDesc that is exactly same with table definition. It does not fit the requirement if we replace a join-node by ForeignScan because its TupleDesc of joined relations is not predefined. I'm not following this logic at all- how are you defining foreign from regular? Certainly, in-memory-only tables which are sitting out in some non-persistent GPU memory aren't regular by any PG definition. Perhaps you can't make ForeignScan suddenly work as a join-node replacement, but I've not seen where anyone has proposed that (directly- I've implied it on occation where a remote view can be used, but that's not the same thing as having proper push-down support for joins). This regular one means usual tables. Even though custom implementation may reference self-managed in-memory cache instead of raw heap, the table pointed in user's query shall be a usual table. In the past, Hanada-san had proposed an enhancement of FDW to support remote-join but eventually rejected. I'd like to define these features are designed for individual purpose. My previous complaint about this patch set has been precisely that each piece seems to be custom-built and every patch needs more and more backend changes. If every time someone wants to do something with this CustomScan API, they need changes made to the backend code, then it's not a generally useful external API. We really don't want to define such an external API as then we have to deal with backwards compatibility, particularly when it's all specialized to specific use cases which are all different. The changes to backend are just for convenient. We may be able to implement functions to translate Bitmapset from/to cstring form in postgres_fdw, does it make sense to maintain individually? I thought these functions were useful to have in the backend commonly, but is not a fundamental functionality lacks of the custom-scan interface. FDW is designed to intermediate an external data source and internal heap representation according to foreign table definition. In other words, its role is to generate contents of predefined database object on the fly. There's certainly nothing in the FDW API which requires that the remote side have an internal heap representation, as evidenced by the various FDWs which already exist and certainly are not any kind of 'normal' heap. Every query against the foriegn relation goes through the FDW API and can end up returning whatever the FDW author decides is appropriate to return at that time, as long as it matches the tuple description- which is absolutely necessary for any kind of sanity, imv. Yes. It's my understanding for the role of FDW driver. On the other hands, custom-scan is designed to implement alternative ways to scan / join relations in addition to the methods supported by built-in feature. 
I can see the usefulness in being able to push down aggregates or other function-type calls to the remote side of an FDW and would love to see work done along those lines, along with the ability to push down joins to remote systems- but I'm not convinced that the claimed flexibility with the CustomScan API is there, given the need to continue modifying the backend code for each use-case, nor that there are particularly new and inventive ways of saying find me all the cases where set X overlaps with set Y. I'm certainly open to the idea that we could have an FDW API which allows us to ask exactly that question and let the remote side cost it out and give us an answer for a pair of relations but that isn't what this is. Note also that in any kind of aggregation push-down we must be sure that the function is well-defined and that the FDW is on the hook to ensure that the returned data is the same as if we ran the same aggregate function locally, otherwise the results of a query might differ based on if the aggregate was fired locally or remotely (which could be influenced by costing- eg: the size of the relation or its statistics). I can also understand the usefulness of join or aggregation into the remote side in case of foreign table reference. In similar way, it is also useful if we can push these CPU intensive operations into co-processors on regular table references. As I mentioned above,
Re: [HACKERS] jsonb and nested hstore
On 02/25/2014 08:54 PM, Josh Berkus wrote: That's called a straw man argument, Robert. Me: We should recommend that people use jsonb unless they have a specific reason for using json. We could also make the opposite argument - people should use json unless they have a specific reason for using jsonb. btw, there is one more thing about JSON which I recently learned - a lot of JavaScript people actually expect the JSON binary form to retain field order. It is not in any specs, but nevertheless all major implementations do it and some code depends on it. IIRC, this behaviour is currently also met only by json and not by jsonb. Merlin: We should present them side-by-side with a complex comparison. Robert: Josh wants to junk all relational data and use only jsonb! I mean, really, WTF? Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
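[Editor's note: to make the field-order point concrete, this is the kind of difference being described, as the types currently stand; jsonb also keeps only the last value for any duplicate key.]

SELECT '{"b": 1, "a": 2}'::json;   -- {"b": 1, "a": 2}   input text kept verbatim
SELECT '{"b": 1, "a": 2}'::jsonb;  -- {"a": 2, "b": 1}   keys stored in jsonb's own order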
Re: [HACKERS] jsonb and nested hstore
On Feb 25, 2014, at 1:57 PM, Hannu Krosing ha...@2ndquadrant.com wrote: It is not in any specs, but nevertheless all major imlementations do it and some code depends on it. I have no doubt that some code depends on it, but all major implementations is too strong a statement. BSON, in particular, does not have stable field order. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers