Re: [HACKERS] Bit data type header reduction in some cases

2014-02-25 Thread Heikki Linnakangas

On 02/25/2014 08:23 AM, Haribabu Kommi wrote:

It's regarding a Todo item, Bit data type header reduction in some
cases. The header contains two parts: 1) the varlena header, which is
automatically converted from a 4-byte to a 1-byte header in case of small
data, and 2) the bit length header, called bit_len, which stores the actual
bit length and is 4 bytes in size. I want to reduce this bit_len size to 1
byte in some cases, similar to the varlena header. With this change the size
of the column is reduced by 3 bytes, which gives a very good decrease in disk
usage.

I am planning to change it based on the total length of the bit data. If
the number of bits is less than 0x7F then a one-byte header can be chosen
instead of the 4-byte header. When reading the data, the length can be
determined in a similar way.


Since we're designing a new format, how about using bit padding instead 
of an explicit length field? Add one 1-bit to the end of the bit data, 
followed by zeros. That's even more compact. See 
https://en.wikipedia.org/wiki/Padding_%28cryptography%29#Bit_padding
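
To make the padding idea concrete, here is a minimal standalone sketch (not
tied to the varbit code; the helper names and the MSB-first layout are my own
assumptions for illustration) of encoding a bit string with a terminating
1-bit plus zero padding, and of recovering the bit length from it:

#include <stdint.h>
#include <string.h>

/*
 * Append a single 1-bit after "nbits" data bits, then pad with zeros up to
 * a byte boundary.  The stored byte count is enough to recover nbits.
 */
static size_t
bitpad_encode(const uint8_t *bits, size_t nbits, uint8_t *out)
{
	size_t	nbytes = (nbits + 1 + 7) / 8;	/* data bits + terminator bit */

	memset(out, 0, nbytes);
	memcpy(out, bits, (nbits + 7) / 8);

	/* clear garbage beyond the last data bit, then set the terminator */
	if (nbits % 8 != 0)
		out[nbits / 8] &= (uint8_t) (0xFF << (8 - nbits % 8));
	out[nbits / 8] |= (uint8_t) (0x80 >> (nbits % 8));

	return nbytes;
}

/* The number of data bits is everything before the last 1-bit. */
static size_t
bitpad_decode_len(const uint8_t *data, size_t nbytes)
{
	for (size_t i = nbytes; i-- > 0;)
	{
		if (data[i] != 0)
		{
			int		bit = 0;

			while ((data[i] & (1u << bit)) == 0)
				bit++;			/* find the lowest set bit: the terminator */
			return i * 8 + (7 - bit);
		}
	}
	return 0;					/* empty or corrupt value */
}

No explicit length field is stored at all; the price is a little bit-twiddling
on every read and one extra bit of storage per value.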



The problem I am thinking of is that it can cause problems for old databases
having bit columns when they
upgrade to a newer version without using pg_dumpall. Is there any way to
handle this problem? Or is there a better way to handle the problem
itself? Please let me know your suggestions.


On a little-endian system, you could easily use the most-significant bit 
(sign bit) of the bit_len field to indicate that it's a new-format datum:


/*
 * Little-endian
 */
typedef struct
{
int32   vl_len_;
int32   bit_len;
bits8   bit_dat[1];
} VarBit_Old;

typedef struct
{
int32   vl_len_;
uint8   bit_len;
bits8   bit_dat[1]; /* var len */
} VarBit_New;

#define IS_NEW_FORMAT(x) (((VarBit_New *) (x))->bit_len & 0x80)
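
For what it's worth, a reader could then branch on that flag roughly as
follows (a sketch building on the structs above; it assumes the low 7 bits of
the new one-byte header carry the length, and, as discussed below, it only
works this way on little-endian):

/* Sketch: return the bit length of a datum in either format. */
static int
varbit_len_any_format(void *datum)
{
	VarBit_New *vb_new = (VarBit_New *) datum;

	if (vb_new->bit_len & 0x80)					/* new format: flag bit set */
		return vb_new->bit_len & 0x7F;			/* low 7 bits hold the length */
	else
		return ((VarBit_Old *) datum)->bit_len;	/* old 4-byte length */
}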

On a big-endian system that's more difficult, because the MSB would not 
be the first byte of the field. You could still make it work if the new 
format had the length byte in the fourth byte of the Datum. Something 
like this:


/*
 * Big-endian
 */
typedef struct
{
int32   vl_len_;
int32   bit_len;
bits8   bit_dat[1];
} VarBit_Old;

typedef struct
{
int32   vl_len_;
bits8   bit_dat_first_three[3];
uint8   bit_len;
bits8   bit_dat_rest[1]; /* var len */
} VarBit_New;

#define IS_NEW_FORMAT(x) (((VarBit_New *) (x))->bit_len & 0x80)

That's not a complete solution yet, you also need a special case for a 
new-format value with less than 4 bytes of bits. Such a field would 
presumably not have the bit_len field at the fourth byte, because it 
would be shorter than that, but you could distinguish it by looking at 
the length in the varsize header. So in big-endian, you'd have one more 
format for very small varbits:


/* Big-endian, less than 4 bytes of bits data */
typedef struct
{
int32   vl_len_;
uint8   bit_len;
bits8   bit_dat[1]; /* var len */
} VarBit_New_Tiny;

The same basic idea also works with bit padding.

- Heikki




Re: [HACKERS] Get more from indices.

2014-02-25 Thread Kyotaro HORIGUCHI
Hello.

May I mark this patch as Ready for Committer myself, since it
was already marked so in the last CF3 and has had no objection
or new suggestion in this CF4 for more than a month?

At Tue, 14 Jan 2014 18:08:10 +0900, Kyotaro HORIGUCHI wrote
 Hello, since CF4 is already closed but this patch remains marked
 (in CF3) as 'Ready for Committer', please let me re-post the latest
 version for CF4 to keep it from vanishing :-p
 

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Dump pageinspect to 1.2: page_header with pg_lsn datatype

2014-02-25 Thread Heikki Linnakangas

On 02/25/2014 06:52 AM, Michael Paquier wrote:

On Tue, Feb 25, 2014 at 10:02 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

(Not that it's this patch's responsibility to do that.)

Definitely a different patch.

Don't you think that this would break existing applications?


I hope there are no applications relying on pageinspect. It's a very 
low-level tool. Or if someone has written a nice GUI around it, I'd like 
to hear about it so that I can start using it :-).



I see more flexibility in keeping them as they are now, as integers;
users can always adjust their queries to do this post-processing with
to_hex, as they've (always?) been doing.


Agreed. With an int4, you can more easily check for a particular bit etc.

- Heikki




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-25 Thread Ashutosh Bapat
On Sun, Feb 23, 2014 at 6:54 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 Folks,

 Let me remind you of the custom-scan patches; this is a base feature for
 remote join in postgres_fdw, cache-only scan, the (upcoming) GPU
 acceleration feature, and various alternative ways to scan/join relations.
 Unfortunately, we could have only a small amount of discussion in this commit
 fest, even though Hanada-san volunteered to move the patches into
 'ready for committer' state at the November CF.


Sorry for jumping into this late.
Instead of a custom node, it might be a better idea to improve the FDW
infrastructure to push down joins. For starters, is it possible for the
custom scan node hooks to create a ForeignScan node? In general, I think
it might be better for the custom scan hooks to create existing nodes if
they serve the purpose.



 Before time runs out, I'd like to ask for hackers' opinions on its potentially
 arguable points (from my standpoint), in case they need to be fixed up.
 One is the hook definition for adding alternative join paths, and the other
 is a special varno used when a custom scan replaces a join node.
 I'd like to hear your opinions on them while we can still change
 the design if needed.

 (1) Interface to add alternative paths in addition to built-in join paths

 This patch adds add_join_path_hook in add_paths_to_joinrel to allow
 extensions to provide alternative scan paths in addition to the built-in
 join paths. A custom-scan path added here is assumed to scan a
 (virtual) relation that is the result set of the joined relations.
 My concern is which arguments should be passed. This hook is declared as follows:

 /* Hook for plugins to add custom join path, in addition to default ones */
 typedef void (*add_join_path_hook_type)(PlannerInfo *root,
 RelOptInfo *joinrel,
 RelOptInfo *outerrel,
 RelOptInfo *innerrel,
 JoinType jointype,
 SpecialJoinInfo *sjinfo,
 List *restrictlist,
 List *mergeclause_list,
 SemiAntiJoinFactors *semifactors,
 Relids param_source_rels,
 Relids extra_lateral_rels);
 extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

 Likely, the arguments up through restrictlist should be passed to
 extensions,
 because these are also arguments of add_paths_to_joinrel().
 However, I'm not 100% certain whether the other arguments should be passed too.
 It probably makes sense to pass param_source_rels and extra_lateral_rels
 so the extension can check whether the path is sensible for parameterized paths.
 On the other hand, I doubt whether mergeclause_list is useful to deliver.
 (It may make sense if someone tries to implement their own merge-join
 implementation??)

 I'd like to see ideas for improving the current interface specification.
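
As an illustration of how an extension would consume this interface, a
minimal provider might be wired up as below. This is only a sketch against
the signature quoted above: the header that declares the hook comes from the
proposed patch, and the actual CustomPath construction is elided because it
is entirely provider-specific.

#include "postgres.h"
#include "fmgr.h"
/* add_join_path_hook and its typedef come from the patched planner headers */

PG_MODULE_MAGIC;

static add_join_path_hook_type prev_add_join_path_hook = NULL;

static void
my_add_join_path(PlannerInfo *root,
				 RelOptInfo *joinrel,
				 RelOptInfo *outerrel,
				 RelOptInfo *innerrel,
				 JoinType jointype,
				 SpecialJoinInfo *sjinfo,
				 List *restrictlist,
				 List *mergeclause_list,
				 SemiAntiJoinFactors *semifactors,
				 Relids param_source_rels,
				 Relids extra_lateral_rels)
{
	/* keep any previously installed hook working */
	if (prev_add_join_path_hook)
		prev_add_join_path_hook(root, joinrel, outerrel, innerrel,
								jointype, sjinfo, restrictlist,
								mergeclause_list, semifactors,
								param_source_rels, extra_lateral_rels);

	/* only handle join types this provider knows how to execute */
	if (jointype != JOIN_INNER)
		return;

	/*
	 * Here the provider would build and cost a CustomPath scanning the
	 * (outerrel, innerrel) join result and add_path() it to joinrel.
	 * Omitted: that part depends entirely on the provider.
	 */
}

void
_PG_init(void)
{
	prev_add_join_path_hook = add_join_path_hook;
	add_join_path_hook = my_add_join_path;
}

Seen from this side, the question boils down to which of these parameters
such a provider can realistically do without.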


Since a custom node is an open-ended implementation, it will be important to
pass as much information down to the hooks as possible, lest the hooks be
constrained.  And since the function signatures within the planner and optimizer
change from time to time, the custom node hook signatures will need to change
from time to time as well. That might turn out to be a maintenance overhead.

BTW, is it a good idea for custom nodes to also affect other paths like
append, group etc.? Will that need separate hooks for each of those?



 (2) CUSTOM_VAR for special Var reference

 @@ -134,6 +134,7 @@ typedef struct Expr
  #defineINNER_VAR   65000   /* reference to inner subplan */
  #defineOUTER_VAR   65001   /* reference to outer subplan */
  #defineINDEX_VAR   65002   /* reference to index column */
 +#defineCUSTOM_VAR  65003   /* reference to custom column */

 I newly added CUSTOM_VAR to handle the case where a custom scan overrides
 joined relations.
 Var nodes within a join plan are adjusted to refer to either ecxt_innertuple or
 ecxt_outertuple of the ExprContext. That causes trouble if a custom scan runs
 instead of a built-in join, because the tuples it fetches are usually
 stored in ecxt_scantuple, so the Var nodes also need a varno that is
 neither inner nor outer.

 The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows
 extensions to rewrite Var nodes within the custom-scan node to point at
 ecxt_scantuple using CUSTOM_VAR, instead of inner or outer.
 For example, a Var node with varno=CUSTOM_VAR and varattno=3 means
 this node references the third attribute of the tuple in ecxt_scantuple.
 I think it is a reasonable solution; however, I'm not 100% certain
 whether people have a more graceful idea for implementing it.
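
To make the rewrite concrete: inside such a callback, an extension could
walk its expressions with expression_tree_mutator() and retarget the Var
nodes, roughly as in the sketch below (CUSTOM_VAR comes from the patch; a
real provider would also remap varattno to the column's position in its
result tuple, which is omitted here):

#include "postgres.h"
#include "nodes/nodeFuncs.h"
#include "nodes/primnodes.h"

static Node *
replace_varno_mutator(Node *node, void *context)
{
	if (node == NULL)
		return NULL;
	if (IsA(node, Var))
	{
		Var		   *var = (Var *) copyObject(node);

		/* point the Var at ecxt_scantuple instead of the inner/outer tuple */
		var->varno = CUSTOM_VAR;
		return (Node *) var;
	}
	return expression_tree_mutator(node, replace_varno_mutator, context);
}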

 If you have comments around above two topic, or others, please give
 your ideas.

 Thanks,

 2014-01-28 9:14 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
  

Re: [HACKERS] Trigger information for auto_explain.

2014-02-25 Thread Kyotaro HORIGUCHI
Hi, I saw this patch has been moved into the committed patches, but
only the first part (0001_..) for the core is committed as of
32001ab; the rest, for the extension side, seems not to have been
committed.

Would you mind taking a look at that, Álvaro?

regards,

At Wed, 22 Jan 2014 17:28:27 +0900, Kyotaro HORIGUCHI wrote
me Hello, I came back with doc patch and revised 0002 patch.
me 
meI think documentation is the only thing that stops this patch to be
mecommitable... can you add it?
me   
me   Agreed.  I have pushed patch 0001 for now.
me  
me  Thank you, I'll put it sooner.
me 
me I found the default setting for log_triggers was true in the last
me patch while writing the doc, but it had better be false regarding
me backward compatibility. The attached 0002 patch has been changed
me there.
me 
me - 0002_auto_explain_triggers_v2_20140122.patch
me 
me   default value for log_triggers from 'true' to 'false'. As added
me   documents says.
me 
me - 0003_auto_explain_triggers_doc_v1_20140122.patch
me 
me   documentation.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Dump pageinspect to 1.2: page_header with pg_lsn datatype

2014-02-25 Thread Greg Stark
On 25 Feb 2014 08:54, Heikki Linnakangas hlinnakan...@vmware.com wrote:


 I hope there are no applications relying on pageinspect. It's a very
low-level tool. Or if someone has written a nice GUI around it, I'd like to
hear about it so that I can start using it :-).


 I see more flexibility to keep them as they are now, as integers,
 users can always tune their queries to do this post-processing with
 to_hex for them as they've (always?) been doing.


 Agreed. With an int4, you can more easily check for a particular bit etc.

What about making it a bit() column?

What I would love to see is a function added to page inspect that takes
these two fields and returns a list of symbolic names or perhaps a tuple of
booleans.
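
Something along those lines could be quite small; a rough sketch of such a
function for the page-level flags (not part of any posted patch, and the
function name is made up) might be:

#include "postgres.h"

#include "fmgr.h"
#include "catalog/pg_type.h"
#include "storage/bufpage.h"
#include "utils/array.h"
#include "utils/builtins.h"

PG_FUNCTION_INFO_V1(page_header_flag_names);

/* Return the pd_flags bits of a page header as an array of symbolic names. */
Datum
page_header_flag_names(PG_FUNCTION_ARGS)
{
	uint16		flags = (uint16) PG_GETARG_INT32(0);
	Datum		names[3];
	int			n = 0;

	if (flags & PD_HAS_FREE_LINES)
		names[n++] = CStringGetTextDatum("has_free_lines");
	if (flags & PD_PAGE_FULL)
		names[n++] = CStringGetTextDatum("page_full");
	if (flags & PD_ALL_VISIBLE)
		names[n++] = CStringGetTextDatum("all_visible");

	PG_RETURN_ARRAYTYPE_P(construct_array(names, n, TEXTOID,
										  -1, false, 'i'));
}

declared at the SQL level as taking int4 and returning text[].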


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-25 Thread Kouhei Kaigai
 Sorry for jumping into this late.
 
 Instead of custom node, it might be better idea to improve FDW infrastructure
 to push join. For the starters, is it possible for the custom scan node
 hooks to create a ForeignScan node? In general, I think, it might be better
 for the custom scan hooks to create existing nodes if they serve the purpose.

That does not work well, because the existing FDW infrastructure is designed to
operate on foreign tables, not regular tables. We would probably need to revise
many of our assumptions in the surrounding code if we redefined the purpose
of the FDW infrastructure. For example, a ForeignScan is expected to return a
tuple according to a TupleDesc that is exactly the same as the table definition.
That does not fit the requirement if we replace a join node with a ForeignScan,
because the TupleDesc of the joined relations is not predefined.

I'd like to define these features as designed for different purposes.
FDW is designed to mediate between an external data source and the internal heap
representation according to a foreign table definition. In other words, its
role is to generate the contents of a predefined database object on the fly.
On the other hand, custom scan is designed to implement alternative ways
to scan / join relations in addition to the methods supported by the built-in
features.

I'm motivated to implement a GPU acceleration feature that works transparently
for applications. Thus, it has to work on regular tables, because most
applications store data in regular tables, not foreign ones.

 Since a custom node is open implementation, it will be important to pass
 as much information down to the hooks as possible; lest the hooks will be
 constrained.  Since the functions signatures within the planner, optimizer
 will change from time to time, so the custom node hook signatures will need
 to change from time to time. That might turn out to be maintenance overhead.
 
Yes, you are right about that too. But it also creates maintenance overhead if
a hook has many arguments nobody uses.
It probably makes sense to list the arguments that cannot be reproduced
from other information, those that can be reproduced but only through
complicated steps, and those that can be reproduced easily.

Below is the information we cannot reproduce:
 - PlannerInfo *root
 - RelOptInfo *joinrel
 - RelOptInfo *outerrel
 - RelOptInfo *innerrel
 - JoinType jointype
 - SpecialJoinInfo *sjinfo
 - List *restrictlist

Below is the information we can reproduce, but only through complicated steps:
 - List *mergeclause_list
 - bool mergejoin_allow
 - Relids param_source_rels
 - Relids extra_lateral_rels

Below is the information we can reproduce easily:
 - SemiAntiJoinFactors *semifactors

I think the first two categories, or just the first category (if functions to
reproduce the second group are exposed), should be passed to extensions;
the priority of the third group is not high.


 BTW, is it a good idea for custom nodes to also affect other paths like
 append, group etc.? Will it need separate hooks for each of those?

Yes, I plan to support the above plan nodes, in addition to scan / join only.
The custom-scan node is a thin abstraction over general executor behavior,
so I believe it is not hard to enhance this node without adding a new plan node
for each of them.
Of course, it will need a separate hook to add alternative paths at the planner
stage, but no individual plan nodes. (Sorry, it was unclear to me what
'the hook' meant.)

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: Ashutosh Bapat [mailto:ashutosh.ba...@enterprisedb.com]
 Sent: Tuesday, February 25, 2014 5:59 PM
 To: Kohei KaiGai
 Cc: Kaigai, Kouhei(海外, 浩平); Stephen Frost; Shigeru Hanada; Jim
 Mlodgenski; Robert Haas; Tom Lane; PgHacker; Peter Eisentraut
 Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
 
 
 
 
 On Sun, Feb 23, 2014 at 6:54 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 
 
   Folks,
 
   Let me remind the custom-scan patches; that is a basis feature of
   remote join of postgres_fdw, cache-only scan, (upcoming) GPU
   acceleration feature or various alternative ways to scan/join
 relations.
   Unfortunately, small amount of discussion we could have in this
 commit
   fest, even though Hanada-san volunteered to move the patches into
   ready for committer state at the CF-Nov.
 
 
 
 Sorry for jumping into this late.
 
 Instead of custom node, it might be better idea to improve FDW infrastructure
 to push join. For the starters, is it possible for the custom scan node
 hooks to create a ForeignScan node? In general, I think, it might be better
 for the custom scan hooks to create existing nodes if they serve the purpose.
 
 
 
 
   Prior to time-up, I'd like to ask hacker's opinion about its
 potential
   arguable points (from my standpoint) if it needs to be fixed up.
   One is hook definition to add alternative join path, and the other
 one
   is a special varno when a custom scan replace a 

[HACKERS] [ISSUE] pg_dump: schema with OID 0 does not exist

2014-02-25 Thread Prakash Itnal
Hi,

Recently we observed the error below while taking a dump after upgrading from
9.0.13 to 9.1.9.

pg_dump: schema with OID 0 does not exist

I referred to similar issues reported previously (
http://www.postgresql.org/message-id/CAGWYGjXRJj=zugejv0ckvn4zf9hb92q+7e3aqfcvbgbmb9z...@mail.gmail.com)
and got the issue resolved after identifying and inserting some of the missing
rows into the pg_opclass table. Below are the rows that were missing after the
upgrade (from the pg_opclass table):

   405 | aclitem_ops  | 11 | 10 | 2235 | 1033 | t |   0
   783 | box_ops      | 11 | 10 | 2593 |  603 | t |   0
   783 | point_ops    | 11 | 10 | 1029 |  600 | t | 603
   783 | poly_ops     | 11 | 10 | 2594 |  604 | t | 603
   783 | circle_ops   | 11 | 10 | 2595 |  718 | t | 603
  2742 | _int4_ops    | 11 | 10 | 2745 | 1007 | t |  23
  2742 | _text_ops    | 11 | 10 | 2745 | 1009 | t |  25
  2742 | _abstime_ops | 11 | 10 | 2745 | 1023 | t | 702


Can someone help me understand whether this is an issue? If not, how and
why did these rows disappear after the upgrade?

Since we have a fully automated environment, we do not encourage manual
intervention, so I am trying to understand how we can handle these issues.

-- 
Cheers,
Prakash


Re: [HACKERS] often PREPARE can generate high load (and sometimes minutes long unavailability)

2014-02-25 Thread Pavel Stehule
Hello


2014-02-24 21:31 GMT+01:00 Jeff Janes jeff.ja...@gmail.com:

 On Mon, Feb 24, 2014 at 7:02 AM, Pavel Stehule pavel.steh...@gmail.com wrote:




 2014-02-23 21:32 GMT+01:00 Andres Freund and...@2ndquadrant.com:

 Hi,

 On 2014-02-23 20:04:39 +0100, Pavel Stehule wrote:
  There is relative few very long ProcArrayLocks lwlocks
 
  This issue is very pathologic on fast computers with more than 8 CPU.
 This
  issue was detected after migration from 8.4 to 9.2. (but tested with
 same
  result on 9.0)  I see it on devel 9.4 today actualized.
 
  When I moved PREPARE from cycle, then described issues is gone. But
 when I
  use a EXECUTE IMMEDIATELY, then the issue is back. So it looks it is
  related to planner, ...

 In addition to the issue Jeff mentioned, I'd suggest trying the same
 workload with repeatable read. That can do *wonders* because of the
 reduced number of snapshots.


 I tested it, and it doesn't help.

 Is there some patch, that I can test related to this issue?


 This is the one that I was referring to:

 http://www.postgresql.org/message-id/11927.1384199...@sss.pgh.pa.us


I tested this patch with great success. The waits on ProcArrayLock are gone,
and throughput is as expected.

For the described use case it is seriously interesting.

Regards

Pavel


light weight locks
lockname   mode  countavg
DynamicLocks  Exclusive   8055   5003
DynamicLocks Shared   1666 50
LockMgrLocks  Exclusive129 36
 IndividualLock   Exclusive 34 48
 IndividualLock  Shared 21  7
 BufFreelistLock  Exclusive 12  8
WALWriteLock  Exclusive  1  38194
   ProcArrayLock Shared  1  8



 Cheers,

 Jeff





Re: [HACKERS] inherit support for foreign tables

2014-02-25 Thread Etsuro Fujita
In addition to an issue pointed out recently by Horiguchi-san, I've
found there is another issue we have to discuss.  That is, we can't
build any parameterized Append paths for an inheritance tree that
contains at least one foreign table, in set_append_rel_pathlist().  This
is because the patch doesn't take into consideration the
reparameterization of paths for a foreign table, i.e., modifying those
paths to have greater parameterization so that each child path in the
inheritance tree can have exactly the same parameterization.
To fix that, I think we would need to add code for the foreign-table case
to reparameterize_path().  And I think we should introduce a new FDW
routine, say ReparameterizeForeignPath(), because the processing would best
be performed by the FDW itself.
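
Just to sketch the shape such a routine might take (purely illustrative; the
signature below is my guess based on what reparameterize_path() would need to
hand over, not a settled API):

/*
 * Hypothetical FDW callback: given a ForeignPath and the rels it must be
 * parameterized by, return a new path with that parameterization, or NULL
 * if the FDW cannot produce one.
 */
typedef ForeignPath *(*ReparameterizeForeignPath_function) (PlannerInfo *root,
															ForeignPath *path,
															Relids required_outer);

reparameterize_path() would then call this for the ForeignPath case instead
of giving up.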

Comments are welcome!

Thanks,

Best regards,
Etsuro Fujita




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-25 Thread Ashutosh Bapat
On Tue, Feb 25, 2014 at 3:39 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

  Sorry for jumping into this late.
 
  Instead of custom node, it might be better idea to improve FDW
 infrastructure
  to push join. For the starters, is it possible for the custom scan node
  hooks to create a ForeignScan node? In general, I think, it might be
 better
  for the custom scan hooks to create existing nodes if they serve the
 purpose.
 
 It does not work well because existing FDW infrastructure is designed to
 perform on foreign tables, not regular tables. Probably, it needs to revise
 much our assumption around the background code, if we re-define the purpose
 of FDW infrastructure. For example, ForeignScan is expected to return a
 tuple
 according to the TupleDesc that is exactly same with table definition.
 It does not fit the requirement if we replace a join-node by ForeignScan
 because its TupleDesc of joined relations is not predefined.


If one wants to push joins, aggregates, or grouping across to other data
sources capable of handling them, that will need to change. But at the
same time, letting the custom scan node decide that doesn't seem
to be a very good idea, although, through custom scan nodes, we can see the
potential in adding these features.


 I'd like to define these features are designed for individual purpose.
 FDW is designed to intermediate an external data source and internal heap
 representation according to foreign table definition. In other words, its
 role is to generate contents of predefined database object on the fly.
 On the other hands, custom-scan is designed to implement alternative ways
 to scan / join relations in addition to the methods supported by built-in
 feature.

 I'm motivated to implement GPU acceleration feature that works
 transparently
 for application. Thus, it has to be capable on regular tables, because most
 of application stores data on regular tables, not foreign ones.


It looks like my description was misleading. In some cases the ultimate
functionality of a particular instantiation of a custom scan node may already
be available as a Plan node in PG, with the PG optimizer simply not able to
optimize the operation that way. In such a case, the custom scan node
infrastructure should produce the corresponding Path node rather than
implement that functionality itself.


  Since a custom node is open implementation, it will be important to pass
  as much information down to the hooks as possible; lest the hooks will be
  constrained.  Since the functions signatures within the planner,
 optimizer
  will change from time to time, so the custom node hook signatures will
 need
  to change from time to time. That might turn out to be maintenance
 overhead.
 
 Yes. You are also right. But it also makes maintenance overhead if hook has
 many arguments nobody uses.
 Probably, it makes sense to list up the arguments that cannot be reproduced
 from other information, can be reproduced but complicated steps, and can be
 reproduced easily.

 Below is the information we cannot reproduce:
  - PlannerInfo *root
  - RelOptInfo *joinrel
  - RelOptInfo *outerrel
  - RelOptInfo *innerrel
  - JoinType jointype
  - SpecialJoinInfo *sjinfo
  - List *restrictlist


Most of this information is available through the corresponding RelOptInfo, or
we should make RelOptInfo contain all the information related to every
relation that needs to be computed during the query. Then any function which
creates paths can just take that RelOptInfo as an argument and produce the
path(s). That way there is less chance that the function signatures change.


 Below is the information we can reproduce but complicated steps:
  - List *mergeclause_list
  - bool mergejoin_allow
  - Relids param_source_rels
  - Relids extra_lateral_rels

 Below is the information we can reproduce easily:
  - SemiAntiJoinFactors *semifactors

 I think, the first two categories or the first category (if functions to
 reproduce the second group is exposed) should be informed to extension,
 however, priority of the third group is not high.


  BTW, is it a good idea for custom nodes to also affect other paths like
  append, group etc.? Will it need separate hooks for each of those?
 
 Yes. I plan to support above plan node, in addition to scan / join only.
 The custom-scan node is thin abstraction towards general executor behavior,
 so I believe it is not hard to enhance this node, without new plan node
 for each of them.
 Of course, it will need separate hook to add alternative path on the
 planner
 stage, but no individual plan nodes. (Sorry, it was unclear for me what
 does the hook mean.)


If we represent all the operations like grouping, sorting, and aggregation as
some sort of relation, we can create paths for each such relation just as we
do now (I am heavily borrowing from Tom's idea of pathifying those operations).
We will need far fewer hooks in the custom scan node.

BTW, from the patch, I do not see this change to be light 

Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-02-25 Thread Florian Pflug
On Feb24, 2014, at 17:50 , Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 20 February 2014 01:48, Florian Pflug f...@phlo.org wrote:
 On Jan29, 2014, at 13:45 , Florian Pflug f...@phlo.org wrote:
 In fact, I'm
 currently leaning towards just forbidding non-strict forward transition
 function with strict inverses, and adding non-NULL counters to the
 aggregates that then require them. It's really only the SUM() aggregates
 that are affected by this, I think.
 
 I finally got around to doing that, and the results aren't too bad. The
 attached patches required that the strictness settings of the forward and
 reverse transition functions agree, and employ exactly the same NULL-skipping
 logic we always had.
 
 The only aggregates seriously affected by that change were SUM(int2) and
 SUM(int4).
 
 I haven't looked at this in any detail yet, but that seems much neater
 to me. It seems perfectly sensible that the forward and inverse
 transition functions should have the same strictness settings, and
 enforcing that keeps the logic simple, as well as hopefully making it
 easier to document.
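
For concreteness, a matched strict forward/inverse pair of the kind being
discussed might look like the toy functions below. This is not the actual
SUM(int2/int4) code, just an illustration: because both functions are marked
STRICT, NULL inputs are skipped identically in both directions, which is
exactly the invariant that requiring equal strictness settings preserves.

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(toy_sum4_trans);
PG_FUNCTION_INFO_V1(toy_sum4_inv);

/* forward transition: state + value (STRICT, so NULLs never reach it) */
Datum
toy_sum4_trans(PG_FUNCTION_ARGS)
{
	int64		state = PG_GETARG_INT64(0);
	int32		value = PG_GETARG_INT32(1);

	PG_RETURN_INT64(state + value);
}

/* inverse transition: state - value (also STRICT, skipping the same rows) */
Datum
toy_sum4_inv(PG_FUNCTION_ARGS)
{
	int64		state = PG_GETARG_INT64(0);
	int32		value = PG_GETARG_INT32(1);

	PG_RETURN_INT64(state - value);
}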

Good to hear that you agree! I'll try to find some time to update the docs. 

best regards,
Florian Pflug





Re: [HACKERS] typemode for variable types

2014-02-25 Thread Greg Stark
Also keep in mind that the system doesn't always retain the typmod, so it
should be possible to interpret the datum without the typmod.  Incidental
effects such as length limits or the precision displayed are OK, but the
meaning shouldn't be changed.

-- 
greg
On 24 Feb 2014 20:34, Tom Lane t...@sss.pgh.pa.us wrote:

 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Bruce Momjian wrote:
  Well, typmods are type-specific, so there is no official way to
  calculate it.  I would look at how an existing type uses typmod and copy
  that.

  Our system is pretty neat.  See a complex example here:
 
 https://github.com/postgis/postgis/blob/svn-trunk/postgis/gserialized_typmod.c

 One other point is that if you do consult the varchar functions as
 an example, be aware that there's an offset of 4 in their definition
 of the typmod (eg, for varchar(3) the stored typmod is 7).  This is
 entirely for legacy reasons so there's no good reason to duplicate it
 in a new custom-made type.  Except for the rule that negative values
 mean unspecified typmod (which you have to support), you can
 define the contents of the typmod value however you want.

 regards, tom lane





Re: [HACKERS] Same double precision operations, different results

2014-02-25 Thread Daniel Vázquez
Thx Tom!!

Yep, I focused on the fact that all trigonometric functions take arguments and
return values of type double precision. While looking for the error I lost
track of the numeric values written directly in the select and of the
differing inner calculations.

I think the best approach will be to keep double precision for the
trigonometric calculations, for speed, and cast to numeric before the acos
operation.

Thank you man!




2014-02-13 18:26 GMT+01:00 Tom Lane t...@sss.pgh.pa.us:

 =?ISO-8859-1?Q?Daniel_V=E1zquez?= daniel2d2...@gmail.com writes:
  Please can someone tell me why this behavior? Thx!!

 You're confusing numeric and double precision; in particular this
 calculation is *not* being done in float8, but numeric:

  # select (0.766238989559398 * 0.766238989559398 * 1 + 0.642555686986733 *
  0.642555686986733) calc;

  (*) Why this calculation produces 1 and not
 0.999633651488135693

 Actually, it's not producing 1, but a smidgen more:

 regression=# set extra_float_digits TO 3;
 SET
 regression=# select ( cast (
cos(radians(39.9826557))
* cos(radians(39.9826557))
* cos(radians(-0.0477312004383) - radians(-0.0477312004383))
+ sin(radians(39.9826557))
* sin(radians(39.9826557)) as double precision )
  );
float8
 -
  1.00022
 (1 row)

 You've got roundoff error either way, but this way happens to be in the
 direction that makes acos() complain.
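
The same effect is easy to reproduce outside the database; here is a small
standalone C sketch using the value from the query above (whether the sum
lands just above or just below 1.0 depends on the platform's libm, but when
it exceeds 1.0, acos() reports a domain error, which is what the SQL-level
acos() turns into an error):

#include <math.h>
#include <stdio.h>

int
main(void)
{
	double		lat = 39.9826557 * M_PI / 180.0;	/* radians(39.9826557) */
	double		dlon = 0.0;							/* same longitude twice */
	double		c;

	/* cos^2(lat) * cos(dlon) + sin^2(lat): mathematically exactly 1 here */
	c = cos(lat) * cos(lat) * cos(dlon) + sin(lat) * sin(lat);

	printf("c       = %.17g\n", c);			/* may print a hair above 1 */
	printf("acos(c) = %.17g\n", acos(c));	/* NaN if c > 1 */

	return 0;
}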

 regards, tom lane



Re: [HACKERS] [pgsql-advocacy] GSoC 2014 - mentors, students and admins

2014-02-25 Thread Andreas 'ads' Scherbaum


Hi,

On 01/28/2014 06:46 PM, Atri Sharma wrote:

On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown t...@linux.com wrote:

Hi all,

Application to Google Summer of Code 2014 can be made as of next
Monday (3rd Feb), and then there will be a 12 day window in which to
submit an application.

I'd like to gauge interest from both mentors and students as to
whether we'll want to do this.

And I'd be fine with being admin again this year, unless there's
anyone else who would like to take up the mantle?

Who would be up for mentoring this year?  And are there any project
ideas folk would like to suggest?

I would like to bring up the addition to MADLIB algorithms again this year.


I've spoken with the MADlib team at Pivotal and they are OK with supporting 
this proposal. Therefore I offer to mentor this.



Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project




Re: [HACKERS] [BUGS] BUG #9223: plperlu result memory leak

2014-02-25 Thread Sergey Burladyan
Hello, All!

eshkin...@gmail.com writes:

 create function perl_test(IN data text, OUT v1 text, OUT v2 integer, OUT v3
 integer, OUT v4 json, OUT v5 json)
   returns record as
 $BODY$

 use strict;
 use warnings;

 my $res->{'v1'} = 'content';

 return $res;

 $BODY$
   language plperlu volatile strict;

 test case:
 select count(perl_test('')) from generate_series(1, 100);

It looks like I found the problem. Perl uses reference counting and something
called 'mortal' SVs for memory management.  As I understand it, mortals are
freed at FREETMPS. PL/Perl calls FREETMPS in plperl_call_perl_func(), but after
that, plperl asks the Perl interpreter again for new mortal SV variables, for
example in hek2cstr from plperl_sv_to_datum, and these new SVs are never freed.
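
As a standalone illustration of the scope/mortal idiom involved (outside
PostgreSQL, per perlguts/perlcall): mortals created after SAVETMPS are
released at the matching FREETMPS, which is exactly the pair the patch below
wraps around the handler.

#include <EXTERN.h>
#include <perl.h>

static PerlInterpreter *my_perl;

int
main(int argc, char **argv, char **env)
{
	char	   *embedding[] = {"", "-e", "0"};
	int			i;

	PERL_SYS_INIT3(&argc, &argv, &env);
	my_perl = perl_alloc();
	perl_construct(my_perl);
	perl_parse(my_perl, NULL, 3, embedding, NULL);
	perl_run(my_perl);

	for (i = 0; i < 1000000; i++)
	{
		ENTER;
		SAVETMPS;

		/* mortal SVs made here are freed at FREETMPS, not accumulated */
		(void) sv_2mortal(newSVpvf("row %d", i));

		FREETMPS;
		LEAVE;
	}

	perl_destruct(my_perl);
	perl_free(my_perl);
	PERL_SYS_TERM();
	return 0;
}

Without the inner ENTER/SAVETMPS ... FREETMPS/LEAVE pair, the loop's
temporaries would only be reclaimed when the interpreter is destroyed, which
mirrors the growing RSS shown below.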

I experimented with this patch, and it fixes the memory leak in my case, but
the patch is incomplete. It does not free on error and fixes only
plperl_func_handler; plperl_trigger_handler and maybe plperl_inline_handler
must also be fixed.

Patch for REL9_2_STABLE.

without patch:
  PIDVSZ   RSS
 2503  74112  7740
 2503 152928 86860
 2503 232208 165836
 2503 310732 244508
 2503 389264 323032

with patch:
  PIDVSZ   RSS
 4322  74112  7740
 4322  74380  8340
 4322  74380  8340
 4322  74380  8340
 4322  74380  8340


diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 49d50c4..9c9874d 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -2173,6 +2173,9 @@ plperl_func_handler(PG_FUNCTION_ARGS)
 	ReturnSetInfo *rsi;
 	ErrorContextCallback pl_error_context;
 
+	ENTER;
+	SAVETMPS;
+
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "could not connect to SPI manager");
 
@@ -2271,6 +2274,9 @@ plperl_func_handler(PG_FUNCTION_ARGS)
 
 	SvREFCNT_dec(perlret);
 
+	FREETMPS;
+	LEAVE;
+
 	return retval;
 }
 



Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-25 Thread Kohei KaiGai
2014-02-25 20:32 GMT+09:00 Ashutosh Bapat ashutosh.ba...@enterprisedb.com:

 On Tue, Feb 25, 2014 at 3:39 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

  Sorry for jumping into this late.
 
  Instead of custom node, it might be better idea to improve FDW
  infrastructure
  to push join. For the starters, is it possible for the custom scan node
  hooks to create a ForeignScan node? In general, I think, it might be
  better
  for the custom scan hooks to create existing nodes if they serve the
  purpose.
 
 It does not work well because existing FDW infrastructure is designed to
 perform on foreign tables, not regular tables. Probably, it needs to
 revise
 much our assumption around the background code, if we re-define the
 purpose
 of FDW infrastructure. For example, ForeignScan is expected to return a
 tuple
 according to the TupleDesc that is exactly same with table definition.
 It does not fit the requirement if we replace a join-node by ForeignScan
 because its TupleDesc of joined relations is not predefined.


 If one wants to push joins, aggregates, grouping across to other data
 sources capable of handling them, that will need to change. But, at the same
 time, letting custom scan node being able to decide that doesn't seem to be
 a very good idea. Although, through custom scan nodes, we can see the
 potential in adding these features.

Of course, the existing form of the custom-scan node is designed to support
scanning or joining relations, as a first step. It will also need some
enhancement to support other classes of execution nodes in a future version.
I'm not certain why that is problematic.

 I'd like to define these features are designed for individual purpose.
 FDW is designed to intermediate an external data source and internal heap
 representation according to foreign table definition. In other words, its
 role is to generate contents of predefined database object on the fly.
 On the other hands, custom-scan is designed to implement alternative ways
 to scan / join relations in addition to the methods supported by built-in
 feature.

 I'm motivated to implement GPU acceleration feature that works
 transparently
 for application. Thus, it has to be capable on regular tables, because
 most
 of application stores data on regular tables, not foreign ones.


 It looks like my description was misleading. In some cases, it might be
 possible that the ultimate functionality that a particular instantiation of
 custom node scan is already available as a Plan node in PG, but PG optimizer
 is not able to optimize the operation that way. In such case, custom scan
 node infrastructure should produce the corresponding Path node and not
 implement that functionality itself.

You are suggesting that CustomSort, CustomAgg, CustomAppend and
so on should be supported in a future version, for better integration with
the plan optimizer. Right?
That is probably a good idea if the optimizer needs to identify a Custom node
by its node tag, rather than by something else like the custom-scan provider
name.
Right now, the custom-scan feature focuses on optimizing relation scans
and joins as its first scope, and does not need to identify the class of the
corresponding Path node.
Upthread in this discussion, I initially proposed having separate
CustomScan and CustomJoin nodes; however, our consensus was that
CustomScan can act like a scan on the result set of joined
relations, so I dropped the multiple node types from the first version.

  Since a custom node is open implementation, it will be important to pass
  as much information down to the hooks as possible; lest the hooks will
  be
  constrained.  Since the functions signatures within the planner,
  optimizer
  will change from time to time, so the custom node hook signatures will
  need
  to change from time to time. That might turn out to be maintenance
  overhead.
 
 Yes. You are also right. But it also makes maintenance overhead if hook
 has
 many arguments nobody uses.
 Probably, it makes sense to list up the arguments that cannot be
 reproduced
 from other information, can be reproduced but complicated steps, and can
 be
 reproduced easily.

 Below is the information we cannot reproduce:
  - PlannerInfo *root
  - RelOptInfo *joinrel
  - RelOptInfo *outerrel
  - RelOptInfo *innerrel
  - JoinType jointype
  - SpecialJoinInfo *sjinfo
  - List *restrictlist


 Most of this information is available through corresponding RelOptInfo, or
 we should make RelOptInfo contain all the information related to every
 relation required to be computed during the query. So, any function which
 creates paths can just take that RelOptInfo as an argument and produce the
 path/s. That way there is lesser chance that the function signatures change.

Uhmm... that is inconvenient for writing extensions. I want the variables
in the first and second groups to be delivered to the hook, even though
they may see minor modifications in future releases.
Relation joins are at the heart of an RDBMS, so I'd like 

Re: [HACKERS] Bit data type header reduction in some cases

2014-02-25 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 02/25/2014 08:23 AM, Haribabu Kommi wrote:
 It's regarding a Todo item of Bit data type header reduction in some
 cases. The header contains two parts. 1)  The varlena header is
 automatically converted to 1 byte header from 4 bytes in case of small
 data. 2) The bit length header called bit_len to store the actual bit
 length which is of 4 bytes in size. I want to reduce this bit_len size to 1
 byte in some cases as similar to varlena header. With this change the size
 of the column reduced by 3 bytes, thus shows very good decrease in disk
 usage.

 [ various contorted schemes for doing that backward-compatibly ]

TBH, this sounds like a huge amount of effort and a significant risk of
new bugs in return for not darn much.  Who uses bit values anyway?
They were removed from the SQL spec more than 10 years ago.  And of that
population, who cares about a byte or two per value?  The field demand
for this is not only zero, it's probably negative.  (The thread referenced
by the TODO entry shows no evidence of user demand.)

regards, tom lane




Re: [HACKERS] [pgsql-advocacy] GSoC 2014 - mentors, students and admins

2014-02-25 Thread Atri Sharma
Awesome. I can be an assistant mentor for this one if possible, or I could
mentor some other project.

On Tuesday, February 25, 2014, Andreas 'ads' Scherbaum 
adsm...@wars-nicht.de wrote:


 Hi,

 On 01/28/2014 06:46 PM, Atri Sharma wrote:

 On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown t...@linux.com wrote:

 Hi all,

 Application to Google Summer of Code 2014 can be made as of next
 Monday (3rd Feb), and then there will be a 12 day window in which to
 submit an application.

 I'd like to gauge interest from both mentors and students as to
 whether we'll want to do this.

 And I'd be fine with being admin again this year, unless there's
 anyone else who would like to take up the mantle?

 Who would be up for mentoring this year?  And are there any project
 ideas folk would like to suggest?

 I would like to bring up the addition to MADLIB algorithms again this
 year.


 I've spoken with the MADlib team at Pivotal and they are OK with supporting
 this proposal. Therefore I offer to mentor this.


 Regards,

 --
 Andreas 'ads' Scherbaum
 German PostgreSQL User Group
 European PostgreSQL User Group - Board of Directors
 Volunteer Regional Contact, Germany - PostgreSQL Project



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Merlin Moncure
On Mon, Feb 24, 2014 at 1:15 PM, Andrew Dunstan and...@dunslane.net wrote:
 On 02/24/2014 11:06 AM, Merlin Moncure wrote:

 On Mon, Feb 24, 2014 at 9:08 AM, Merlin Moncure mmonc...@gmail.com
 wrote:

 On Mon, Feb 24, 2014 at 8:46 AM, Merlin Moncure mmonc...@gmail.com
 wrote:

 I still find the phrasing 'jsonb is more efficient for most
 purposes' to be a bit off.  Basically, the text json type is faster for the
 serialization/deserialization pattern (not just document preservation)
 and jsonb is preferred when storing json and doing repeated
 subdocument accesses.

 Hm, I'm going to withdraw that.  I had done some testing of simple
 deserialization (cast to text and the like) and noted that jsonb was
 as much as 5x slower.  However, I just did some checking on
 json[b]_populate_recordset though and it's pretty much a wash.

 [sorry for noise on this].

 Here's the use case coverage as I see it today:

 CASE:                json     jsonb  hstore
 Static document:     yes      poor   poor
 Precise document:    yes      no     no
 Serialization:       yes      no     no
 Deserialization:     poor***  yes*   no
 Repeated Access:     poor     yes    yes
 Manipulation:        no       no**   yes
 GIST/GIN searching:  no       no**   yes

 notes:
 * jsonb gets 'yes' for deserialization assuming andrew's 'two level'
 deserialization fix goes in (otherwise 'poor').
 ** jsonb can't do this today, but presumably will be able to soon
 *** 'poor' unless json type also gets the deserialization fix, then 'yes'.
  hstore can deserialize hstore format, but will rely on json/jsonb
 for deserializing json

 'Static document' represents edge cases where the json is opaque to
 the database but performance matters -- for example large map polygons.
 'Precise document' represents cases where whitespace or key order is
 important.

 Peter asked upthread how to access the various features.  Well, today,
 it basically means a bit of nimble casting to different structures
 depending on which particular features are important to you, which
 IMNSHO is not bad at all as long as we understand that most people who
 rely on jsonb will also need hstore for its searching and operators.
 Down the line when hstore and jsonb are more fleshed out it's going to
 come down to an API style choice.

 Frankly, a lot of the above doesn't make much sense to me. WTF is
 'Manipulation'?

 Unless I see much more actual info on the tests being conducted it's just
 about impossible to comment. The performance assessment at this stage is
 simply anecdotal as far as I'm concerned.

Er, I wasn't making performance assessments (except in cases where it
was obvious, like poor support for arbitrary access with json), but
describing API coverage of use cases.  'Manipulation' I thought obvious: the
ability to manipulate the document through the API (say, change some value to
something else) -- the nosql pattern.  Neither json nor
jsonb can do that at present... only hstore can.  jsonb can't; it only
covers some of what the json type currently covers (but some of the things
it does cover are much faster).

On Mon, Feb 24, 2014 at 11:31 AM, Josh Berkus j...@agliodbs.com wrote:
 Hm, I'm going to withdraw that.  I had done some testing of simple
 deserialization (cast to text and the like) and noted that jsonb was
 as much as 5x slower.  However, I just did some checking on
 json[b]_populate_recordset though and it's pretty much a wash.

 Aside from that, I want our docs to make a strong endorsement of using
 jsonb over json for most users.  jsonb will continue to be developed and
 improved in the future; it is very unlikely that json will.  Maybe
 that's what I should say rather than anything about efficiency.

I would hope that endorsement doesn't extend to misinforming users.
Moreover, json type is handling all serialization at present and will
continue to do so for some years.  In fact, in this release we got a
bunch of new very necessary enhancements (json_build) to
serialization!  You're trying to deprecate and enhance the type at the
same time!

The disconnect here is that your statements would be correct if the
only use for the json type were storing data as json.
However, people (including myself) are doing lots of wonderful things
storing data in the traditional way and moving it into and out of json in
queries, and that, besides working better with the json type, is only
possible with json.  That might change in the future if we figure out a
way to cover the json serialization cases through jsonb, but that's not how
things work today, end of story.

Look, I definitely feel the frustration and weariness here in terms of
my critiquing the proposed API along with the other arguments I've
made.  Please understand that nobody wants this to go out the door
more than me.  If the objective is to lock in the API 'as is', then let's
be polite to our users and try to document the various use cases and
what's good at what.

merlin



Re: [HACKERS] often PREPARE can generate high load (and sometimes minutes long unavailability)

2014-02-25 Thread Pavel Stehule
2014-02-25 11:29 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com:

 Hello


 2014-02-24 21:31 GMT+01:00 Jeff Janes jeff.ja...@gmail.com:

 On Mon, Feb 24, 2014 at 7:02 AM, Pavel Stehule pavel.steh...@gmail.com wrote:




 2014-02-23 21:32 GMT+01:00 Andres Freund and...@2ndquadrant.com:

 Hi,

 On 2014-02-23 20:04:39 +0100, Pavel Stehule wrote:
  There is relative few very long ProcArrayLocks lwlocks
 
  This issue is very pathologic on fast computers with more than 8 CPU.
 This
  issue was detected after migration from 8.4 to 9.2. (but tested with
 same
  result on 9.0)  I see it on devel 9.4 today actualized.
 
  When I moved PREPARE from cycle, then described issues is gone. But
 when I
  use a EXECUTE IMMEDIATELY, then the issue is back. So it looks it is
  related to planner, ...

 In addition to the issue Jeff mentioned, I'd suggest trying the same
 workload with repeatable read. That can do *wonders* because of the
 reduced number of snapshots.


 I tested it, and it doesn't help.

 Is there some patch, that I can test related to this issue?


 This is the one that I was referring to:

 http://www.postgresql.org/message-id/11927.1384199...@sss.pgh.pa.us


 I tested this patch with  great success. Waiting on ProcArrayLocks are
 off. Throughput is expected.

 For described use case it is seriously interesting.


Here is a update for 9.4

Regards

Pavel



 Regards

 Pavel


 light weight locks
 lockname   mode  countavg
 DynamicLocks  Exclusive   8055   5003
 DynamicLocks Shared   1666 50
 LockMgrLocks  Exclusive129 36
  IndividualLock   Exclusive 34 48
  IndividualLock  Shared 21  7
  BufFreelistLock  Exclusive 12  8
 WALWriteLock  Exclusive  1  38194
ProcArrayLock Shared  1  8



 Cheers,

 Jeff




diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
new file mode 100644
index d525ca4..261473d
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*** get_actual_variable_range(PlannerInfo *r
*** 4958,4963 
--- 4958,4964 
  			HeapTuple	tup;
  			Datum		values[INDEX_MAX_KEYS];
  			bool		isnull[INDEX_MAX_KEYS];
+ 			SnapshotData	SnapshotDirty;
  
  			estate = CreateExecutorState();
  			econtext = GetPerTupleExprContext(estate);
*** get_actual_variable_range(PlannerInfo *r
*** 4980,4985 
--- 4981,4987 
  			slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRel));
  			econtext->ecxt_scantuple = slot;
  			get_typlenbyval(vardata->atttype, &typLen, &typByVal);
+ 			InitDirtySnapshot(SnapshotDirty);
  
  			/* set up an IS NOT NULL scan key so that we ignore nulls */
  			ScanKeyEntryInitialize(&scankeys[0],
*** get_actual_variable_range(PlannerInfo *r
*** 4997,5003 
  			if (min)
  			{
  index_scan = index_beginscan(heapRel, indexRel,
! 			 GetActiveSnapshot(), 1, 0);
  index_rescan(index_scan, scankeys, 1, NULL, 0);
  
  /* Fetch first tuple in sortop's direction */
--- 4999,5005 
  			if (min)
  			{
  index_scan = index_beginscan(heapRel, indexRel,
! 			 &SnapshotDirty, 1, 0);
  index_rescan(index_scan, scankeys, 1, NULL, 0);
  
  /* Fetch first tuple in sortop's direction */
*** get_actual_variable_range(PlannerInfo *r
*** 5029,5035 
  			if (max && have_data)
  			{
  index_scan = index_beginscan(heapRel, indexRel,
! 			 GetActiveSnapshot(), 1, 0);
  index_rescan(index_scan, scankeys, 1, NULL, 0);
  
  /* Fetch first tuple in reverse direction */
--- 5031,5037 
  			if (max && have_data)
  			{
  index_scan = index_beginscan(heapRel, indexRel,
! 			 &SnapshotDirty, 1, 0);
  index_rescan(index_scan, scankeys, 1, NULL, 0);
  
  /* Fetch first tuple in reverse direction */



Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-25 Thread Peter Eisentraut
On 1/30/14, 2:28 AM, Christian Kruse wrote:
 after I finally got documentation compilation working I updated the
 patch to be syntactically correct. You will find it attached.

I don't think we should be explaining the basics of OS memory management
in our documentation.  And if we did, we shouldn't copy it verbatim from
the Debian wiki without attribution.

I think this patch should be cut down to the paragraphs that cover the
actual configuration.

On a technical note, use xref instead of link for linking.
doc/src/sgml/README.links contains some information.





Re: [HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-25 Thread Andres Freund
On 2014-02-25 10:45:55 -0500, Robert Haas wrote:
 On Fri, Feb 14, 2014 at 7:05 AM, Andres Freund and...@2ndquadrant.com wrote:
  In WalSndLoop() we do:
 
  wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
  WL_SOCKET_READABLE;
 
  if (pq_is_send_pending())
  wakeEvents |= WL_SOCKET_WRITEABLE;
  else if (wal_sender_timeout > 0 && !ping_sent)
  {
  ...
  if (GetCurrentTimestamp() >= timeout)
  WalSndKeepalive(true);
  ...
 
  I think those two if's should simply be separate. There's no reason not
  to ask for a ping when we're writing. On a busy server that might be the
  case most of the time.
  The if (pq_is_send_pending()) should also be *after* sending the
  keepalive, after all, it might not immediately have been flushed as
  unlikely as that is.
 
 I spent an inordinate amount of time looking at this patch.  I'm not
 sure your proposed change will accomplish the desired effect.  The
 thing is that the code you quote is within an if-block gated by
 (caughtup && !streamingDoneSending) || pq_is_send_pending().
 Currently, therefore, the behavior is that we wait on the latch
 *either when* we're caught up (and thus need to wait for more WAL) or
 when the outbound queue is full (and thus we need to wait for it to
 drain), but we send a keep-alive only in the former case (namely,
 we're caught up).

 Your proposed change would cause us to send keep-alives when we're
 caught up, or when we're not caught up but the write queue happens to
 be full.  That doesn't make much sense.  There might be a reason to
 start sending keep-alives when we're not caught up, but even if we
 want that behavior change there's no reason to do it only when the
 write queue is full.

I am not sure I can follow. Why doesn't it make sense to send out the
keepalive (with replyRequested = true) when we're busy sending stuff
(which will be the case most of the time on a busy server)?

The point is that clients like pg_receivexlog and pg_basebackup don't
send a keepalive response every time they receive something (which is
perfectly fine), but just when their internal timer says it's time to do
so, or if the walsender asks them to do so. So, if the server has a
*lower* timeout than configured for pg_receivexlog and the server is a
busy one, you'll get timeouts relatively often. This is a pretty
frequent situation in synchronous replication setups because the default
timeout is *way* too high for that.
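
In code, the separation being proposed amounts to something like the
following rearrangement of the WalSndLoop() fragment quoted above (a sketch
of the idea, not a final patch; the elisions are as in the original
fragment):

/* ask for a ping when the timeout approaches, even while busy sending */
if (wal_sender_timeout > 0 && !ping_sent)
{
    ...
    if (GetCurrentTimestamp() >= timeout)
        WalSndKeepalive(true);
    ...
}

/* and, independently, wait for the socket to drain if needed */
if (pq_is_send_pending())
    wakeEvents |= WL_SOCKET_WRITEABLE;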

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-25 Thread Robert Haas
On Fri, Feb 14, 2014 at 7:05 AM, Andres Freund and...@2ndquadrant.com wrote:
 In WalSndLoop() we do:

 wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
 WL_SOCKET_READABLE;

 if (pq_is_send_pending())
 wakeEvents |= WL_SOCKET_WRITEABLE;
 else if (wal_sender_timeout > 0 && !ping_sent)
 {
 ...
 if (GetCurrentTimestamp() >= timeout)
 WalSndKeepalive(true);
 ...

 I think those two if's should simply be separate. There's no reason not
 to ask for a ping when we're writing. On a busy server that might be the
 case most of the time.
 The if (pq_is_send_pending()) should also be *after* sending the
 keepalive, after all, it might not immediately have been flushed as
 unlikely as that is.

I spent an inordinate amount of time looking at this patch.  I'm not
sure your proposed change will accomplish the desired effect.  The
thing is that the code you quote is within an if-block gated by
(caughtup && !streamingDoneSending) || pq_is_send_pending().
Currently, therefore, the behavior is that we wait on the latch
*either when* we're caught up (and thus need to wait for more WAL) or
when the outbound queue is full (and thus we need to wait for it to
drain), but we send a keep-alive only in the former case (namely,
we're caught up).

Your proposed change would cause us to send keep-alives when we're
caught up, or when we're not caught up but the write queue happens to
be full.  That doesn't make much sense.  There might be a reason to
start sending keep-alives when we're not caught up, but even if we
want that behavior change there's no reason to do it only when the
write queue is full.

At least, that's how it looks to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-25 Thread Andres Freund
On 2014-02-25 10:29:32 -0500, Peter Eisentraut wrote:
 On 1/30/14, 2:28 AM, Christian Kruse wrote:
  after I finally got documentation compilation working I updated the
  patch to be syntactically correct. You will find it attached.
 
 I don't think we should be explaining the basics of OS memory management
 in our documentation.

Agreed.

 And if we did, we shouldn't copy it verbatim from the Debian wiki
 without attribution.

Is it actually? A quick comparison doesn't show that many similarities?
Christian?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-25 Thread Christian Kruse
Hi,

On 25/02/14 10:29, Peter Eisentraut wrote:
 I don't think we should be explaining the basics of OS memory management
 in our documentation.

Well, I'm confused. I thought that's exactly what has been asked.

 And if we did, we shouldn't copy it verbatim from the Debian wiki
 without attribution.

I didn't. This is a write-up of several articles, blog posts and
documentation I read about this topic.

However, if you think the texts are too similar, then we should add a
note, yes. Didn't mean to copy w/o referring to a source.

 I think this patch should be cut down to the paragraphs that cover the
 actual configuration.

I tried to cover the issues Heikki brought up in
52861eec.2090...@vmware.com.

 On a technical note, use xref instead of link for linking.
 doc/src/sgml/README.links contains some information.

OK, I will post an updated patch later this evening.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



pgpfzdmiy8T7D.pgp
Description: PGP signature


Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-25 Thread Christian Kruse
Hi,

On 25/02/14 17:01, Andres Freund wrote:
  And if we did, we shouldn't copy it verbatim from the Debian wiki
  without attribution.
 
 Is it actually? A quick comparison doesn't show that many similarities?
 Christian?

Not as far as I know. But of course, as I wrote the text I _also_
(that's not my only source) read the Debian article and I was
influenced by it. It may be that the texts are more similar than I
thought, although I still don't see it.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



pgplDAVez7E7u.pgp
Description: PGP signature


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Robert Haas
On Mon, Feb 24, 2014 at 12:31 PM, Josh Berkus j...@agliodbs.com wrote:
 Aside from that, I want our docs to make a strong endorsement of using
 jsonb over json for most users.  jsonb will continue to be developed and
 improved in the future; it is very unlikely that json will.  Maybe
 that's what I should say rather than anything about efficiency.

 In other words: having an ambiguous, complex evaluation of json vs.
 jsonb does NOT benefit most users.  The result will be some users
 choosing json and then pitching fit when they want jsonb in 9.5 and have
 to rewrite all their tables.

 Mind you, we'll need to fix the slow deserialization, though.

I think you've got your head stuck deeply in the sand.  The json data
type works exactly like the xml data type has always worked.  There
have been occasional noises about making an xmlb data type, but
nobody's minded enough to do anything about it, or at least not in
this forum.  So if the json data type has no future and is crap, then
the same presumably holds of the xml data type.  But I don't think
anyone here believes that, unless they just hate xml on general
principle, which I can certainly understand.

You really *can't* fix the fact that jsonb takes longer to
deserialize than json.  I mean, it's possible the code can be
optimized.  But since json is stored in the exact format in which it
is to be emitted, the output function is basically just memcpy().
You're never going to get that kind of speed out of code that actually
has to do something, and I suspect you're going to find that it's hard
to come close.

In short, I think you're viewing everything about jsonb with
rose-colored glasses on, and that your enthusiasm is mostly wishful
thinking.  Will there be good things about jsonb?  Of course.  Will
lots of people want to use it for those reasons?  Very likely.  Will
it be better than json in all ways and for all purposes?  No, and
implying the contrary is just plain wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-25 Thread Robert Haas
On Tue, Feb 25, 2014 at 10:54 AM, Andres Freund and...@2ndquadrant.com wrote:
 I am not sure I can follow. Why doesn't it make sense to send out the
 keepalive (with replyRequested = true) when we're busy sending stuff
 (which will be the case most of the time on a busy server)?

It may very well make sense, but your patch won't generally have that
effect, because with the patch you proposed, the keep-alive can only
be sent when the server is busy if the write queue is also full.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-25 Thread Andres Freund
On 2014-02-25 11:15:46 -0500, Robert Haas wrote:
 On Tue, Feb 25, 2014 at 10:54 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I am not sure I can follow. Why doesn't it make sense to send out the
  keepalive (with replyRequested = true) when we're busy sending stuff
  (which will be the case most of the time on a busy server)?
 
 It may very well make sense, but your patch won't generally have that
 effect, because with the patch you proposed, the keep-alive can only
 be sent when the server is busy if the write queue is also full.

Well, it either needs to be caughtup *or*/and have a busy write queue,
right? Usually that state will be reached very quickly because before
that we're writing data to the network as fast as it can be read from
disk.
Also, there's no timeout checks outside that if (caughtup ||
send_pending()) block, so there's not much of a window to hit problems when
that loop isn't entered.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-25 Thread Robert Haas
On Tue, Feb 25, 2014 at 11:23 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-25 11:15:46 -0500, Robert Haas wrote:
 On Tue, Feb 25, 2014 at 10:54 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I am not sure I can follow. Why doesn't it make sense to send out the
  keepalive (with replyRequested = true) when we're busy sending stuff
  (which will be the case most of the time on a busy server)?

 It may very well make sense, but your patch won't generally have that
 effect, because with the patch you proposed, the keep-alive can only
 be sent when the server is busy if the write queue is also full.

 Well, it either needs to be caughtup *or*/and have a busy write queue,
 right?

Right.

 Usually that state will be reached very quickly because before
 that we're writing data to the network as fast as it can be read from
 disk.

I'm unimpressed.  Even if that is in practice true, making the code
self-consistent is a goal of non-trivial value.  The timing of sending
keep-alives has no business depending on the state of the write queue,
and right now it doesn't.  Your patch would make it depend on that,
mostly by accident AFAICS.

 Also, there's no timeout checks outside that if (caughtup ||
 send_pending()) block, so there's not much of a window to hit problems when
 that loop isn't entered.

That might be true, but waiting until the write queue is full to send
a ping and then checking whether we've timed out before the queue has
drained isn't a sane design pattern.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2014-02-25 Thread Robert Haas
On Mon, Feb 24, 2014 at 1:05 PM, Bruce Momjian br...@momjian.us wrote:
 On Mon, Feb 17, 2014 at 11:14:33AM -0500, Bruce Momjian wrote:
 On Sun, Feb 16, 2014 at 09:26:47PM -0500, Robert Haas wrote:
   So, would anyone like me to create patches for any of these items before
   we hit 9.4 beta?  We have added autovacuum_work_mem, and increasing
   work_mem and maintenance_work_mem by 4x is a simple operation.  Not sure
   about the others.  Or do we just keep this all for 9.5?
 
  I don't think anyone objected to increasing the defaults for work_mem
  and maintenance_work_mem by 4x, and a number of people were in favor,
  so I think we should go ahead and do that.  If you'd like to do the
  honors, by all means!

 OK, patch attached.

 Patch applied.

Thanks!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

2014-02-25 Thread Robert Haas
On Sat, Feb 22, 2014 at 7:02 PM, Rukh Meski rukh.me...@yahoo.ca wrote:
 Sorry, I wanted to minimize the attention my message attracts.  I mostly 
 posted it to let people know I plan on working on this for 9.5 to avoid 
 duplicated effort.  I will post more documentation and my reasons for wanting 
 this feature in postgre later, if that's all right.

I've wanted this more than once.  I suspect it's a pretty hard project, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] varchar_transform

2014-02-25 Thread Robert Haas
On Mon, Feb 24, 2014 at 2:11 AM, Mohsen SM mohsensoodk...@gmail.com wrote:
 when is the varchar_transform function used?
 src/backend/utils/adt/varchar.c.

The main point of that machinery is that if you do something like
ALTER TABLE zot ALTER COLUMN thunk SET DATA TYPE varchar(30) on a
column that is currently varchar(20), it won't feel obliged to rewrite
the table.  That function lets it figure out that anything that's a
valid varchar(20) value is also a valid varchar(30) value.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-25 Thread Peter Eisentraut
On 2/25/14, 11:08 AM, Christian Kruse wrote:
 Hi,
 
 On 25/02/14 17:01, Andres Freund wrote:
 And if we did, we shouldn't copy it verbatim from the Debian wiki
 without attribution.

 Is it actually? A quick comparison doesn't show that many similarities?
 Christian?
 
 Not as far as I know. But of course, as I wrote the text I _also_
 (that's not my only source) read the Debian article and I was
  influenced by it. It may be that the texts are more similar than I
 thought, although I still don't see it.

I suspect that it was done subconsciously.  But I did notice it right
away, so there is something to it.

As I mentioned, I would just cut those introductory parts out.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication

2014-02-25 Thread Robert Haas
On Mon, Feb 24, 2014 at 2:49 PM, Christian Kruse
christ...@2ndquadrant.com wrote:
 attached you will find a fixed version.

Committed, after fixing the regression test outputs.  I also changed
the new fields to be NULL rather than 0 when they are invalid, because
that way applying age() to that column does something useful instead
of something lame.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Josh Berkus
On 02/25/2014 08:13 AM, Robert Haas wrote:
 I think you've got your head stuck deeply in the sand.  The json data
 type works exactly like the xml data type has always worked.  There
 have been occasional noises about making an xmlb data type, but
 nobody's minded enough to do anything about it, or at least not in
 this forum.  So if the json data type has no future and is crap, then
 the same presumably holds of the xml data type.  But I don't think
 anyone here believes that, unless they just hate xml on general
 principle, which I can certainly understand.

Well, if we had an XMLB, I would in fact be making the same argument.
I'll point out the only reason we're keeping the original json instead
of forcing an upgrade to jsonb, per earlier discussions, is
backwards-compatibility.  If we had never had a json-text, and Merlin
was proposing adding one now alongside jsonb, I'd be arguing against
doing so.

 In short, I think you're viewing everything about jsonb with
 rose-colored glasses on, and that your enthusiasm is mostly wishful
 thinking.  Will there be good things about jsonb?  Of course.  Will
 lots of people want to use it for those reasons?  Very likely.  Will
 it be better than json in all ways and for all purposes?  No, and
 implying the contrary is just plain wrong.

It hurts our adoption substantially to confuse developers.  We need to
recommend one type over the other, hence "Use jsonb unless you need X."
 Merlin is pushing the type of multivariable comparison where *I*
wouldn't be able to make sense of which one I should pick, let alone
some web developer who's just trying to get a site built.  That sort of
thing *really* doesn't help our users.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Bruce Momjian
On Tue, Feb 25, 2014 at 09:38:50AM -0800, Josh Berkus wrote:
  In short, I think you're viewing everything about jsonb with
  rose-colored glasses on, and that your enthusiasm is mostly wishful
  thinking.  Will there be good things about jsonb?  Of course.  Will
  lots of people want to use it for those reasons?  Very likely.  Will
  it be better than json in all ways and for all purposes?  No, and
  implying the contrary is just plain wrong.
 
 It hurts our adoption substantially to confuse developers.  We need to
 recommend one type over the other, hence "Use jsonb unless you need X."
  Merlin is pushing the type of multivariable comparison where *I*
 wouldn't be able to make sense of which one I should pick, let alone
 some web developer who's just trying to get a site built.  That sort of
 thing *really* doesn't help our users.

I agree it would be nice to have something simple, like "Use JSON if you
wish to just store/retrieve entire JSON structures, and JSONB if you
wish to do any kind of lookup or manipulation of JSON values on the
server."

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-25 Thread Bruce Momjian
On Tue, Feb 25, 2014 at 12:18:02PM -0500, Peter Eisentraut wrote:
 On 2/25/14, 11:08 AM, Christian Kruse wrote:
  Hi,
  
  On 25/02/14 17:01, Andres Freund wrote:
  And if we did, we shouldn't copy it verbatim from the Debian wiki
  without attribution.
 
  Is it actually? A quick comparison doesn't show that many similarities?
  Christian?
  
  Not as far as I know. But of course, as I wrote the text I _also_
  (that's not my only source) read the Debian article and I was
   influenced by it. It may be that the texts are more similar than I
  thought, although I still don't see it.
 
 I suspect that it was done subconsciously.  But I did notice it right
 away, so there is something to it.
 
 As I mentioned, I would just cut those introductory parts out.

Should we link to the Debian wiki content?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Avoiding deeply nested AND/OR trees in the parser

2014-02-25 Thread Tom Lane
Over in
http://www.postgresql.org/message-id/bay176-w382a9de827ebc8e602b7bbc5...@phx.gbl
there's a complaint about getting "stack depth limit exceeded" from a
query of the form

select count(*) from gui_die_summary where (x_coord, y_coord) in
((25,5),(41,13),(25,7),(28,3),(25,8),(34,7),(26,6),(21,10), ...);

when there's a few thousand pairs in the IN list.  The reason for this
is that transformAExprIn() generates a tree of nested OR expressions,
and then recursion in assign_collations_walker() runs out of stack space.
(If assign_collations_walker() hadn't failed, something else probably
would later, so it's wrong to blame that function in particular.)

I reproduced this problem in the regression database using generated
queries like so:

print "explain select * from tenk1 where (thousand, tenthous) in (\n";
for ($i = 0; $i < 10000; $i++) {
  print ",\n" if $i > 0;
  print "($i,$i)";
}
print ");\n";

On my machine, HEAD fails at about 9000 pairs with this test case.

While I commented in the pgsql-novice thread that there are better ways
to do this, it still seems like a limitation we probably ought to fix
if it's not too hard.  In the case of transformAExprIn, we could generate
an N-argument OR or AND node instead of a nest; this is already done
for example in make_row_comparison_op().  The attached quick-hack patch
does so, and I verified that the system could handle a million pairs
with this in place.  (It takes about 20 seconds and 20GB of memory to
plan such a case, so I still say it's a bad idea, but at least we can
do it.)  There is similar code in make_row_distinct_op(), which perhaps
ought to be fixed as well.

However, this isn't exactly the end of the story, because if you were to
dump out a view generated from a query like this, it would contain a long
chain of OR clauses, which would mean that reloading the view would put
you right back in stack overflow territory.  Really if we wanted to fix
this issue we'd need to hack up transformAExprAnd/transformAExprOr so that
they recognized nested ANDs/ORs and flattened them on the spot.  This
might not be a bad idea, but it's starting to look like more than a quick
hack patch.
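
For illustration, here is a standalone sketch of that kind of on-the-spot
flattening, with toy node types rather than the parser's: an OR applied to
something that is already an OR appends to its argument list instead of
adding another nesting level, so the tree depth stays constant no matter
how many arms the expression has.

#include <stdio.h>
#include <stdlib.h>

typedef enum { LEAF, OR_NODE } NodeTag;

typedef struct Node
{
	NodeTag		tag;
	int			value;			/* used by LEAF */
	struct Node **args;			/* used by OR_NODE */
	int			nargs;
} Node;

static Node *
make_leaf(int v)
{
	Node	   *n = calloc(1, sizeof(Node));

	n->tag = LEAF;
	n->value = v;
	return n;
}

/* OR two nodes, appending to an existing OR instead of nesting */
static Node *
make_or_flat(Node *lhs, Node *rhs)
{
	Node	   *n;

	if (lhs->tag == OR_NODE)
	{
		/* a real implementation would grow the array geometrically */
		lhs->args = realloc(lhs->args, (lhs->nargs + 1) * sizeof(Node *));
		lhs->args[lhs->nargs++] = rhs;
		return lhs;				/* depth does not grow */
	}
	n = calloc(1, sizeof(Node));
	n->tag = OR_NODE;
	n->args = calloc(2, sizeof(Node *));
	n->args[0] = lhs;
	n->args[1] = rhs;
	n->nargs = 2;
	return n;
}

int
main(void)
{
	/* build "x = 0 OR x = 1 OR ..." left-deep, as a grammar would */
	Node	   *expr = make_leaf(0);
	int			i;

	for (i = 1; i < 10000; i++)
		expr = make_or_flat(expr, make_leaf(i));

	/* one flat N-argument node, so later walks need no deep recursion */
	printf("nargs = %d\n", expr->nargs);
	return 0;
}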

Does this seem worth pursuing, or shall we leave it alone?

regards, tom lane

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 81c9338..9550bd1 100644
*** a/src/backend/parser/parse_expr.c
--- b/src/backend/parser/parse_expr.c
*** transformAExprOf(ParseState *pstate, A_E
*** 1088,1094 
  static Node *
  transformAExprIn(ParseState *pstate, A_Expr *a)
  {
! 	Node	   *result = NULL;
  	Node	   *lexpr;
  	List	   *rexprs;
  	List	   *rvars;
--- 1088,1095 
  static Node *
  transformAExprIn(ParseState *pstate, A_Expr *a)
  {
! 	Node	   *result;
! 	List	   *cmpexprs = NIL;
  	Node	   *lexpr;
  	List	   *rexprs;
  	List	   *rvars;
*** transformAExprIn(ParseState *pstate, A_E
*** 1166,1171 
--- 1167,1173 
  			 */
  			List	   *aexprs;
  			ArrayExpr  *newa;
+ 			Node	   *cmp;
  
  			aexprs = NIL;
  			foreach(l, rnonvars)
*** transformAExprIn(ParseState *pstate, A_E
*** 1185,1196 
  			newa->multidims = false;
  			newa->location = -1;
  
! 			result = (Node *) make_scalar_array_op(pstate,
!    a->name,
!    useOr,
!    lexpr,
!    (Node *) newa,
!    a->location);
  
  			/* Consider only the Vars (if any) in the loop below */
  			rexprs = rvars;
--- 1187,1202 
  			newa->multidims = false;
  			newa->location = -1;
  
! 			cmp = (Node *) make_scalar_array_op(pstate,
! a->name,
! useOr,
! lexpr,
! (Node *) newa,
! a->location);
! 
! 			/* cmp certainly yields boolean, no need to check it */
! 
! 			cmpexprs = lappend(cmpexprs, cmp);
  
  			/* Consider only the Vars (if any) in the loop below */
  			rexprs = rvars;
*** transformAExprIn(ParseState *pstate, A_E
*** 1198,1204 
  	}
  
  	/*
! 	 * Must do it the hard way, ie, with a boolean expression tree.
  	 */
  	foreach(l, rexprs)
  	{
--- 1204,1210 
  	}
  
  	/*
! 	 * Any remaining righthand exprs need to be compared individually.
  	 */
  	foreach(l, rexprs)
  	{
*** transformAExprIn(ParseState *pstate, A_E
*** 1226,1239 
  		}
  
  		cmp = coerce_to_boolean(pstate, cmp, "IN");
! 		if (result == NULL)
! 			result = cmp;
! 		else
! 			result = (Node *) makeBoolExpr(useOr ? OR_EXPR : AND_EXPR,
! 		   list_make2(result, cmp),
! 		   a->location);
  	}
  
  	return result;
  }
  
--- 1232,1253 
  		}
  
  		cmp = coerce_to_boolean(pstate, cmp, "IN");
! 
! 		cmpexprs = lappend(cmpexprs, cmp);
  	}
  
+ 	/*
+ 	 * If we have more than one comparison expression, AND or OR them together
+ 	 */
+ 	if (cmpexprs == NIL)
+ 		result = NULL;			/* can this happen? is it right if so? */
+ 	else if (list_length(cmpexprs) == 1)
+ 		result = (Node *) linitial(cmpexprs);
+ 	else

Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-25 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Tue, Feb 25, 2014 at 12:18:02PM -0500, Peter Eisentraut wrote:
 As I mentioned, I would just cut those introductory parts out.

 Should we link to the Debian wiki content?

-1.  We generally don't link to our *own* wiki in our SGML docs, let alone
things that aren't even under our project's control.  Moreover, Debian
is not going to be explaining these things in a way that accounts for
non-Linux operating systems.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-25 Thread Andres Freund
On 2014-02-25 13:21:46 -0500, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Tue, Feb 25, 2014 at 12:18:02PM -0500, Peter Eisentraut wrote:
  As I mentioned, I would just cut those introductory parts out.
 
  Should we link to the Debian wiki content?
 
 -1.  We generally don't link to our *own* wiki in our SGML docs, let alone
 things that aren't even under our project's control.

Agreed. Especially as the interesting bit is the postgres specific
logic, not the rest.

I think all that's needed is to cut the first paragraphs that generally
explain what huge pages are in some detail from the text and make sure
the later paragraphs don't refer to the earlier ones.

 Moreover, Debian
 is not going to be explaining these things in a way that accounts for
 non-Linux operating systems.

It's a linux only feature so far, so that alone wouldn't be a problem.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Robert Haas
On Tue, Feb 25, 2014 at 12:38 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/25/2014 08:13 AM, Robert Haas wrote:
 I think you've got your head stuck deeply in the sand.  The json data
 type works exactly like the xml data type has always worked.  There
 have been occasional noises about making an xmlb data type, but
 nobody's minded enough to do anything about it, or at least not in
 this forum.  So if the json data type has no future and is crap, then
 the same presumably holds of the xml data type.  But I don't think
 anyone here believes that, unless they just hate xml on general
 principle, which I can certainly understand.

 Well, if we had an XMLB, I would in fact be making the same argument.
 I'll point out the only reason we're keeping the original json instead
 of forcing an upgrade to jsonb, per earlier discussions, is
 backwards-compatibility.  If we had never had a json-text, and Merlin
 was proposing adding one now alongside jsonb, I'd be arguing against
 doing so.

You can argue that all you like.  But the same argument was made and
rejected at the time we (I) added the original json type.  So I don't
believe that you can claim that your argument is backed by any sort of
consensus, because AFAICS it isn't.

 In short, I think you're viewing everything about jsonb with
 rose-colored glasses on, and that your enthusiasm is mostly wishful
 thinking.  Will there be good things about jsonb?  Of course.  Will
 lots of people want to use it for those reasons?  Very likely.  Will
 it be better than json in all ways and for all purposes?  No, and
 implying the contrary is just plain wrong.

 It hurts our adoption substantially to confuse developers.  We need to
 recommend one type over the other, hence Use jsonb unless you need X.
  Merlin is pushing the type of multivariable comparison where *I*
 wouldn't be able to make sense of which one I should pick, let alone
 some web developer who's just trying to get a site built.  That sort of
 thing *really* doesn't help our users.

I don't have any objection to editing what Merlin wrote to be clear
and concise; I don't think he meant for it to be considered for
inclusion in the documentation in exactly that form anyway.  I do have
an objection to including your unjustified partisanship in our
documentation as fact.

The reality is that if you have a bunch of JSON documents indexed by
some ID number and expect to usually retrieve the whole document, you
probably don't want jsonb.  You probably want one integer column and
one json column, because it's gonna be faster that way.  And if you
expect to usually retrieve only part of the document, then you are
probably better off using separate columns for the separate parts of
the document, because I bet that extracting a portion of a large
document is still going to require de-TOASTing the whole thing, or at
least all the data preceding the last byte offset of interest, which
is full of lose.  The situation where jsonb is going to win is where
either (1) you or your client are so stuck in the document database
model that you can't fathom the idea that using a real database schema
might improve performance or (2) you have so many different things
(pseudocolumns, as it were) that you might want to extract from any
given JSON blob that it's impractical to use real columns for all of
those.  I agree those are both real use cases.  I do not agree that
they are the only or most common use cases.  And I definitely don't
agree that our documentation should push people towards stuffing
everything in a JSON blob instead of using real column definitions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor comment improvements in tablecmds.c

2014-02-25 Thread Robert Haas
On Mon, Feb 24, 2014 at 9:40 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 This is a small patch to improve comments in tablecmds.c.  Please find
 attached a patch.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor comment improvements in tablecmds.c

2014-02-25 Thread Alvaro Herrera
Etsuro Fujita wrote:
 This is a small patch to improve comments in tablecmds.c.  Please find
 attached a patch.

I find both patched and unpatched to be pretty illegible.  How about
something like

  /*
 - * Execute ALTER TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE RENAME
 + * Execute ALTER <relation type> RENAME
 + *   This routine supports tables, indexes, sequences, views,
 + *   and foreign tables
   */

and the two other ones with something like this:

   /*
 -  * Grab an exclusive lock on the target table, index, sequence or view,
 -  * which we will NOT release until end of transaction.
 +  * Grab an exclusive lock on the target relation,
 +  * which we will NOT release until
 +  * end of transaction.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Josh Berkus
On 02/25/2014 10:31 AM, Robert Haas wrote:
 And I definitely don't
 agree that our documentation should push people towards stuffing
 everything in a JSON blob instead of using real column definitions.



Where did you get this out of my doc patch?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changeset Extraction v7.7

2014-02-25 Thread Robert Haas
On Mon, Feb 24, 2014 at 6:16 PM, Andres Freund and...@2ndquadrant.com wrote:
 I actually thought they'd be too ugly to live and we'd remove them
 pre-commit.

Might be getting to be about that time, then.

 -   if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval ||
 forceSyncCommit)
 +   if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval ||
 forceSyncCommit ||
 +   XLogLogicalInfoActive())

 Mmph.  Is this really necessary?  If so, why?  The comments could elucidate.

 We could get rid of it by (optionally) adding information about the
 database oid to compact commit, but that'd increase the size of the
 record.

So why do we need the database OID?

 +   bool fail_softly = slot->data.persistency == RS_EPHEMERAL;

 This should be contingent on whether we're being called in the error
 pathway, not the slot type.  I think you should pass a bool.

 Why? I had it that way at first, but for persistent slots this won't be
 called in error pathways as we won't drop there.

I was thinking more the reverse - that a non-persistent slot might be
dropped in a non-error pathway.

  It seems to be indicating, roughly, whether the relation should
 participate in RecentGlobalXmin or RecentGlobalDataXmin.  But is there
 any point at all of separating those when !XLogLogicalInfoActive()?
 The test expands to:

 IsSystemRelation() || (XLogLogicalInfoActive() &&
 RelationNeedsWAL(relation) && (IsCatalogRelation(relation) ||
 RelationIsUsedAsCatalogTable(relation)))

 So basically this is all tables created in pg_catalog during initdb
 plus all TOAST tables in the system.  If wal_level=logical, then we
 also include tables marked with the reloption user_catalog_table=true,
 unless they're unlogged.  This all seems a bit complex.  Why not this:

 IsSystemRelation() || RelationIsUsedAsCatalogTable(relation)

 Because that'd possibly retain too much when !XLogLogicalInfoActive(),
 there's no need to look for RelationIsUsedAsCatalogTable() for those. We
 could decide not to care?

But when !XLogLogicalInfoActive() I think we could just make this
always false, right?  I mean, if PROC_IN_LOGICAL_DECODING is never
going to be set,  the values are always going to be the same anyway.
I think.

 +   /* 
 +* This is a bit tricky: We need to determine a safe xmin
 horizon to start
 +* decoding from, to avoid starting from a running xacts
 record referring
 +* to xids whose rows have been vacuumed or pruned
 +* already. GetOldestSafeDecodingTransactionId() returns such
 a value, but
 +* without further interlock it's return value might
 immediately be out of
 +* date.
 +*
 +* So we have to acquire the ProcArrayLock to prevent computation of 
 new
 +* xmin horizons by other backends, get the safe decoding xid,
 and inform
 +* the slot machinery about the new limit. Once that's done the
 +* ProcArrayLock can be be released as the slot machinery now is
 +* protecting against vacuum.
 +* 
 +*/

 I can't claim to be very excited about this.

 Because of the already_locked parameters, or any wider concerns?

Passing down already_locked through several layers is kind of ugly,
but also, holding ProcArrayLock more is sad.  That is not a
lightly-contended lock.

 I'm assuming you've
 spent a lot of time thinking about ways to avoid this and utterly
 failed to come up with any reasonable alternative, but let me take a
 shot.  Suppose we take ProcArrayLock in exclusive mode and compute the
 oldest running XID, install it as our xmin, and then release
 ProcArrayLock.  At that point, nobody else can compute an oldest-xmin
 value that precedes that value, so we can take our time installing
 that value as the slot's xmin, without needing to hold a lock
 meanwhile.

 I actually had it that way for a while, but what if the backend already
 has a xmin set? Then we need to reason about whether the xmin is newer,
 restore it afterwards and such. That doesn't seem nice.

It's not too far removed from the problem snapmgr.c is already
designed to solve, though, is it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Robert Haas
On Tue, Feb 25, 2014 at 1:45 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/25/2014 10:31 AM, Robert Haas wrote:
 And I definitely don't
 agree that our documentation should push people towards stuffing
 everything in a JSON blob instead of using real column definitions.

 

 Where did you get this out of my doc patch?

Way to quote what I said out of context.

But to make a long story short, I get that from the fact that you want
to railroad everyone into using jsonb.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor comment improvements in tablecmds.c

2014-02-25 Thread Robert Haas
On Tue, Feb 25, 2014 at 1:44 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Etsuro Fujita wrote:
 This is a small patch to improve comments in tablecmds.c.  Please find
 attached a patch.

 I find both patched and unpatched to be pretty illegible.  How about
 something like

  /*
 - * Execute ALTER TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE RENAME
  + * Execute ALTER <relation type> RENAME
 + *   This routine supports tables, indexes, sequences, views,
 + *   and foreign tables
   */

I don't find that to be an improvement.

 and the two other ones with something like this:

   /*
 -  * Grab an exclusive lock on the target table, index, sequence or view,
 -  * which we will NOT release until end of transaction.
 +  * Grab an exclusive lock on the target relation,
 +  * which we will NOT release until
 +  * end of transaction.

But I do like that better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Josh Berkus
On 02/25/2014 10:50 AM, Robert Haas wrote:
 On Tue, Feb 25, 2014 at 1:45 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/25/2014 10:31 AM, Robert Haas wrote:
 And I definitely don't
 agree that our documentation should push people towards stuffing
 everything in a JSON blob instead of using real column definitions.

 

 Where did you get this out of my doc patch?
 
 Way to quote what I said out of context.

Way to put words in my mouth.

 But to make a long story short, I get that from the fact that you want
 to railroad everyone into using jsonb.

That's called a straw man argument, Robert.

Me: We should recommend that people use jsonb unless they have a
specific reason for using json.

Merlin: We should present them side-by-side with a complex comparison.

Robert: Josh wants to junk all relational data and use only jsonb!

I mean, really, WTF?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Josh Berkus
On 02/25/2014 09:45 AM, Bruce Momjian wrote:
 It hurts our adoption substantially to confuse developers.  We need to
  recommend one type over the other, hence "Use jsonb unless you need X."
  Merlin is pushing the type of multivariable comparison where *I*
 wouldn't be able to make sense of which one I should pick, let alone
 some web developer who's just trying to get a site built.  That sort of
 thing *really* doesn't help our users.
 
 I agree it would be nice to have something simple, like Use JSON if you
 wish to just store/retrieve entire JSON structures, and JSONB if you
 wish to do any kind of lookup or manipulation of JSON values on the
 server.

(to clarify below: json refers to the current varlena datatype; JSON
refers to JSON serialized data).

I don't think that's decisive enough, which is why I wrote the doc the
way I did.  The problem is that most users would prefer that we tell
them which one to use, which is why I want to structure the doc as "Use
jsonb unless you need one of these things", or more specifically:

In general, most applications will find it advantageous to store
JSON data as <type>jsonb</type>, as jsonb is more efficient when using
JSON manipulation functions, and will support future advanced json
index, operator and search features. The <type>json</type> will
primarily be useful for applications which need to preserve exact
formatting of the input JSON, or users with existing <type>json</type>
columns which they do not want to convert to <type>jsonb</type>.

Part of my reason for wanting to recommend jsonb over json is in the
context of the third storage option for JSON, namely TEXT.  The only
things which distinguish json from TEXT for JSON storage are validation
and a set of json manipulation functions.  jsonb works with the
manipulation functions better/faster, causing the old json type to start
looking like more of a DOMAIN over TEXT than a real type comparatively.
 In other words, if you ask the question "Why would I want to use json
instead of either jsonb or TEXT?", the answer becomes quite narrow.

Possibly I should expand the little chart and add a column for TEXT?
It's a viable option for storing JSON data, especially if you store a
lot of broken JSON or fragments.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Adrian Klaver

On 02/25/2014 10:54 AM, Josh Berkus wrote:

On 02/25/2014 10:50 AM, Robert Haas wrote:

On Tue, Feb 25, 2014 at 1:45 PM, Josh Berkus j...@agliodbs.com wrote:

On 02/25/2014 10:31 AM, Robert Haas wrote:

And I definitely don't
agree that our documentation should push people towards stuffing
everything in a JSON blob instead of using real column definitions.




Where did you get this out of my doc patch?


Way to quote what I said out of context.


Way to put words in my mouth.


But to make a long story short, I get that from the fact that you want
to railroad everyone into using jsonb.


That's called a straw man argument, Robert.

Me: We should recommend that people use jsonb unless they have a
specific reason for using json.

Merlin: We should present them side-by-side with a complex comparison.


From the cheap seats.

To me the whole hstore/json/jsonb family is a WIP and any enlightenment 
in the form of comparisons would be greatly appreciated by me and other 
end users I would suspect.




Robert: Josh wants to junk all relational data and use only jsonb!

I mean, really, WTF?


Seems to be a hot topic all the way around. I am neck deep in learning 
Web development and am coming to grips with the role of JSON in that world.








--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changeset Extraction v7.7

2014-02-25 Thread Andres Freund
Hi,

On 2014-02-25 13:47:49 -0500, Robert Haas wrote:
 On Mon, Feb 24, 2014 at 6:16 PM, Andres Freund and...@2ndquadrant.com wrote:
  I actually thought they'd be too ugly to live and we'd remove them
  pre-commit.
 
 Might be getting to be about that time, then.

I want to leave them in until the slot semantics aren't going to change
anymore, they are pretty useful for testing that. But I'll separate them out
into a separate commit again.

  -   if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval ||
  forceSyncCommit)
  +   if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval ||
  forceSyncCommit ||
  +   XLogLogicalInfoActive())
 
  Mmph.  Is this really necessary?  If so, why?  The comments could 
  elucidate.
 
  We could get rid of it by (optionally) adding information about the
  database oid to compact commit, but that'd increase the size of the
  record.
 
 So why do we need the database OID?

To ignore commits from other databases. Since we don't decode changes
from other databases, it's really confusing (and pointless overhead) to
see transactions from there.

  +   bool fail_softly = slot->data.persistency == RS_EPHEMERAL;
 
  This should be contingent on whether we're being called in the error
  pathway, not the slot type.  I think you should pass a bool.
 
  Why? I had it that way at first, but for persistent slots this won't be
  called in error pathways as we won't drop there.
 
 I was thinking more the reverse - that a non-persistent slot might be
 dropped in a non-error pathway.

Well, currently EPHEMERAL slots are documented to be dropped at release
since that's what changeset extraction (and possibly basebackup and
receivexlog) need afaics. You'd prefer DROP_ON_ERROR semantics?

   It seems to be indicating, roughly, whether the relation should
  participate in RecentGlobalXmin or RecentGlobalDataXmin.  But is there
  any point at all of separating those when !XLogLogicalInfoActive()?
  The test expands to:
 
  IsSystemRelation() || (XLogLogicalInfoActive() &&
  RelationNeedsWAL(relation) && (IsCatalogRelation(relation) ||
  RelationIsUsedAsCatalogTable(relation)))
 
  So basically this is all tables created in pg_catalog during initdb
  plus all TOAST tables in the system.  If wal_level=logical, then we
  also include tables marked with the reloption user_catalog_table=true,
  unless they're unlogged.  This all seems a bit complex.  Why not this:
 
  IsSystemRelation() || RelationIsUsedAsCatalogTable(relation)
 
  Because that'd possibly retain too much when !XLogLogicalInfoActive(),
  there's no need to look for RelationIsUsedAsCatalogTable() for those. We
  could decide not to care?
 
 But when !XLogLogicalInfoActive() I think we could just make this
 always false, right?  I mean, if PROC_IN_LOGICAL_DECODING is never
 going to be set,  the values are always going to be the same anyway.
 I think.

It seems confusing and bug-prone to use the wrong horizon variable just
because right now they'd be the same if wal_level < logical.

  I can't claim to be very excited about this.
 
  Because of the already_locked parameters, or any wider concerns?
 
 Passing down already_locked through several layers is kind of ugly,
 but also, holding ProcArrayLock more is sad.  That is not a
 lightly-contended lock.

Absolutely true, but this is very far from an operation that will be
frequent enough to matter. Creating slots so frequently that a lock on
the procarray held while iterating the slot array matters will be
painful long before the contention on that is the problem.

  I'm assuming you've
  spent a lot of time thinking about ways to avoid this and utterly
  failed to come up with any reasonable alternative, but let me take a
  shot.  Suppose we take ProcArrayLock in exclusive mode and compute the
  oldest running XID, install it as our xmin, and then release
  ProcArrayLock.  At that point, nobody else can compute an oldest-xmin
  value that precedes that value, so we can take our time installing
  that value as the slot's xmin, without needing to hold a lock
  meanwhile.
 
  I actually had it that way for a while, but what if the backend already
  has a xmin set? Then we need to reason about whether the xmin is newer,
  restore it afterwards and such. That doesn't seem nice.
 
 It's not too far removed from the problem snapmgr.c is already
 designed to solve, though, is it?

Hm, I don't immediately see how it would fit in there. PgXact->xmin is
set by procarray.c, all snapmgr does is reset it. And there's no logic
about resetting it back to higher values and such.

I'll ponder on getting rid of this, but I am not of too high hopes.

Thanks,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Unfortunate choice of short switch name in pgbench

2014-02-25 Thread Tom Lane
I just wasted some time puzzling over strange results from pgbench.
I eventually realized that I'd been testing against the wrong server,
because rather than -p 65432 I'd typed -P 65432, thereby invoking
the recently added --progress option.  pgbench has no way to know that
that isn't what I meant; the fact that both switches take integer
arguments doesn't help.

To fix this, I propose removing the -P short form and only allowing the
long --progress form.  I won't argue that this feature is completely
useless, but for sure it's not something I'd want more often than once
in a blue moon.  So I think it does not need to have a short form; and
for sure it doesn't need a short form that's so easily confused with a
commonly used switch.

If no objections, I'll go make that change.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-25 Thread Jeremy Harris

On 24/02/14 17:38, Robert Haas wrote:

On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris j...@wizmail.org wrote:

Run under cachegrind, it takes about N/10 last-level cache misses,
all for the new item being introduced to the heap.  The existing
code takes none at all.


Can you explain this further?  This seems like an important clue that
could be useful when trying to optimize this code, but I'm a little
unclear which part of the operation has more cache misses with your
changes and why.


In the patched version, for the heapify operation the outer loop
starts at the last heap-parent tuple and works backward to the
start of the tuples array.  A copy is taken of the SortTuple being
operated on for the inner loop to use.  This copy suffers cache misses.

(The inner loop operates on elements between the copy source and the
end of the array).


In the original, the outer loop runs the array in increasing index
order.  Again a copy is taken of the SortTuple for the inner loop
to use.  This copy does not appear to take significant cache misses.

(The inner loop operates on elements between the copy source and
the start of the array).
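
For reference, here is a generic bottom-up heapify in plain C (not the
tuplesort code) that reproduces the access pattern described above: the
outer loop walks from the last heap parent back to the start of the
array, and each iteration takes a copy of the element being sifted
before the inner loop runs.

#include <stdio.h>

static void
sift_down(int *a, int n, int start)
{
	int			val = a[start];		/* the copy the inner loop works from */
	int			i = start;

	while (2 * i + 1 < n)
	{
		int			child = 2 * i + 1;

		if (child + 1 < n && a[child + 1] > a[child])
			child++;				/* pick the larger child (max-heap) */
		if (val >= a[child])
			break;
		a[i] = a[child];
		i = child;
	}
	a[i] = val;
}

int
main(void)
{
	int			a[] = {3, 9, 1, 14, 7, 2, 8, 5};
	int			n = (int) (sizeof(a) / sizeof(a[0]));
	int			i;

	/* outer loop: last heap parent backward to index 0 */
	for (i = n / 2 - 1; i >= 0; i--)
		sift_down(a, n, i);

	for (i = 0; i < n; i++)
		printf("%d ", a[i]);
	printf("\n");
	return 0;
}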

--
Cheers,
   Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unfortunate choice of short switch name in pgbench

2014-02-25 Thread Pavel Stehule
2014-02-25 20:49 GMT+01:00 Tom Lane t...@sss.pgh.pa.us:

 I just wasted some time puzzling over strange results from pgbench.
 I eventually realized that I'd been testing against the wrong server,
 because rather than -p 65432 I'd typed -P 65432, thereby invoking
 the recently added --progress option.  pgbench has no way to know that
 that isn't what I meant; the fact that both switches take integer
 arguments doesn't help.

 To fix this, I propose removing the -P short form and only allowing the
 long --progress form.  I won't argue that this feature is completely
 useless, but for sure it's not something I'd want more often than once
 in a blue moon.  So I think it does not need to have a short form; and
 for sure it doesn't need a short form that's so easily confused with a
 commonly used switch.

 If no objections, I'll go make that change.


+1

Pavel



 regards, tom lane


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Unfortunate choice of short switch name in pgbench

2014-02-25 Thread Robert Haas
On Tue, Feb 25, 2014 at 2:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I just wasted some time puzzling over strange results from pgbench.
 I eventually realized that I'd been testing against the wrong server,
 because rather than -p 65432 I'd typed -P 65432, thereby invoking
 the recently added --progress option.  pgbench has no way to know that
 that isn't what I meant; the fact that both switches take integer
 arguments doesn't help.

 To fix this, I propose removing the -P short form and only allowing the
 long --progress form.  I won't argue that this feature is completely
 useless, but for sure it's not something I'd want more often than once
 in a blue moon.  So I think it does not need to have a short form; and
 for sure it doesn't need a short form that's so easily confused with a
 commonly used switch.

 If no objections, I'll go make that change.

Hmm.  I don't have a real specific opinion on the value of this
particular --progress option, but my experience is that most
--progress options get a lot of use.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Robert Haas
On Tue, Feb 25, 2014 at 1:54 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/25/2014 10:50 AM, Robert Haas wrote:
 On Tue, Feb 25, 2014 at 1:45 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/25/2014 10:31 AM, Robert Haas wrote:
 And I definitely don't
 agree that our documentation should push people towards stuffing
 everything in a JSON blob instead of using real column definitions.

 

 Where did you get this out of my doc patch?

 Way to quote what I said out of context.

 Way to put words in my mouth.

 But to make a long story short, I get that from the fact that you want
 to railroad everyone into using jsonb.

 That's called a straw man argument, Robert.

 Me: We should recommend that people use jsonb unless they have a
 specific reason for using json.

 Merlin: We should present them side-by-side with a complex comparison.

 Robert: Josh wants to junk all relational data and use only jsonb!

 I mean, really, WTF?

OK, since what I said seems to have become distorted somewhere along
the line, allow me to rephrase:

I don't agree that jsonb should be preferred in all but a handful of
situations.  Nor do I agree that partisanship belongs in our
documentation.  Therefore, -1 for your proposal to recommend that, and
+1 for Merlin's proposal to present a comparison which fairly
illustrates the situations in which each will outperform the other.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] define type_transform to new user defined type

2014-02-25 Thread Mohsen SM
 I want to create a new type that is similar to varchar.
Its size is variable.
I use the CREATE TYPE command.
I define these functions for that type:
1-typein
2-typeout
3-typemodify_input
4-typemodify_output
5-type_transform
I can define functions 1 to 4 in CREATE TYPE,
but I can't define type_transform for that type. How can I define a
type_transform for that type?


Re: [HACKERS] Dump pageinspect to 1.2: page_header with pg_lsn datatype

2014-02-25 Thread Robert Haas
On Mon, Feb 24, 2014 at 9:34 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Yeah, erroring out seems good enough.  Particularly if you add a hint
 saying please upgrade the extension.

In past instances where this has come up, we have actually made the
.so backward-compatible.  See pg_stat_statements in particular.  I'd
prefer to keep to that precedent here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dump pageinspect to 1.2: page_header with pg_lsn datatype

2014-02-25 Thread Alvaro Herrera
Robert Haas wrote:
 On Mon, Feb 24, 2014 at 9:34 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  Yeah, erroring out seems good enough.  Particularly if you add a hint
  saying please upgrade the extension.
 
 In past instances where this has come up, we have actually made the
 .so backward-compatible.  See pg_stat_statements in particular.  I'd
 prefer to keep to that precedent here.

But pg_stat_statement is a user tool which is expected to have lots of
use, and backwards compatibility concerns because of people who might
have written tools on top of it.  Not so with pageinspect.  I don't
think we need to put in the same amount of effort.  (Even though,
really, it's probably not all that difficult to support both cases.  I
just don't see the point.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #9223: plperlu result memory leak

2014-02-25 Thread Alex Hunsaker
On Tue, Feb 25, 2014 at 6:56 AM, Sergey Burladyan eshkin...@gmail.com wrote:

 It looks like I found the problem: Perl uses reference counting and something
 that is called "mortal" for memory management.  As I understand it, a mortal
 is freed after FREETMPS. Plperl calls FREETMPS in plperl_call_perl_func(), but
 after it, plperl asks the perl interpreter again for new mortal SV variables,
 for example in hek2cstr from plperl_sv_to_datum, and this new SV is never freed.

So I think hek2cstr is the only place we leak (it's the only place I
can see that allocates a mortal sv without being wrapped in
ENTER/SAVETMPS/FREETMPS/LEAVE).

Does the attached fix it for you?
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 930b9f0..3bc4034 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -304,6 +304,16 @@ static char *setlocale_perl(int category, char *locale);
 static char *
 hek2cstr(HE *he)
 {
+	char *ret;
+	SV	 *sv;
+
+	/*
+	 * HeSVKEY_force will return a temporary mortal SV*, so we need to make
+	 * sure to free it with ENTER/SAVE/FREE/LEAVE
+	 */
+	ENTER;
+	SAVETMPS;
+
 	/*-
 	 * Unfortunately,  while HeUTF8 is true for most things > 256, for values
 	 * 128..255 it's not, but perl will treat them as unicode code points if
@@ -328,11 +338,17 @@ hek2cstr(HE *he)
 	 * right thing
 	 *-
 	 */
-	SV		   *sv = HeSVKEY_force(he);
 
+	sv = HeSVKEY_force(he);
 	if (HeUTF8(he))
 		SvUTF8_on(sv);
-	return sv2cstr(sv);
+	ret = sv2cstr(sv);
+
+	/* free sv */
+	FREETMPS;
+	LEAVE;
+
+	return ret;
 }
 
 /*
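
For anyone not steeped in perlguts: the pattern the patch applies is the
standard way to scope mortal SVs so the interpreter reclaims them.  A minimal
sketch of the idiom in isolation (assuming the usual plperl/Perl headers; the
function name is hypothetical and this is not part of the patch):

static char *
copy_sv_as_cstring(SV *input)
{
	char	   *result;
	SV		   *tmp;

	ENTER;
	SAVETMPS;

	/* any mortal SV created here belongs to this TMPS frame */
	tmp = sv_2mortal(newSVsv(input));

	/* copy the data out before the frame is torn down */
	result = pstrdup(SvPV_nolen(tmp));

	/* free all mortals created since SAVETMPS */
	FREETMPS;
	LEAVE;

	return result;
}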

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster

2014-02-25 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:

 Or we could wait to see if anybody reports this sort of behavior
 in a shell that won't be out of support before 9.4 gets out the
 door.

We have a field report of this happening in the sh shell in Solaris
10.  Our staff has confirmed this.  In Solaris 10 they can start
multiple clusters from a single shell, and if they later use Ctrl+C
in that shell all of those instances are shut down.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] install libpq.dll in bin directory on Windows / Cygwin

2014-02-25 Thread Peter Eisentraut
On 2/1/14, 3:22 PM, Andrew Dunstan wrote:
 In the end I went with the way I had suggested, because that's what the
 MSVC system does - it doesn't copy any other DLLs to the bin directory.
 So doing that seemed sane for backpatching, to bring the two build
 systems into sync.
 
 If you want to propose a better arrangement for the future, to include,
 say, ecpg DLLs, and including changes to the MSVC system, we can discuss
 that separately.

See attached patch.

There is also the commit fest item
https://commitfest.postgresql.org/action/patch_view?id=1330 that
requests the MSVC builds to install the ecpg libraries in the bin directory.
diff --git a/src/Makefile.shlib b/src/Makefile.shlib
index 029c7e9..c04e2ef 100644
--- a/src/Makefile.shlib
+++ b/src/Makefile.shlib
@@ -484,6 +484,9 @@ endif
 endif # not win32
 endif # not cygwin
 endif # not aix
+ifneq (,$(findstring $(PORTNAME),win32 cygwin))
+	$(INSTALL_SHLIB) $< '$(DESTDIR)$(bindir)/$(shlib)'
+endif
 else # no soname
 	$(INSTALL_SHLIB) $< '$(DESTDIR)$(pkglibdir)/$(shlib)'
 endif
@@ -491,7 +494,7 @@ endif
 
 installdirs-lib:
 ifdef soname
-	$(MKDIR_P) '$(DESTDIR)$(libdir)' '$(DESTDIR)$(pkgconfigdir)'
+	$(MKDIR_P) '$(DESTDIR)$(libdir)' '$(DESTDIR)$(pkgconfigdir)' $(if $(findstring $(PORTNAME),win32 cygwin),'$(DESTDIR)$(bindir)')
 else
 	$(MKDIR_P) '$(DESTDIR)$(pkglibdir)'
 endif
@@ -507,7 +510,7 @@ ifdef soname
 	rm -f '$(DESTDIR)$(libdir)/$(stlib)'
 	rm -f '$(DESTDIR)$(libdir)/$(shlib_bare)' \
 	  '$(DESTDIR)$(libdir)/$(shlib_major)' \
-	  '$(DESTDIR)$(libdir)/$(shlib)' \
+	  '$(DESTDIR)$(libdir)/$(shlib)' $(if $(findstring $(PORTNAME),win32 cygwin),'$(DESTDIR)$(bindir)/$(shlib)') \
 	  '$(DESTDIR)$(pkgconfigdir)/lib$(NAME).pc'
 else # no soname
 	rm -f '$(DESTDIR)$(pkglibdir)/$(shlib)'
diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index 7f2d901..2d11816 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -121,18 +121,12 @@ install: all installdirs install-lib
 	$(INSTALL_DATA) $(srcdir)/libpq-int.h '$(DESTDIR)$(includedir_internal)'
 	$(INSTALL_DATA) $(srcdir)/pqexpbuffer.h '$(DESTDIR)$(includedir_internal)'
 	$(INSTALL_DATA) $(srcdir)/pg_service.conf.sample '$(DESTDIR)$(datadir)/pg_service.conf.sample'
-ifneq (,$(findstring $(PORTNAME), win32 cygwin))
-	$(INSTALL_DATA) $(shlib) '$(DESTDIR)$(bindir)/$(shlib)'
-endif
 
 installcheck:
 	$(MAKE) -C test $@
 
 installdirs: installdirs-lib
 	$(MKDIR_P) '$(DESTDIR)$(includedir)' '$(DESTDIR)$(includedir_internal)' '$(DESTDIR)$(datadir)'
-ifneq (,$(findstring $(PORTNAME), win32 cygwin))
-	$(MKDIR_P) '$(DESTDIR)$(bindir)'
-endif
 
 uninstall: uninstall-lib
 	rm -f '$(DESTDIR)$(includedir)/libpq-fe.h'
@@ -140,9 +134,6 @@ uninstall: uninstall-lib
 	rm -f '$(DESTDIR)$(includedir_internal)/libpq-int.h'
 	rm -f '$(DESTDIR)$(includedir_internal)/pqexpbuffer.h'
 	rm -f '$(DESTDIR)$(datadir)/pg_service.conf.sample'
-ifneq (,$(findstring $(PORTNAME), win32 cygwin))
-	rm -f '$(DESTDIR)$(bindir)/$(shlib)'
-endif
 
 clean distclean: clean-lib
 	$(MAKE) -C test $@

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] define type_transform to new user defined type

2014-02-25 Thread Tom Lane
Mohsen SM mohsensoodk...@gmail.com writes:
  I want to create new type that is similar to varchar.
 its size is variable.
 I use CREATE TYPE query.
 I define for that type these functions:
 1-typein
 2-typeout
 3-typemodify_input
 4-typemodify_output
 5-type_transform
 I can define functions 1 to 4 in CREATE TYPE,
 but I can't define type_transform for that type. How can I define
 type_transform for that type?

There's no such thing as a type transform.  There are transforms
associated with functions ... unfortunately, there's not currently
any provision for defining those at the SQL level.  You could poke
an entry into pg_proc.protransform if you're desperate enough.
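
For what it's worth, a transform support function is plain C that inspects the
FuncExpr for a call and may hand back a simplified replacement expression, or
NULL to leave the call alone.  A rough sketch modeled on the built-in varchar
behavior; the names are hypothetical and the typmod is assumed to be a simple
length limit:

#include "postgres.h"
#include "fmgr.h"
#include "nodes/nodeFuncs.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"

PG_FUNCTION_INFO_V1(mytype_transform);

Datum
mytype_transform(PG_FUNCTION_ARGS)
{
	FuncExpr   *expr = (FuncExpr *) PG_GETARG_POINTER(0);
	Node	   *ret = NULL;
	Node	   *typmod = (Node *) lsecond(expr->args);

	if (IsA(typmod, Const) && !((Const *) typmod)->constisnull)
	{
		Node	   *source = (Node *) linitial(expr->args);
		int32		old_typmod = exprTypmod(source);
		int32		new_typmod = DatumGetInt32(((Const *) typmod)->constvalue);

		/*
		 * If the new typmod is unlimited, or only loosens the old limit,
		 * the length-coercion call cannot change or reject the value, so
		 * it can be replaced by a simple relabeling of its input.
		 */
		if (new_typmod < 0 ||
			(old_typmod >= 0 && old_typmod <= new_typmod))
			ret = relabel_to_typmod(source, new_typmod);
	}

	PG_RETURN_POINTER(ret);
}

After compiling that into a shared library and creating it as a function taking
and returning internal, the protransform entry for the type's length-coercion
function would have to be set by a manual UPDATE on pg_proc, since, as said
above, there is no SQL-level syntax for it.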

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: show relation and tuple infos of a lock to acquire

2014-02-25 Thread Robert Haas
On Mon, Feb 24, 2014 at 10:13 AM, Christian Kruse
christ...@2ndquadrant.com wrote:
 To be honest, I don't like the idea of setting up this error context
 only for log_lock_wait messages. This sounds unnecessarily complex to me,
 and I think that the few cases where this message doesn't add value
 (and thus is useless) don't justify such complexity.

Reading this over, I'm not sure I understand why this is a CONTEXT at
all and not just a DETAIL for the particular error message that it's
supposed to be decorating.  Generally CONTEXT should be used for
information that will be relevant to all errors in a given code path,
and DETAIL for extra information specific to a particular error.

If we're going to stick with CONTEXT, we could rephrase it like this:

CONTEXT: while attempting to lock tuple (1,2) in relation with OID 3456

or when the relation name is known:

CONTEXT: while attempting to lock tuple (1,2) in relation public.foo
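
To make the distinction concrete, the two styles look roughly like this at the
C level (a sketch only; the struct and function names are made up, not proposed
code):

#include "postgres.h"
#include "miscadmin.h"

/* hypothetical bag of facts about the tuple being waited on */
typedef struct LockWaitInfo
{
	uint32		blkno;
	uint32		offnum;
	const char *relname;
} LockWaitInfo;

/* CONTEXT: a callback decorates every message raised while it is on the stack */
static void
lock_wait_context(void *arg)
{
	LockWaitInfo *info = (LockWaitInfo *) arg;

	errcontext("while attempting to lock tuple (%u,%u) in relation \"%s\"",
			   info->blkno, info->offnum, info->relname);
}

static void
wait_and_log(LockWaitInfo *info)
{
	ErrorContextCallback cb;

	cb.callback = lock_wait_context;
	cb.arg = (void *) info;
	cb.previous = error_context_stack;
	error_context_stack = &cb;

	/* DETAIL: the information is attached to this one report only */
	ereport(LOG,
			(errmsg("process %d still waiting for a tuple lock", MyProcPid),
			 errdetail("Tuple (%u,%u) in relation \"%s\".",
					   info->blkno, info->offnum, info->relname)));

	error_context_stack = cb.previous;
}

The callback has to be popped again, of course, and it only pays its way if
several different reports can fire while it is installed.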

 Displaying the whole tuple doesn't seem to be the right way
 to get debug information, especially since in this case we are
 already displaying the tuple offset (ctid), which is a unique
 identity for the tuple. It seems to me that it is sufficient to
 display the tuple's unique value if present.
 I understand that there is no clear issue here, so maybe if others also
 share their opinion it will be quite easy to take a call.

I wouldn't be inclined to dump the whole tuple under any
circumstances.  That could be a lot more data than what you want
dumped in your log.  The PK could already be somewhat unreasonably
large, but the whole tuple could be a lot more unreasonably large.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder

2014-02-25 Thread Peter Eisentraut
You should be able to do this without specifically referencing the names
libpq or ecpg.  Look into the Makefile: if it defines
SO_MAJOR_VERSION, then it's a library you are concerned with, and then
do the copying or moving.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Alvaro Herrera
Josh Berkus wrote:

 (to clarify below: json refers to the current varlena datatype; JSON
 refers to JSON serialized data).

FWIW the term varlena json is misleading.  jsonb is also varlena, only
different.  I think you need a different term to say that json uses the
text representation.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unfortunate choice of short switch name in pgbench

2014-02-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Feb 25, 2014 at 2:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 To fix this, I propose removing the -P short form and only allowing the
 long --progress form.  I won't argue that this feature is completely
 useless, but for sure it's not something I'd want more often than once
 in a blue moon.  So I think it does not need to have a short form; and
 for sure it doesn't need a short form that's so easily confused with a
 commonly used switch.

 Hmm.  I don't have a real specific opinion on the value of this
 particular --progress option, but my experience is that most
 --progress options get a lot of use.

Meh.  A progress-reporting feature has use when the tool is working
towards completion of a clearly defined task.  In the case of pgbench,
if you told it to run for -T 60 seconds rather than -T 10 seconds,
that's probably because you don't trust a 10-second average to be
sufficiently reproducible.  So I'm not real sure that reporting averages
over shorter intervals is all that useful; especially not if it takes
cycles out of pgbench, which itself is often a bottleneck.

I could see some value in a feature that computed shorter-interval TPS
averages and then did some further arithmetic, like measuring the standard
deviation of the shorter-interval averages to assess how much noise there
will be in the full-run average.  But that's not what this does, and if it
did do that, reporting progress would not be what I'd see as its main
purpose.
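
(For concreteness, the arithmetic that would involve is nothing fancier than
the following, computed over per-interval transaction counts; a standalone C
sketch, not pgbench code:)

#include <math.h>
#include <stdio.h>

/* mean and sample standard deviation of per-interval TPS figures */
static void
report_interval_noise(const double *tps, int n)
{
	double		sum = 0.0,
				sumsq = 0.0,
				mean,
				stddev;
	int			i;

	for (i = 0; i < n; i++)
	{
		sum += tps[i];
		sumsq += tps[i] * tps[i];
	}
	mean = sum / n;
	stddev = (n > 1) ? sqrt((sumsq - n * mean * mean) / (n - 1)) : 0.0;

	printf("tps = %.2f, per-interval stddev = %.2f over %d intervals\n",
		   mean, stddev, n);
}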

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-02-25 Thread Robert Haas
On Tue, Feb 25, 2014 at 3:36 AM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 May I mark this patch as Ready for Committer by myself since
 this was already marked so in last CF3 and have had no objection
 or new suggestion in this CF4 for more than a month?

Sounds fair.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Josh Berkus
On 02/25/2014 12:12 PM, Robert Haas wrote:
 I don't agree that jsonb should be preferred in all but a handful of
 situations.  Nor do I agree that partisanship belongs in our
 documentation.  Therefore, -1 for your proposal to recommend that, and
 +1 for Merlin's proposal to present a comparison which fairly
 illustrates the situations in which each will outperform the other.

Awaiting doc patch from Merlin, then.  It will need to be clear enough
that an ordinary user can distinguish which type they want.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extension_control_path

2014-02-25 Thread Peter Eisentraut
I'm massively in favor of this feature.  (I had started writing it about
three times myself.)

The problem I see, however, is that most extensions, by recommendation,
set module_pathname = '$libdir/pgfoo', and so relocating the control
files will still end up pointing to a not relocated library file.

We would need to remove that and then ask users to keep their
dynamic_library_path in sync with extension_control_path.  That's error
prone, of course.

In order to address this properly, we need a new directory structure
that keeps library files and control files together, similar to how
Python, Ruby, etc. install things, and then just one path for everything.

Now a few technical problems.

When an extension is not found, I get the error message

ERROR:  could not open extension control file (null): Bad address

Something is broken there.

In the documentation, order extension_control_path after
dynamic_library_path.

Also, the documentation states that this controls the location of the
control file, but it of course controls the location of the script files
also.  That should be made clearer.  (It becomes clearer if we just have
one path for everything. ;-) )



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Merlin Moncure
On Tue, Feb 25, 2014 at 4:03 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/25/2014 12:12 PM, Robert Haas wrote:
 I don't agree that jsonb should be preferred in all but a handful of
 situations.  Nor do I agree that partisanship belongs in our
 documentation.  Therefore, -1 for your proposal to recommend that, and
 +1 for Merlin's proposal to present a comparison which fairly
 illustrates the situations in which each will outperform the other.

 Awaiting doc patch from Merlin, then.  It will need to be clear enough
 that an ordinary user can distinguish which type they want.

Sure.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_dumpall reccomendation in release notes

2014-02-25 Thread Josh Berkus
Bruce, Tom:

Can we change this text in the template release notes?


A dump/restore using pg_dumpall
<http://www.postgresql.org/docs/9.3/static/app-pg-dumpall.html>, or use of
pg_upgrade <http://www.postgresql.org/docs/9.3/static/pgupgrade.html>, is
required for those wishing to migrate data from any previous release.


Again, here we're recommending pg_dumpall with its many limitations, and
not mentioning pg_dump/pg_restore at all.  This has caused several
people to ask me if pg_dump is not supported for upgrading anymore.  Fix?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster

2014-02-25 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Or we could wait to see if anybody reports this sort of behavior
 in a shell that won't be out of support before 9.4 gets out the
 door.

 We have a field report of this happening in the sh shell in Solaris
 10.  Our staff has confirmed this.  In Solaris 10 they can start
 multiple clusters from a single shell, and if they later use Ctrl+C
 in that shell all of those instances are shut down.

Do you want to try the set -m hack suggested upthread?  I see no
point in pursuing the portability questions unless we've verified
that it fixes the problem for someone.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dumpall reccomendation in release notes

2014-02-25 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Can we change this text in the template release notes?

 A dump/restore using pg_dumpall
 <http://www.postgresql.org/docs/9.3/static/app-pg-dumpall.html>, or use of
 pg_upgrade <http://www.postgresql.org/docs/9.3/static/pgupgrade.html>, is
 required for those wishing to migrate data from any previous release.

 Again, here we're recommending pg_dumpall with its many limitations, and
 not mentioning pg_dump/pg_restore at all.  This has caused several
 people to ask me if pg_dump is not supported for upgrading anymore.  Fix?

There's a very good reason for not recommending pg_dump in this context:
it won't dump everything.  Yeah, if you know what you're doing, you might
use per-database pg_dump runs plus pg_dumpall -g to catch the roles etc,
but we're not going to try to wedge all that info into one or two
sentences of release-note boilerplate.  If you can get that right then
you don't need the release notes to remind you anyway.  If you aren't
likely to get it right then the release notes would do you no service
by suggesting it.

I'm not sure what many limitations you think pg_dumpall has that pg_dump
doesn't.

I do think that it might be time to reword this to recommend pg_upgrade
first, though.  ISTM that the current wording dates from when pg_upgrade
could charitably be described as experimental.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dumpall reccomendation in release notes

2014-02-25 Thread Josh Berkus
On 02/25/2014 03:41 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 Can we change this text in the template release notes?
 
  A dump/restore using pg_dumpall
  <http://www.postgresql.org/docs/9.3/static/app-pg-dumpall.html>, or use of
  pg_upgrade <http://www.postgresql.org/docs/9.3/static/pgupgrade.html>, is
  required for those wishing to migrate data from any previous release.
 
 Again, here we're recommending pg_dumpall with its many limitations, and
 not mentioning pg_dump/pg_restore at all.  This has caused several
 people to ask me if pg_dump is not supported for upgrading anymore.  Fix?
 
 There's a very good reason for not recommending pg_dump in this context:
 it won't dump everything.  Yeah, if you know what you're doing, you might
 use per-database pg_dump runs plus pg_dumpall -g to catch the roles etc,
 but we're not going to try to wedge all that info into one or two
 sentences of release-note boilerplate.  If you can get that right then
 you don't need the release notes to remind you anyway.  If you aren't
 likely to get it right then the release notes would do you no service
 by suggesting it.

Right. But the fact that we don't mention it *at all* has caused some
users to ask me if regular pg_dump doesn't work for upgrades anymore.
Which reminds me, I need to get the doc patch for the upgrade page in
this week.

It does make me wonder if we should direct users to the upgrade page
though, instead of the individual command pages.

 I'm not sure what many limitations you think pg_dumpall has that pg_dump
 doesn't.

Lack of parallel dump and restore is the biggest one.

 I do think that it might be time to reword this to recommend pg_upgrade
 first, though.  ISTM that the current wording dates from when pg_upgrade
 could charitably be described as experimental.

Yah.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dumpall reccomendation in release notes

2014-02-25 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 02/25/2014 03:41 PM, Tom Lane wrote:
 There's a very good reason for not recommending pg_dump in this context:
 it won't dump everything.  Yeah, if you know what you're doing, you might
 use per-database pg_dump runs plus pg_dumpall -g to catch the roles etc,
 but we're not going to try to wedge all that info into one or two
 sentences of release-note boilerplate.  If you can get that right then
 you don't need the release notes to remind you anyway.  If you aren't
 likely to get it right then the release notes would do you no service
 by suggesting it.

 Right. But the fact that we don't mention it *at all* has caused some
 users to ask me if regular pg_dump doesn't work for upgrades anymore.

TBH, anybody who's asking that kind of question probably falls in the
category of wouldn't get it right.  I've heard enough bitching from
novices who thought that pg_dump would be enough to get everything out
of their now-gone database that I have no desire to encourage use of
bare pg_dump here.

(Whether we shouldn't redesign the functionality of these programs
is a different discussion.  The release notes have to reflect what is,
though, not what might ideally be.)

 It does make me wonder if we should direct users to the upgrade page
 though, instead of the individual command pages.

If we had a page discussing the pros and cons of different upgrade
methods, yeah, I'd be in favor of reducing the release-note text to a
pointer to that page.  I don't see such a page in a quick skim of the
fine manual's table of contents though?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dumpall reccomendation in release notes

2014-02-25 Thread Josh Berkus
On 02/25/2014 03:59 PM, Tom Lane wrote:
 If we had a page discussing the pros and cons of different upgrade
 methods, yeah, I'd be in favor of reducing the release-note text to a
 pointer to that page.  I don't see such a page in a quick skim of the
 fine manual's table of contents though?

I owe an update of
http://www.postgresql.org/docs/9.3/static/upgrading.html; I think I can
easily include a discussion of the various options there.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bit data type header reduction in some cases

2014-02-25 Thread Bruce Momjian
On Tue, Feb 25, 2014 at 09:32:55AM -0500, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
  On 02/25/2014 08:23 AM, Haribabu Kommi wrote:
  It's regarding a Todo item of Bit data type header reduction in some
  cases. The header contains two parts. 1)  The varlena header is
  automatically converted to 1 byte header from 4 bytes in case of small
  data. 2) The bit length header called bit_len to store the actual bit
  length which is of 4 bytes in size. I want to reduce this bit_len size to 1
  byte in some cases as similar to varlena header. With this change the size
  of the column reduced by 3 bytes, thus shows very good decrease in disk
  usage.
 
  [ various contorted schemes for doing that backward-compatibly ]
 
 TBH, this sounds like a huge amount of effort and a significant risk of
 new bugs in return for not darn much.  Who uses bit values anyway?
 They were removed from the SQL spec more than 10 years ago.  And of that
 population, who cares about a byte or two per value?  The field demand
 for this is not only zero, it's probably negative.  (The thread referenced
 by the TODO entry shows no evidence of user demand.)

OK, TODO item removed.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] In which good intentions are punished, take 2

2014-02-25 Thread Tom Lane
Since I committed 49c817eab78c6f0ce8c3bf46766b73d6cf3190b7 to make
pg_do_encoding_conversion not fail silently, buildfarm member magpie
has been whining:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2014-02-25%2011%3A00%3A08

To wit, that if you run contrib/citext's regression tests in a
LATIN2-encoded database, you get this:

  SELECT convert_to( name, 'ISO-8859-1' ) = convert_to( name::text, 
'ISO-8859-1' ) AS t FROM srt;
! ERROR:  default conversion function for encoding LATIN2 to LATIN1 does 
not exist

Well, it's right, there is no such conversion function.  A quick look
in the buildfarm logs for older runs confirms that previously, all
we were getting was the useless LOG bleats that the previous coding
emitted:

LOG:  default conversion function for encoding LATIN2 to LATIN1 does not 
exist
STATEMENT:  SELECT convert_to( name, 'ISO-8859-1' ) = convert_to( name::text, 
'ISO-8859-1' ) AS t FROM srt;

That didn't appear in the user-visible output, allowing the regression
test to appear to pass although under no circumstances could it be
argued that what was happening was sane.

We could possibly replace this test case with

SELECT convert_from( name::bytea, 'SQL_ASCII' ) = convert_from( 
name::text::bytea, 'SQL_ASCII' ) AS t FROM srt;

though I'm not sure I see the point: whatever you might think this
statement is testing, it's not got much to do with citext.  On the
other hand, we evidently have got precious little other buildfarm
coverage of the convert() family of functions, so maybe removing
this test altogether wouldn't be the best thing either.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dumpall reccomendation in release notes

2014-02-25 Thread Bruce Momjian
On Tue, Feb 25, 2014 at 06:41:26PM -0500, Tom Lane wrote:
 I'm not sure what many limitations you think pg_dumpall has that pg_dump
 doesn't.
 
 I do think that it might be time to reword this to recommend pg_upgrade
 first, though.  ISTM that the current wording dates from when pg_upgrade
 could charitably be described as experimental.

Wow, so pg_upgrade takes the lead!  And from Tom too!  :-)

I agree with Tom that mentioning pg_dump/restore is going to lead to
global object data loss, and throwing the users to a URL with no
explanation isn't going to help either.  What we could do is to
restructure the existing text and add a link to the upgrade URL for more
details.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extension_control_path

2014-02-25 Thread Stephen Frost
* Peter Eisentraut (pete...@gmx.net) wrote:
 I'm massively in favor of this feature.  (I had started writing it about
 three times myself.)

Agreed.

 The problem I see, however, is that most extensions, by recommendation,
 set module_pathname = '$libdir/pgfoo', and so relocating the control
 files will still end up pointing to a not relocated library file.

I was wondering how that was dealt with- I simply have not had time to
get to looking at this in more detail.  I thought the answer I got from
Dimitri was that $libdir would actually end up being resolved to any of
the available directories due to his other patch...?  Perhaps we just
need to add in to that list the alternate directory for the control
files?  Or, what I had been thinking at one point, was making $libdir
actually be where this control file lives instead.  There's risk there
though, I suppose, as today only one thing means $libdir.

Another thought that I had was dealing with possible overlaps and
clarifying things, perhaps such as having a mapping of 'name' to
'directory' which remaps $libdir for that 'name' and then extensions
would be created using the 'name' space.

eg:

set module_paths = 'mymodulepath:/path/to/whatever;nextmodulepath:/other';
CREATE EXTENSION mymodulepath:my_extension;

 We would need to remove that and then ask users to keep their
 dynamic_library_path in sync with extension_control_path.  That's error
 prone, of course.

I'm starting to regret that dynamic_library_path exists, tbh.  Still, I
don't think it'd be too bad to automatically add to that path (or to
what's searched) the module_paths.

 In order to address this properly, we need a new directory structure
 that keeps library files and control files together, similar to how
 Python, Ruby, etc. install things, and then just one path for everything.

Right, that's more-or-less what I was thinking module_path would be.

 Now a few technical problems.

Agree with all these.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_dumpall reccomendation in release notes

2014-02-25 Thread Josh Berkus
On 02/25/2014 04:42 PM, Bruce Momjian wrote:
 On Tue, Feb 25, 2014 at 06:41:26PM -0500, Tom Lane wrote:
 I'm not sure what many limitations you think pg_dumpall has that pg_dump
 doesn't.

 I do think that it might be time to reword this to recommend pg_upgrade
 first, though.  ISTM that the current wording dates from when pg_upgrade
 could charitably be described as experimental.
 
 Wow, so pg_upgrade takes the lead!  And from Tom too!  :-)
 
 I agree with Tom that mentioning pg_dump/restore is going to lead to
 global object data loss, and throwing the users to a URL with no
 explanation isn't going to help either.  What we could do is to
 restructure the existing text and add a link to the upgrade URL for more
 details.

What I was suggesting was something like:

Users upgrading from earlier versions will need to go through the
entire upgrade procedure, as described on our upgrade page: link

The problem is that anything we say about how to upgrade in one short
sentence is going to confuse some people.  BTW, the reason I got that
question about pg_dump was that 9.2's release notes say pg_dump and
9.3's say pg_dumpall, causing users to think there's been some kind of
change.

Of course, this means I need to fix the upgrade page, and I need to
write backported versions of that fix for at least 9.1 and 9.2.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] In which good intentions are punished, take 2

2014-02-25 Thread Alvaro Herrera
Tom Lane wrote:

 On the
 other hand, we evidently have got precious little other buildfarm
 coverage of the convert() family of functions, so maybe removing
 this test altogether wouldn't be the best thing either.

We do have precious little testing on encodings and conversions, yes.
The problem is how to test these things without having the tests fail
when any particular encoding is not installed in the test system.

Maybe we can use the Perl test rig for this too: Peter said that if a
test requires something not installed, the test is skipped without
causing a failure.  It seems to me that we could take advantage that so
that each member tests whatever involves only the encodings it has
installed; while each individual member would skip a large percentage of
tests, the buildfarm as a whole would be testing a sizable portion, if
not all of it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-25 Thread Shigeru Hanada
Hi Kaigai-san,

2014-02-25 13:28 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
 The reason why I asked the question above is, I haven't been 100% certain
 about its usage. Indeed, semifactors is applied on a limited usage, but
 quite easy to reproduce by extension later (using clauselist_selectivity)
 if extension wants this factor. So, I agree with removing the semifactors
 here.

Agreed.  It would be nice to mention how to obtain semifactors for
people who want to implement advanced join overriding.

 mergeclause_list and param_source_rels seem little easier to use, but
 anyway it should be documented how to use those parameters.

 The mergeclause_list might not be sufficient for extensions to determine
 whether its own mergejoin is applicable here. See the comment below; that
 is on the head of select_mergejoin_clauses.

 |  * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
 |  * this is a right/full join and there are nonmergejoinable join clauses.
 |  * The executor's mergejoin machinery cannot handle such cases, so we have
 |  * to avoid generating a mergejoin plan.  (Note that this flag does NOT
 |  * consider whether there are actually any mergejoinable clauses.  This is
 |  * correct because in some cases we need to build a clauseless mergejoin.
 |  * Simply returning NIL is therefore not enough to distinguish safe from
 |  * unsafe cases.)
 |
 It says, mergejoin_clause == NIL is not a sufficient check to determine
 whether the mergejoin logic is applicable on the target join.
 So, either of them is probably an option for extension that tries to implement

Perhaps you mean both of them?

 their own mergejoin logic; (1) putting both of mergejoin_allowed and
 mergeclause_list as arguments of the hook, or (2) re-definition of
 select_mergejoin_clauses() as extern function to reproduce the variables on
 demand. Which one is more preferable?

I prefer (1), because exposing the inside of the planner might block changing
those internal functions.  If (at the moment) that information is
enough for overriding merge join for CSP, let's provide it as parameters.


-- 
Shigeru HANADA


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] In which good intentions are punished, take 2

2014-02-25 Thread Andrew Dunstan


On 02/25/2014 08:53 PM, Alvaro Herrera wrote:

Tom Lane wrote:


On the
other hand, we evidently have got precious little other buildfarm
coverage of the convert() family of functions, so maybe removing
this test altogether wouldn't be the best thing either.

We do have precious little testing on encodings and conversions, yes.
The problem is how to test these things without having the tests fail
when any particular encoding is not installed in the test system.

Maybe we can use the Perl test rig for this too: Peter said that if a
test requires something not installed, the test is skipped without
causing a failure.  It seems to me that we could take advantage that so
that each member tests whatever involves only the encodings it has
installed; while each individual member would skip a large percentage of
tests, the buildfarm as a whole would be testing a sizable portion, if
not all of it.




It should be easy (at least on *nix) for the buildfarm client to check 
what encodings are installed on the machine and run tests accordingly. I 
haven't been following closely, but if someone provides me with a simple 
spec I'll try to code it up. We're about due for a release anyway.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-25 Thread Robert Haas
On Tue, Feb 25, 2014 at 2:55 PM, Jeremy Harris j...@wizmail.org wrote:
 On 24/02/14 17:38, Robert Haas wrote:

 On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris j...@wizmail.org wrote:

 Run under cachegrind, it takes about N/10 last-level cache misses,
 all for the new item being introduced to the heap.  The existing
 code takes none at all.


 Can you explain this further?  This seems like an important clue that
 could be useful when trying to optimize this code, but I'm a little
 unclear which part of the operation has more cache misses with your
 changes and why.


 In the patched version, for the heapify operation the outer loop
 starts at the last heap-parent tuple and works backward to the
 start of the tuples array.  A copy is taken of the SortTuple being
 operated on for the inner loop to use.  This copy suffers cache misses.

 (The inner loop operates on elements between the copy source and the
 end of the array).


 In the original, the outer loop runs the array in increasing index
 order.  Again a copy is taken of the SortTuple for the inner loop
 to use.  This copy does not appear to take significant cache misses.

 (The inner loop operates on elements between the copy source and
 the start of the array).

Do you have any theory as to why this happens in one case but not the other?
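
(For reference, the two access patterns under discussion look roughly like this
in generic C, shown as a toy max-heap over ints rather than the tuplesort code:
the patched version heapifies bottom-up, walking the parents backward, while the
original builds the heap by inserting elements one at a time moving forward.)

/* bottom-up heapify (patched approach): outer loop walks parents backward */
static void
heapify_bottom_up(int *a, int n)
{
	int			start, i, child, val;

	for (start = n / 2 - 1; start >= 0; start--)
	{
		val = a[start];			/* the copy the inner loop works with */
		i = start;
		while ((child = 2 * i + 1) < n)
		{
			if (child + 1 < n && a[child + 1] > a[child])
				child++;		/* pick the larger child for a max-heap */
			if (a[child] <= val)
				break;
			a[i] = a[child];	/* pull the child up */
			i = child;
		}
		a[i] = val;
	}
}

/* incremental build (original approach): outer loop walks forward, sifting up */
static void
heapify_incremental(int *a, int n)
{
	int			j, i, parent, val;

	for (j = 1; j < n; j++)
	{
		val = a[j];				/* the copy the inner loop works with */
		i = j;
		while (i > 0 && a[(parent = (i - 1) / 2)] < val)
		{
			a[i] = a[parent];	/* pull the parent down */
			i = parent;
		}
		a[i] = val;
	}
}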

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-25 Thread Shigeru Hanada
Hi Kaigai-san,

2014-02-23 22:24 GMT+09:00 Kohei KaiGai kai...@kaigai.gr.jp:
 (1) Interface to add alternative paths in addition to built-in join paths

I found that create_custom_path is not used at all in your patch.
I revised postgresql_fdw.c to use it like this.

...
/* Create join information which is stored as private information. */
memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
jinfo.fdw_server_oid = o_server_oid;
jinfo.fdw_user_oid = o_user_oid;
jinfo.relids = joinrel->relids;
jinfo.jointype = jointype;
jinfo.outer_rel = o_relinfo;
jinfo.inner_rel = i_relinfo;
jinfo.remote_conds = j_remote_conds;
jinfo.local_conds = j_local_conds;

/* OK, make a CustomScan node to run remote join */
cpath = create_customscan_path(root,
                               joinrel,
                               0, 0, 0, /* estimate later */
                               NIL,
                               required_outer,
                               "postgres-fdw",
                               0,
                               packPgRemoteJoinInfo(&jinfo));

estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);

add_path(joinrel, &cpath->path);
...

This seems to work fine.  Is this the right approach?  If so, this portion
would be a good example of replacing a local join with a custom scan for
authors of custom scan providers.  One thing I worry about is the case where
you've intentionally avoided calling create_customscan_path.

-- 
Shigeru HANADA


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] inherit support for foreign tables

2014-02-25 Thread Kyotaro HORIGUCHI
Hello, I tried a minimal implementation to do that.

At Tue, 25 Feb 2014 19:45:56 +0900, Etsuro Fujita wrote
 In addition to an issue pointed out recently by Horiguchi-san, I've
 found there is another issue we have to discuss.  That is, we can't
 build any parameterized Append paths for an inheritance tree that
 contains at least one foreign table, in set_append_rel_pathlist().  This
 is because the patch doesn't take into consideration
 reparameterization for paths for a foreign table, which attempts to
 modify these paths to have greater parameterization so that each child
 path in the inheritance tree can have the exact same parameterization.
 To do so, I think we would need to add code for the foreign-table case
 to reparameterize_path().  And I think we should introduce a new FDW
 routine, say ReparameterizeForeignPath() because the processing would be
 performed best by the FDW itself.
 
 Comments are welcome!

I think the problem is that foreign children in inheritance trees
prevent all members of the inheritance relation from using
parameterized paths, correct?

|=# explain select * from p join (select uname from c1 limit 1) s on s.uname = 
p.uname;
|  QUERY PLAN   
|---
| Hash Join  (cost=0.04..244.10 rows=50 width=58)
|   Hash Cond: (p.uname = c1_1.uname)
|   ->  Append  (cost=0.00..206.01 rows=10012 width=50)
|         ->  Seq Scan on p  (cost=0.00..0.00 rows=1 width=168)
|         ->  Seq Scan on c1  (cost=0.00..204.01 rows=10001 width=50)
|         ->  Foreign Scan on c2  (cost=0.00..2.00 rows=10 width=168)
|               Foreign File: /etc/passwd
|               Foreign File Size: 1851
|   ->  Hash  (cost=0.03..0.03 rows=1 width=8)
|         ->  Limit  (cost=0.00..0.02 rows=1 width=8)
|               ->  Seq Scan on c1 c1_1  (cost=0.00..204.01 rows=10001 width=8)
| Planning time: 1.095 ms

Hmm. I tried a minimal implementation to do that. This omits the cost
recalculation but seems to work as expected. This seems enough if
the cost recalc is trivial here.

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b79af7a..18ced04 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2062,6 +2062,16 @@ reparameterize_path(PlannerInfo *root, Path *path,
 		case T_SubqueryScan:
 			return create_subqueryscan_path(root, rel, path->pathkeys,
 											required_outer);
+		case T_ForeignScan:
+			{
+				ForeignPath *fpath = (ForeignPath*) path;
+				ForeignPath *newpath = makeNode(ForeignPath);
+				memcpy(newpath, fpath, sizeof(ForeignPath));
+				newpath->path.param_info =
+					get_baserel_parampathinfo(root, rel, required_outer);
+				/* cost recalc omitted */
+				return (Path *)newpath;
+			}
 		default:
 			break;
 		}


|=# explain select * from p join (select uname from c1 limit 1) s on s.uname = 
p.uname;
| QUERY PLAN 
|
| Nested Loop  (cost=0.00..10.46 rows=50 width=58)
|   ->  Limit  (cost=0.00..0.02 rows=1 width=8)
|         ->  Seq Scan on c1 c1_1  (cost=0.00..204.01 rows=10001 width=8)
|   ->  Append  (cost=0.00..10.30 rows=12 width=158)
|         ->  Seq Scan on p  (cost=0.00..0.00 rows=1 width=168)
|               Filter: (c1_1.uname = uname)
|         ->  Index Scan using i_c1 on c1  (cost=0.29..8.30 rows=1 width=50)
|               Index Cond: (uname = c1_1.uname)
|         ->  Foreign Scan on c2  (cost=0.00..2.00 rows=10 width=168)
|               Filter: (c1_1.uname = uname)
|               Foreign File: /etc/passwd
|               Foreign File Size: 1851
| Planning time: 2.044 ms


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-25 Thread Haribabu Kommi
On Tue, Feb 25, 2014 at 11:13 AM, Haribabu Kommi
kommi.harib...@gmail.com wrote:

 Thanks for the information, I will apply other patches also and start
 testing.


When I try to run the pgbench test, by default the cache-scan plan is not
chosen because of its higher cost. So I increased cpu_index_tuple_cost to a
very large value, or turned off index scans, so that the planner chooses
cache_scan as the cheapest plan.

The configuration parameters changed during the test are,

shared_buffers - 2GB, cache_scan.num_blocks - 1024
wal_buffers - 16MB, checkpoint_segments - 255
checkpoint_timeout - 15 min, cpu_index_tuple_cost - 10 or
enable_indexscan=off

Test procedure:
1. Initialize the database with pgbench with 75 scale factor.
2. Create the triggers on pgbench_accounts
3. Use a select query to load all the data into cache.
4. Run  a simple update pgbench test.

Plan details of pgbench simple update queries:

postgres=# explain update pgbench_accounts set abalance = abalance where
aid = 10;
QUERY PLAN
---
 Update on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
 (cost=0.43..18.44 rows=1 width=103)
         Index Cond: (aid = 10)
 Planning time: 0.045 ms
(4 rows)

postgres=# explain select abalance from pgbench_accounts where aid = 10;
 QUERY PLAN

 Custom Scan (cache scan) on pgbench_accounts  (cost=0.00..99899.99 rows=1
width=4)
   Filter: (aid = 10)
 Planning time: 0.042 ms
(3 rows)

I am observing a very large slowdown in the performance results. The
performance test script is attached to this mail.
Please let me know if you find any problem in the test.

Regards,
Hari Babu
Fujitsu Australia


run_reading.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-02-25 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 25 Feb 2014 16:35:55 -0500, Robert Haas wrote
 On Tue, Feb 25, 2014 at 3:36 AM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
  May I mark this patch as Ready for Committer by myself since
  this was already marked so in last CF3 and have had no objection
  or new suggestion in this CF4 for more than a month?
 
 Sounds fair.

Thank you, I will send this patch to committers after waiting
another few days for more suggestions.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Craig Ringer
On 02/26/2014 06:21 AM, Merlin Moncure wrote:
 On Tue, Feb 25, 2014 at 4:03 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/25/2014 12:12 PM, Robert Haas wrote:
 I don't agree that jsonb should be preferred in all but a handful of
 situations.  Nor do I agree that partisanship belongs in our
 documentation.  Therefore, -1 for your proposal to recommend that, and
 +1 for Merlin's proposal to present a comparison which fairly
 illustrates the situations in which each will outperform the other.

 Awaiting doc patch from Merlin, then.  It will need to be clear enough
 that an ordinary user can distinguish which type they want.
 
 Sure.

Please also highlight that any change will require a full table rewrite
with an exclusive lock, so data type choices on larger tables may be
hard to change later.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-25 Thread Stephen Frost
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
  Instead of custom node, it might be better idea to improve FDW 
  infrastructure
  to push join. For the starters, is it possible for the custom scan node
  hooks to create a ForeignScan node? In general, I think, it might be better
  for the custom scan hooks to create existing nodes if they serve the 
  purpose.
 
 It does not work well because existing FDW infrastructure is designed to
 perform on foreign tables, not regular tables. Probably, it needs to revise
 much our assumption around the background code, if we re-define the purpose
 of FDW infrastructure. For example, ForeignScan is expected to return a tuple
 according to the TupleDesc that is exactly same with table definition.
 It does not fit the requirement if we replace a join-node by ForeignScan
 because its TupleDesc of joined relations is not predefined.

I'm not following this logic at all- how are you defining foreign from
regular?  Certainly, in-memory-only tables which are sitting out in
some non-persistent GPU memory aren't regular by any PG definition.
Perhaps you can't make ForeignScan suddenly work as a join-node
replacement, but I've not seen where anyone has proposed that (directly-
I've implied it on occasion where a remote view can be used, but that's
not the same thing as having proper push-down support for joins).

 I'd like to define these features are designed for individual purpose.

My previous complaint about this patch set has been precisely that each
piece seems to be custom-built and every patch needs more and more
backend changes.  If every time someone wants to do something with this
CustomScan API, they need changes made to the backend code, then it's
not a generally useful external API.  We really don't want to define
such an external API as then we have to deal with backwards
compatibility, particularly when it's all specialized to specific use
cases which are all different.

 FDW is designed to intermediate an external data source and internal heap
 representation according to foreign table definition. In other words, its
 role is to generate contents of predefined database object on the fly.

There's certainly nothing in the FDW API which requires that the remote
side have an internal heap representation, as evidenced by the various
FDWs which already exist and certainly are not any kind of 'normal'
heap.  Every query against the foriegn relation goes through the FDW API
and can end up returning whatever the FDW author decides is appropriate
to return at that time, as long as it matches the tuple description-
which is absolutely necessary for any kind of sanity, imv.

 On the other hands, custom-scan is designed to implement alternative ways
 to scan / join relations in addition to the methods supported by built-in
 feature.

I can see the usefulness in being able to push down aggregates or other
function-type calls to the remote side of an FDW and would love to see
work done along those lines, along with the ability to push down joins
to remote systems- but I'm not convinced that the claimed flexibility
with the CustomScan API is there, given the need to continue modifying
the backend code for each use-case, nor that there are particularly new
and inventive ways of saying find me all the cases where set X overlaps
with set Y.  I'm certainly open to the idea that we could have an FDW
API which allows us to ask exactly that question and let the remote side
cost it out and give us an answer for a pair of relations but that isn't
what this is.  Note also that in any kind of aggregation push-down we
must be sure that the function is well-defined and that the FDW is on
the hook to ensure that the returned data is the same as if we ran the
same aggregate function locally, otherwise the results of a query might
differ based on if the aggregate was fired locally or remotely (which
could be influenced by costing- eg: the size of the relation or its
statistics).

 I'm motivated to implement GPU acceleration feature that works transparently
 for application. Thus, it has to be capable on regular tables, because most
 of application stores data on regular tables, not foreign ones.

You want to persist that data in the GPU across multiple calls though,
which makes it unlike any kind of regular PG table and much more like
some foreign table.  Perhaps the data is initially loaded from a local
table and then updated on the GPU card in some way when the 'real' table
is updated, but neither of those makes it a regular PG table.

  Since a custom node is open implementation, it will be important to pass
  as much information down to the hooks as possible; lest the hooks will be
  constrained.  Since the functions signatures within the planner, optimizer
  will change from time to time, so the custom node hook signatures will need
  to change from time to time. That might turn out to be maintenance overhead.

It's more than from time-to-time, it was for each use case in the
given 

Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Peter Geoghegan
On Tue, Feb 25, 2014 at 8:07 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 Please also highlight that any change will require a full table rewrite
 with an exclusive lock, so data type choices on larger tables may be
 hard to change later.

It sure looks like they're binary-coercible to me:

+ CREATE CAST (hstore AS jsonb)
+   WITHOUT FUNCTION AS IMPLICIT;
+
+ CREATE CAST (jsonb AS hstore)
+   WITHOUT FUNCTION AS IMPLICIT;

Is this okay?
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-25 Thread Stephen Frost
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
 Yes, the part-1 patch provides a set of interface portion to interact
 between the backend code and extension code. Rest of part-2 and part-3
 portions are contrib modules that implements its feature on top of
 custom-scan API.

Just to come back to this- the other two contrib module patches, at
least as I read over their initial submission, were *also* patching
portions of backend code which it was apparently discovered that they
needed.  That's a good bit of my complaint regarding this approach.

 FDW's join pushing down is one of the valuable use-cases of this interface,
 but not all. As you might know, my motivation is to implement GPU acceleration
 feature on top of this interface, that offers alternative way to scan or join
 relations or potentially sort or aggregate.

If you're looking to just use GPU acceleration for improving individual
queries, I would think that Robert's work around backend workers would
be a more appropriate way to go, with the ability to move a working set
of data from shared buffers and on-disk representation of a relation
over to the GPU's memory, perform the operation, and then copy the
results back.  If that's not possible or effective wrt performance, then
I think we need to look at managing the external GPU memory as a foreign
system through an FDW which happens to be updated through triggers or
similar.  The same could potentially be done for memcached systems, etc.

regular PG tables, just to point out one issue, can be locked on a
row-by-row basis, and we know exactly where in shared buffers to go hunt
down the rows.  How is that going to work here, if this is both a
regular table and stored off in a GPU's memory across subsequent
queries or even transactions?

 Right now, I put all the logic to interact CSI and FDW driver on postgres_fdw
 side, it might be an idea to have common code (like a logic to check whether
 the both relations to be joined belongs to same foreign server) on the backend
 side as something like a gateway of them.

Yes, that's what I was suggesting above- we should be asking the FDWs on
a case-by-case basis how to cost out the join between foreign tables
which they are responsible for.  Asking two different FDWs servers to
cost out a join between their tables doesn't make any sense to me.

 As an aside, what should be the scope of FDW interface?
 In my understanding, it allows extension to implement something on behalf of
 a particular data structure being declared with CREATE FOREIGN TABLE.

That's where it is today, but certainly not our end goal.

 In other words, extension's responsibility is to generate a view of 
 something
 according to PostgreSQL' internal data structure, instead of the object 
 itself.

The result of the FDW call needs to be something which PG understands
and can work with, otherwise we wouldn't be able to, say, run PL/pgsql
code on the result, or pass it into some other aggregate which we
decided was cheaper to run locally.  Being able to push down aggregates
to the remote side of an FDW certainly fits in quite well with that.

 On the other hands, custom-scan interface allows extensions to implement
 alternative methods to scan or join particular relations, but it is not a role
 to perform as a target being referenced in queries. In other words, it is 
 methods
 to access objects.

The custom-scan interface still needs to produce something according
to PG's internal data structures, so it's not clear to me where you're
going with this.

 It is natural both features are similar because both of them intends 
 extensions
 to hook the planner and executor, however, its purpose is different.

I disagree as I don't really view FDWs as hooks.  A hook is more
like a trigger- sure, you can modify the data in transit, or throw an
error if you see an issue, but you don't get to redefine the world and
throw out what the planner or optimizer knows about the rest of what is
going on in the query.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
 On Tue, Feb 25, 2014 at 8:07 PM, Craig Ringer cr...@2ndquadrant.com wrote:
  Please also highlight that any change will require a full table rewrite
  with an exclusive lock, so data type choices on larger tables may be
  hard to change later.
 
 It sure looks like they're binary-coercible to me:
 
 + CREATE CAST (hstore AS jsonb)
 +   WITHOUT FUNCTION AS IMPLICIT;
 +
 + CREATE CAST (jsonb AS hstore)
 +   WITHOUT FUNCTION AS IMPLICIT;
 
 Is this okay?

Err, I'm not following this thread all *that* closely, but I was pretty
sure the issue was json vs. jsonb, and I'd be mighty confused as to wtf
was going on if those were binary-coercible...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-25 Thread Kouhei Kaigai
 * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
   Instead of a custom node, it might be a better idea to improve the FDW
   infrastructure to push down joins. For starters, is it possible for
   the custom scan node hooks to create a ForeignScan node? In general,
   I think it might be better for the custom scan hooks to create existing
   nodes if they serve the purpose.
  
  It does not work well because the existing FDW infrastructure is designed
  to operate on foreign tables, not regular tables. We would probably need
  to revise many assumptions in the backend code if we redefine the purpose
  of the FDW infrastructure. For example, ForeignScan is expected to return
  a tuple according to a TupleDesc that is exactly the same as the table
  definition. That does not fit the requirement if we replace a join node
  with ForeignScan, because the TupleDesc of the joined relations is not
  predefined.
 
 I'm not following this logic at all- how are you defining foreign from
 regular?  Certainly, in-memory-only tables which are sitting out in some
 non-persistent GPU memory aren't regular by any PG definition.
 Perhaps you can't make ForeignScan suddenly work as a join-node replacement,
 but I've not seen where anyone has proposed that (directly- I've implied
 it on occasion where a remote view can be used, but that's not the same
 thing as having proper push-down support for joins).
 
This regular one means ordinary tables. Even though a custom implementation
may reference a self-managed in-memory cache instead of the raw heap, the
table referenced in the user's query is still an ordinary table.
In the past, Hanada-san proposed an enhancement of the FDW infrastructure to
support remote joins, but it was eventually rejected.
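
To make the TupleDesc concern above concrete: a join-level scan node has no
catalog-backed row type, so it would have to build its result descriptor
itself. Below is a minimal sketch using the 9.x-era tuple-descriptor
routines; the two-column (id, val) projection is made up for illustration.

#include "postgres.h"
#include "access/tupdesc.h"
#include "catalog/pg_type.h"

/*
 * Build a descriptor for a made-up join projection (int4 id, text val).
 * Unlike a plain ForeignScan over one table, no relcache entry describes
 * this row type, so the node must construct it on its own.
 */
static TupleDesc
build_join_result_tupdesc(void)
{
/* two attributes, no OID column (9.x signature of this function) */
TupleDesc	tupdesc = CreateTemplateTupleDesc(2, false);

TupleDescInitEntry(tupdesc, (AttrNumber) 1, "id",
   INT4OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "val",
   TEXTOID, -1, 0);

return tupdesc;
}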

  I'd like to treat these features as designed for separate purposes.
 
 My previous complaint about this patch set has been precisely that each
 piece seems to be custom-built and every patch needs more and more backend
 changes.  If every time someone wants to do something with this CustomScan
 API, they need changes made to the backend code, then it's not a generally
 useful external API.  We really don't want to define such an external API
 as then we have to deal with backwards compatibility, particularly when
 it's all specialized to specific use cases which are all different.
 
The changes to the backend are just for convenience. We could implement the
functions to translate a Bitmapset to and from cstring form in postgres_fdw
instead, but does it make sense to maintain them separately in each extension?
I thought these functions were useful to have as common backend code, but
their absence is not a fundamental shortcoming of the custom-scan interface.
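
For reference, the kind of helper being discussed is small; here is a minimal
sketch of the Bitmapset-to-cstring direction, with a made-up function name and
an output format modeled loosely on the "(b ...)" notation of outfuncs.c.

#include "postgres.h"
#include "lib/stringinfo.h"
#include "nodes/bitmapset.h"

/*
 * Render a Bitmapset as a palloc'd cstring such as "(b 1 3 7)".
 * The name and format are illustrative only; the question is whether
 * code like this belongs in the backend or in each extension.
 */
static char *
bms_to_cstring(const Bitmapset *bms)
{
StringInfoData buf;
Bitmapset  *tmp = bms_copy(bms);	/* bms_first_member() is destructive */
int			member;

initStringInfo(&buf);
appendStringInfoString(&buf, "(b");
while ((member = bms_first_member(tmp)) >= 0)
appendStringInfo(&buf, " %d", member);
appendStringInfoChar(&buf, ')');
bms_free(tmp);

return buf.data;
}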

  FDW is designed to mediate between an external data source and the internal
  heap representation according to the foreign table definition. In other
  words, its role is to generate the contents of a predefined database object
  on the fly.
 
 There's certainly nothing in the FDW API which requires that the remote
 side have an internal heap representation, as evidenced by the various FDWs
 which already exist and certainly are not any kind of 'normal'
 heap.  Every query against the foreign relation goes through the FDW API
 and can end up returning whatever the FDW author decides is appropriate
 to return at that time, as long as it matches the tuple description- which
 is absolutely necessary for any kind of sanity, imv.
 
Yes, that matches my understanding of the role of an FDW driver.

  On the other hand, custom-scan is designed to implement alternative ways
  to scan or join relations in addition to the methods supported by the
  built-in features.
 
 I can see the usefulness in being able to push down aggregates or other
 function-type calls to the remote side of an FDW and would love to see work
 done along those lines, along with the ability to push down joins to remote
 systems- but I'm not convinced that the claimed flexibility with the
 CustomScan API is there, given the need to continue modifying the backend
 code for each use-case, nor that there are particularly new and inventive
 ways of saying "find me all the cases where set X overlaps with set Y".
 I'm certainly open to the idea that we could have an FDW API which allows
 us to ask exactly that question and let the remote side cost it out and
 give us an answer for a pair of relations but that isn't what this is.  Note
 also that in any kind of aggregation push-down we must be sure that the
 function is well-defined and that the FDW is on the hook to ensure that
 the returned data is the same as if we ran the same aggregate function 
 locally,
 otherwise the results of a query might differ based on if the aggregate
 was fired locally or remotely (which could be influenced by costing- eg:
 the size of the relation or its statistics).
 
I can also understand the usefulness of pushing joins or aggregation to the
remote side in the case of foreign table references. In a similar way, it
would also be useful if we could push these CPU-intensive operations to
co-processors for regular table references.
As I mentioned above, 

Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Hannu Krosing
On 02/25/2014 08:54 PM, Josh Berkus wrote:
 That's called a straw man argument, Robert.
 Me: We should recommend that people use jsonb unless they have a
 specific reason for using json.
We could also make the opposite argument - that people should use json unless
they have a specific reason for using jsonb.

btw, there is one more thing about JSON which I recently learned - a lot of
JavaScript people actually expect the JSON binary form to retain field order.

It is not in any specs, but nevertheless all major implementations do it and
some code depends on it.
IIRC, this behaviour is currently provided only by json and not by jsonb.

 Merlin: We should present them side-by-side with a complex comparison.
 Robert: Josh wants to junk all relational data and use only jsonb! I
 mean, really, WTF? 

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-02-25 Thread Christophe Pettus

On Feb 25, 2014, at 1:57 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 It is not in any specs, but nevertheless all major implementations do it and
 some code depends on it.

I have no doubt that some code depends on it, but "all major implementations"
is too strong a statement.  BSON, in particular, does not have stable field
order.

--
-- Christophe Pettus
   x...@thebuild.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >