Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-26 Thread Stephen Frost
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
  Just to come back to this- the other two contrib module patches, at least
  as I read over their initial submission, were *also* patching portions of
  backend code which it was apparently discovered that they needed.  That's
  a good bit of my complaint regarding this approach.
  
 ?? Sorry, are you still negative on the portion of backend patched
 by the part-2 and part-3 portion??

Pretty sure that I sent that prior to your last email, or at least
before I had gotten to the end of it.

  If you're looking to just use GPU acceleration for improving individual
  queries, I would think that Robert's work around backend workers would be
  a more appropriate way to go, with the ability to move a working set of
  data from shared buffers and on-disk representation of a relation over to
  the GPU's memory, perform the operation, and then copy the results back.
 
 The approach is similar to the Robert's work except for GPU adoption,
 instead of multicore CPUs. So, I tried to review his work to apply
 the facilities on my extension also.

Good, I'd be very curious to hear how that might solve the issue for
you, instead of using the CustomScan approach.

  regular PG tables, just to point out one issue, can be locked on a
  row-by-row basis, and we know exactly where in shared buffers to go hunt
  down the rows.  How is that going to work here, if this is both a regular
  table and stored off in a GPU's memory across subsequent queries or even
  transactions?
  
 It shall be handled on a case-by-case basis, I think. If a row-level lock is
 required over the table scan, the custom-scan node shall return a tuple
 located in the shared buffer, instead of the cached tuples. Of course,
 it is an option for the custom-scan node to evaluate qualifiers on the GPU
 with the cached data and return tuples identified by the ctid of the cached
 tuples. Anyway, it is not a significant problem.

I think you're being a bit too hand-wavey here, but if we're talking
about pre-scanning the data using PG before sending it to the GPU and
then only performing a single statement on the GPU, we should be able to
deal with it.  I'm worried about your ideas to try and cache things on
the GPU though, if you're not prepared to deal with locks happening in
shared memory on the rows you've got cached out on the GPU, or hint
bits, or the visibility map being updated, etc...
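
As a concrete illustration of the locking concern (table and column names
invented for the example): a qualifier-only scan could in principle be
answered from a GPU-side cache, but as soon as row locks come into play the
executor has to hand back the real heap tuples, addressable by ctid:

    CREATE TABLE measurements (id bigint PRIMARY KEY, sensor int, val float8);

    SELECT ctid, id, val
    FROM   measurements
    WHERE  val > 100.0
    FOR UPDATE;   -- row locks are taken on the heap tuples themselves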

 OK, I'll move the portion that will be needed commonly for other FDWs into
 the backend code.

Alright- but realize that there may be objections there on the basis
that the code/structures which you're exposing aren't, and will not be,
stable.  I'll have to go back and look at them myself, certainly, and
their history.

 Yes. According to the previous discussion around postgres_fdw getting
 merged, all we can trust on the remote side are built-in data types,
 functions, operators and other built-in constructs only.

Well, we're going to need to expand that a bit for aggregates, I'm
afraid, but we should be able to define the API for those aggregates
very tightly based on what PG does today and require that any FDW
purporting to provide those aggregates do it the way PG does.  Note
that this doesn't solve all the problems- we've got other issues with
regard to pushing aggregates down into FDWs that need to be solved.
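
To make the aggregate case concrete (foreign table name invented for the
example), the query below is aggregated locally today, after fetching every
matching row from the remote side; an aggregate push-down API would let the
FDW return only the grouped result:

    SELECT customer_id, count(*), sum(amount)
    FROM   ft_orders                      -- assumed postgres_fdw foreign table
    WHERE  ordered_at >= date '2014-01-01'
    GROUP  BY customer_id;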

 The custom-scan node is intended to work on regular relations, not
 only foreign tables. That means a special feature (like GPU acceleration)
 can work transparently for most existing applications. Usually, an
 application defines regular tables for its work at installation time, not
 foreign tables. That is the biggest concern for me.

The line between a foreign table and a local one is becoming blurred
already, but still, if this is the goal then I really think the
background worker is where you should be focused, not on this Custom
Scan API.  Consider that, once we've got proper background workers,
we're going to need new nodes which operate in parallel (or some other
rejiggering of the nodes- I don't pretend to know exactly what Robert is
thinking here, and I've apparently forgotten it if he's posted it
somewhere) and those interfaces may drive changes which would impact the
Custom Scan API- or worse, make us deprecate or regret having added it
because now we'll need to break backwards compatibility to add in the
parallel node capability to satisfy the more general non-GPU case.

 My wording may have been off. Anyway, I want plan nodes that enable extensions
 to define their behavior; it is similar to ForeignScan, but it is allowed
 to run on regular relations. Also, since every plan node, not only custom-scan
 and foreign-scan, works through a common interface to cooperate with the other
 nodes, it is not strange that the two interfaces are similar.

It sounds a lot like you're trying to define, external to PG, what
Robert is already trying to get going *internal* to PG, and I really
don't want to end up in a situation where we've got a solution for the
uncommon 

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-26 Thread Shigeru Hanada
2014-02-26 16:46 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
 Just to come back to this- the other two contrib module patches, at least
 as I read over their initial submission, were *also* patching portions of
 backend code which it was apparently discovered that they needed.  That's
 a good bit of my complaint regarding this approach.

 ?? Sorry, are you still negative on the portion of backend patched
 by the part-2 and part-3 portion??

Perhaps he meant to separate the patches on a per-feature basis.  IMO
if exposing those utilities is essential for the Custom Scan API in a
practical sense, IOW to implement and maintain an extension which implements
the Custom Scan API, they should go into the first patch.  IIUC the two
contrib modules are also PoCs for the API, so the part-2/3 patches should
contain only changes against contrib and its documentation.

Besides that, some typo fixes are mixed into the part-2 patch.  They should
go into the part-1 patch where the typos were introduced.

-- 
Shigeru HANADA




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-26 Thread Stephen Frost
* Shigeru Hanada (shigeru.han...@gmail.com) wrote:
 Perhaps he meant to separate the patches on a per-feature basis.  IMO
 if exposing those utilities is essential for the Custom Scan API in a
 practical sense, IOW to implement and maintain an extension which implements
 the Custom Scan API, they should go into the first patch.  IIUC the two
 contrib modules are also PoCs for the API, so the part-2/3 patches should
 contain only changes against contrib and its documentation.

That's what I was getting at, yes.

 Besides that, some typo fixes are mixed into the part-2 patch.  They should
 go into the part-1 patch where the typos were introduced.

Agreed.

Thanks,

Stephen




Re: [HACKERS] extension_control_path

2014-02-26 Thread Dimitri Fontaine
Hi,

Peter Eisentraut pete...@gmx.net writes:
 I'm massively in favor of this feature.  (I had started writing it about
 three times myself.)

Thanks!

 The problem I see, however, is that most extensions, by recommendation,
 set module_pathname = '$libdir/pgfoo', and so relocating the control
 files will still end up pointing to a not relocated library file.

It's kind of true. Is the phrasing “typically” followed by an example
really a recommendation, though? I thought it was more a detailed
explanation of the way it works.

We still have several other ways to tell PostgreSQL which lib to use for
each and every LANGUAGE C function:

  - $libdir/soname
  - absolute/path
  - MODULE_PATHNAME
  - any/relative/path which is to be solved in dynamic_library_path

Also, editing the AS '$libdir/foo' occurrences in an SQL script is
quite an easy thing to do programmatically.

 We would need to remove that and then ask users to keep their
 dynamic_library_path in sync with extension_control_path.  That's error
 prone, of course.

I don't see any pressure to change the way things currently work after
adding this new GUC. As you say, when extension_control_path is used,
some extra work *might need* to be done in order to ensure that the
right library is getting loaded.

I mainly see that as a distribution/distributor problem, though.

 In order to address this properly, we need a new directory structure
 that keeps library files and control files together, similar to how
 Python, Ruby, etc. install things, and then just one path for everything.

It might be true, but it reads to me like you're omitting the directory
parameter from the control file: the scripts and auxiliary control
files might already be found anywhere else on the file system.

Again, my view is that if you want to do things in a non-standard way
then you need to tweak the control file and maybe the script files. It's
a distribution problem, and I'm solving it in an extra software layer.

PostgreSQL is currently very flexible about where to organise extension
files, *except* for the control file. This patch provides the same level
of flexibility for that part. Of course flexibility can be seen as
creating a mess, but I don't think it's up to this patch or PostgreSQL
core to solve that mess.

 Now a few technical problems.

Will see about fixing those later, by Friday given my current schedule,
thanks.

 Also, the documentation states that this controls the location of the
 control file, but it of course controls the location of the script files
 also.  That should be made clearer.  (It becomes clearer if we just have
 one path for everything. ;-) )

Again, we have directory = 'whatever' in the control file to control
where to find the script files. I'm not sure your "of course" follows.
Will still edit the docs.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-26 Thread Kouhei Kaigai
 * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
  This "regular" one means usual tables. Even though a custom implementation
  may reference a self-managed in-memory cache instead of the raw heap, the
  table referenced in the user's query shall be a usual table.
  In the past, Hanada-san had proposed an enhancement of FDW to support
  remote joins, but it was eventually rejected.
 
 I'm not aware of the specifics around that proposal but I don't believe
 we, as a community, have decided to reject the idea in general.
 
IIUC, his approach was to integrate join push-down within the FDW APIs;
however, that does not mean the idea of remote joins was rejected.
I believe it is still one of our killer features if we can revise the
implementation.

Hanada-san, could you explain why your earlier proposal was rejected?
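
For reference, the remote-join case looks like this (table names invented for
the example); both foreign tables live on the same remote server, so the join
could in principle run there and only the joined rows would be shipped back:

    SELECT o.id, l.product, l.quantity
    FROM   ft_orders o                    -- foreign tables on the same
    JOIN   ft_order_lines l               -- postgres_fdw server
           ON l.order_id = o.id
    WHERE  o.ordered_at >= now() - interval '1 day';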

  I thought these functions were useful to have in the backend generally,
  but they are not a fundamental piece of functionality that the custom-scan
  interface lacks.
 
 Then perhaps they should be exposed more directly?  I can understand
 generally useful functionality being exposed in a way that anyone can use
 it, but we need to avoid interfaces which can't be stable due to normal
 / ongoing changes to the backend code.
 
The functions my patches want to expose are:
 - get_restriction_qual_cost()
 - fix_expr_common()

And, the functions my patches newly want are:
 - bms_to_string()
 - bms_from_string()

The first two are defined as static functions because cost estimation
is done in costsize.c and set-references are handled in setrefs.c; however,
custom-scan breaks this assumption, so I made them public.
They are used by every node type, but until now all of their callers lived
in those particular files.

  I can also understand the usefulness of join or aggregation into the
  remote side in case of foreign table reference. In similar way, it is
  also useful if we can push these CPU intensive operations into
  co-processors on regular table references.
 
 That's fine, if we can get data to and from those co-processors efficiently
 enough that it's worth doing so.  If moving the data to the GPU's memory
 will take longer than running the actual aggregation, then it doesn't make
 any sense for regular tables because then we'd have to cache the data in
 the GPU's memory in some way across multiple queries, which isn't something
 we're set up to do.
 
When I made a prototype implementation on top of FDW using CUDA, it ran
sequential scans 10 times faster than SeqScan on regular tables, provided
the qualifiers are complex enough.
The libraries used to communicate with the GPU (OpenCL/CUDA) have an
asynchronous data transfer mode using hardware DMA. It allows the cost of
data transfer to be hidden by pipelining, if there are enough records to be
transferred.
Also, the recent trend in semiconductor devices is GPU integration with the
CPU, sharing a common memory space. See Intel's Haswell, AMD's Kaveri, or
NVIDIA's Tegra K1. All of them share the same memory, so there is no need to
transfer the data to be processed. This trend is driven by physical law,
because of the energy consumption of semiconductors. So I'm optimistic about
my idea.

  As I mentioned above, the backend changes by the part-2/-3 patches are
  just minor stuff, and I thought it should not be implemented by
  contrib module locally.
 
 Fine- then propose them as generally useful additions, not as patches which
 are supposed to just be for contrib modules using an already defined
 interface.  If you can make a case for that then perhaps this is more
 practical.
 
The need was found by the contrib module, which wants to call static
functions or wants a way to translate an existing data structure to/from
a cstring. But, anyway, would a separate patch make sense?

  No. What I want to implement is: read the regular table, transfer
  the contents into the GPU's local memory for calculation, then receive
  the calculation result. The in-memory cache (which I'm also working on) is
  supplemental, because disk access is much slower and a row-oriented
  data structure is not suitable for SIMD-style instructions.
 
 Is that actually performant?  Is it actually faster than processing the
 data directly?  The discussions that I've had with folks have cast a great
 deal of doubt in my mind about just how well that kind of quick turn-around
 to the GPU's memory actually works.
 
See above.

   This really strikes me as the wrong approach for an FDW
   join-pushdown API, which should be geared around giving the remote
   side an opportunity on a case-by-case basis to cost out joins using
   whatever methods it has available to implement them.  I've outlined
   above the reasons I don't agree with just making the entire
 planner/optimizer pluggable.
  
  I'm also inclined to have arguments that will provide enough
  information for extensions to determine the best path for them.
 
 For join push-down, I proposed above that we have an interface to the FDW
 which allows us to ask it how much each join of the tables which are on
 a given FDW's server would cost if 

Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-26 Thread Christian Kruse
Hi,

On 25/02/14 19:28, Andres Freund wrote:
 I think all that's needed is to cut the first paragraphs that generally
 explain what huge pages are in some detail from the text and make sure
 the later paragraphs don't refer to the earlier ones.

Attached you will find a new version of the patch.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4dc1277..0006090 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1128,10 +1128,7 @@ include 'filename'
 The use of huge TLB pages results in smaller page tables and
 less CPU time spent on memory management, increasing performance. For
 more details, see
-ulink url=https://wiki.debian.org/Hugepages;the Debian wiki/ulink.
-Remember that you will need at least shared_buffers / huge page size +
-1 huge TLB pages. So for example for a system with 6GB shared buffers
-and a hugepage size of 2kb of you will need at least 3156 huge pages.
+xref linkend=linux-huge-tlb-pages.
/para
 
para
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index bbb808f..f172526 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1307,6 +1307,55 @@ echo -1000  /proc/self/oom_score_adj
/para
/note
   /sect2
+
+  sect2 id=linux-huge-tlb-pages
+   titleLinux huge TLB pages/title
+
+   para
+To enable this feature in productnamePostgreSQL/productname you need a
+kernel with varnameCONFIG_HUGETLBFS=y/varname and
+varnameCONFIG_HUGETLB_PAGE=y/varname. You also have to tune the system
+setting varnamevm.nr_hugepages/varname. To calculate the number of
+necessary huge pages start productnamePostgreSQL/productname without
+huge pages enabled and check the varnameVmPeak/varname value from the
+proc filesystem:
+programlisting
+$ userinputhead -1 /path/to/data/directory/postmaster.pid/userinput
+4170
+$ userinputgrep ^VmPeak /proc/4170/status/userinput
+VmPeak:  6490428 kB
+/programlisting
+ literal6490428/literal / literal2048/literal
+ (varnamePAGE_SIZE/varname is literal2MB/literal in this case) are
+ roughly literal3169.154/literal huge pages, so you will need at
+ least literal3170/literal huge pages:
+programlisting
+$ userinputsysctl -w vm.nr_hugepages=3170/userinput
+/programlisting
+Sometimes the kernel is not able to allocate the desired number of huge
+pages, so it might be necessary to repeat that command or to reboot. Don't
+forget to add an entry to filename/etc/sysctl.conf/filename to persist
+this setting through reboots.
+   /para
+
+   para
+The default behavior for huge pages in
+productnamePostgreSQL/productname is to use them when possible and
+to fallback to normal pages when failing. To enforce the use of huge
+pages, you can set
+link linkend=guc-huge-tlb-pagesvarnamehuge_tlb_pages/varname/link
+to literalon/literal. Note that in this case
+productnamePostgreSQL/productname will fail to start if not enough huge
+pages are available.
+   /para
+
+   para
+For a detailed description of the productnameLinux/productname huge
+pages feature have a look
+at ulink url=https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt;https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt/ulink.
+   /para
+
+  /sect2
  /sect1
 
 




Re: [HACKERS] [BUGS] BUG #9223: plperlu result memory leak

2014-02-26 Thread Sergey Burladyan
Alex Hunsaker bada...@gmail.com writes:

 On Tue, Feb 25, 2014 at 6:56 AM, Sergey Burladyan eshkin...@gmail.com wrote:

  It looks like I found the problem. Perl uses reference counting and
  something called "mortal" for memory management.  As I understand it, a
  mortal is freed after FREETMPS. Plperl calls FREETMPS in
  plperl_call_perl_func(), but after that, plperl asks the Perl interpreter
  again for new mortal SV variables, for example in hek2cstr from
  plperl_sv_to_datum, and this new SV is never freed.

 So I think hek2cstr is the only place we leak (it's the only place I
 can see that allocates a mortal sv without being wrapped in
 ENTER/SAVETMPS/FREETMPS/LEAVE).

Yeah, I also tried to fix only hek2cstr, but failed.

 Does the attached fix it for you?

Yes, your patch fixes it for me. Thank you, Alex!





Re: [HACKERS] Unfortunate choice of short switch name in pgbench

2014-02-26 Thread Heikki Linnakangas

On 02/25/2014 11:32 PM, Tom Lane wrote:

Meh.  A progress-reporting feature has use when the tool is working
towards completion of a clearly defined task.  In the case of pgbench,
if you told it to run for -T 60 seconds rather than -T 10 seconds,
that's probably because you don't trust a 10-second average to be
sufficiently reproducible.  So I'm not real sure that reporting averages
over shorter intervals is all that useful; especially not if it takes
cycles out of pgbench, which itself is often a bottleneck.


It's not useful when doing rigorous benchmarking to publish results, but 
in quick testing of various hacks during development, getting 10-second 
averages is very useful. You quickly see how stable the short averages 
are, and you can just hit CTRL-C when you've seen enough, without having 
to decide the suitable test length beforehand.


It's also useful to see how checkpoints or autovacuum affects the 
transaction rate.


That said, no objection to removing the -P shorthand.

- Heikki




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-26 Thread Kouhei Kaigai
   If you're looking to just use GPU acceleration for improving
   individual queries, I would think that Robert's work around backend
   workers would be a more appropriate way to go, with the ability to
   move a working set of data from shared buffers and on-disk
   representation of a relation over to the GPU's memory, perform the
 operation, and then copy the results back.
  
  The approach is similar to the Robert's work except for GPU adoption,
  instead of multicore CPUs. So, I tried to review his work to apply the
  facilities on my extension also.
 
 Good, I'd be very curious to hear how that might solve the issue for you,
 instead of using the CustomScan approach.
 
I plan to use custom-scan, of course. Once a relation is referenced
and the optimizer decides GPU acceleration is cheaper, the associated
custom-scan node reads the data from the underlying relation (or from the
in-memory cache, if it exists), then moves it to the shared memory buffer to
hand it to the GPU-management background worker, which launches asynchronous
DMA transfers one by one. After that, the custom-scan node receives the
filtered records via the shared-memory buffer, so it can construct the tuples
to be returned to the upper node.

   regular PG tables, just to point out one issue, can be locked on a
   row-by-row basis, and we know exactly where in shared buffers to go
   hunt down the rows.  How is that going to work here, if this is both
 a regular
   table and stored off in a GPU's memory across subsequent queries or
   even transactions?
  
  It shall be handled on a case-by-case basis, I think. If a row-level lock
  is required over the table scan, the custom-scan node shall return a tuple
  located in the shared buffer, instead of the cached tuple. Of
  course, it is an option for the custom-scan node to evaluate qualifiers
  on the GPU with the cached data and return tuples identified by the ctid of
  the cached tuples.
  Anyway, it is not a significant problem.
 
 I think you're being a bit too hand-wavey here, but if we're talking about
 pre-scanning the data using PG before sending it to the GPU and then only
 performing a single statement on the GPU, we should be able to deal with
 it.
It's what I want to implement.

 I'm worried about your ideas to try and cache things on the GPU though,
 if you're not prepared to deal with locks happening in shared memory on
 the rows you've got cached out on the GPU, or hint bits, or the visibility
 map being updated, etc...
 
It does not retain any state/information on the GPU side. Things related
to PG internals are the CPU's job.

  OK, I'll move the portion that will be needed commonly for other FDWs
  into the backend code.
 
 Alright- but realize that there may be objections there on the basis that
 the code/structures which you're exposing aren't, and will not be, stable.
 I'll have to go back and look at them myself, certainly, and their history.
 
I see, but that is part of the process of getting the code merged.

  Yes. According to the previous discussion around postgres_fdw getting
  merged, all we can trust on the remote side are built-in data types,
  functions, operators and other built-in constructs only.
 
 Well, we're going to need to expand that a bit for aggregates, I'm afraid,
 but we should be able to define the API for those aggregates very tightly
 based on what PG does today and require that any FDW purporting to provide
 those aggregates do it the way PG does.  Note that this doesn't solve all
 the problems- we've got other issues with regard to pushing aggregates down
 into FDWs that need to be solved.
 
I see. It probably needs more detailed investigation.

  The custom-scan node is intended to work on regular relations, not
  only foreign tables. That means a special feature (like GPU
  acceleration) can work transparently for most existing
  applications. Usually, an application defines regular tables for its work
  at installation time, not foreign tables. That is the biggest concern for me.
 
 The line between a foreign table and a local one is becoming blurred already,
 but still, if this is the goal then I really think the background worker
 is where you should be focused, not on this Custom Scan API.  Consider that,
 once we've got proper background workers, we're going to need new nodes
 which operate in parallel (or some other rejiggering of the nodes- I don't
 pretend to know exactly what Robert is thinking here, and I've apparently
 forgotten it if he's posted it
 somewhere) and those interfaces may drive changes which would impact the
 Custom Scan API- or worse, make us deprecate or regret having added it
 because now we'll need to break backwards compatibility to add in the
 parallel node capability to satisfy the more general non-GPU case.
 
The custom-scan API is a thin abstraction over the plan node interface,
not tightly bound to a particular use case such as GPUs, remote joins,
and so on. So I'm quite optimistic about its future maintainability.
Also, please recall the discussion at the last developer meeting.
The purpose of custom-scan 

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-26 Thread Shigeru Hanada
2014-02-26 17:31 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
 IIUC, his approach was to integrate join push-down within the FDW APIs;
 however, that does not mean the idea of remote joins was rejected.
 I believe it is still one of our killer features if we can revise the
 implementation.

 Hanada-san, could you explain why your earlier proposal was rejected?

IIUC it was not rejected, just returned-with-feedback.  We could not
reach consensus about how join push-down should work.  A discussion point was
multiple paths for a joinrel, but it was not such a serious point.  Here
is the tail of the thread.

http://www.postgresql.org/message-id/4f058241.2000...@enterprisedb.com

 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

 Hmm, so you're saying that the FDW function needs to be able to return
 multiple paths for a single joinrel. Fair enough, and that's not
 specific to remote joins. Even a single-table foreign scan could be
 implemented differently depending on whether you prefer fast-start or
 cheapest total.


 ... or ordered vs unordered, etc.  Yeah, good point, we already got this
 wrong with the PlanForeignScan API.  Good thing we didn't promise that
 would be stable.


 This discussion withered down here...

 I think the advice to Shigeru-san is to work on the API. We didn't reach a
 consensus on what exactly it should look like, but at least you need to be
 able to return multiple paths for a single joinrel, and should look at
 fixing the PlanForeignScan API to allow that too.

And I gave up for lack of time, IOW to finish the more fundamental
portion of the FDW API first.

http://www.postgresql.org/message-id/4f39fc1a.7090...@gmail.com

-- 
Shigeru HANADA




Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

2014-02-26 Thread Kouhei Kaigai
   Thanks for the information, I will apply the other patches as well and
 start testing.
 
 
 When trying to run the pgbench test, the cache-scan plan is not chosen by
 default because of its higher cost. So I increased cpu_index_tuple_cost to
 a large value, or turned off index scans, so that the planner can choose
 the cache_scan plan as the cheapest one.
 
That's expected. When an index scan is available, its cost is obviously
cheaper than a cache scan, even though the cache scan issues no disk I/O.

 The configuration parameters changed during the test are,
 
 shared_buffers - 2GB, cache_scan.num_blocks - 1024 wal_buffers - 16MB,
 checkpoint_segments - 255 checkpoint_timeout - 15 min,
 cpu_index_tuple_cost - 10 or enable_indexscan=off
 
 Test procedure:
 1. Initialize the database with pgbench with a scale factor of 75.
 2. Create the triggers on pgbench_accounts.
 3. Use a select query to load all the data into the cache.
 4. Run a simple update pgbench test.
 
 Plan details of pgbench simple update queries:
 
 postgres=# explain update pgbench_accounts set abalance = abalance where
 aid = 10;
 QUERY PLAN
 --
 -
  Update on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
-  Index Scan using pgbench_accounts_pkey on pgbench_accounts
 (cost=0.43..18.44 rows=1 width=103)
  Index Cond: (aid = 10)
  Planning time: 0.045 ms
 (4 rows)
 
 postgres=# explain select abalance from pgbench_accounts where aid =
 10;
  QUERY PLAN
 --
 --
  Custom Scan (cache scan) on pgbench_accounts  (cost=0.00..99899.99
 rows=1 width=4)
Filter: (aid = 10)
  Planning time: 0.042 ms
 (3 rows)
 
 I am observing too much of a delay in the performance results. The performance
 test script is attached to the mail.
 
I'd like you to compare two different cases: a sequential scan where part of
the buffers have to be loaded from storage, versus a cache-only scan.
That will probably show a difference.
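
For example, something along these lines (the cache_scan GUC name is assumed
here purely for illustration; use whatever the module actually provides):

    -- Case 1: plain sequential scan, with part of the buffers read from disk.
    SET enable_indexscan = off;
    SET cache_scan.enabled = off;   -- assumed GUC name, illustration only
    EXPLAIN (ANALYZE, BUFFERS)
        SELECT count(*) FROM pgbench_accounts WHERE abalance <> 0;

    -- Case 2: the same query answered from the in-memory cache.
    SET cache_scan.enabled = on;    -- assumed GUC name, illustration only
    EXPLAIN (ANALYZE, BUFFERS)
        SELECT count(*) FROM pgbench_accounts WHERE abalance <> 0;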

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com





Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-26 Thread Christian Kruse
Hi Peter,

after a night of sleep I'm still not able to swallow the pill. To be
honest I'm a little bit angry about this accusation.

I didn't mean to copy from the Debian wiki and after re-checking the
text again I'm still convinced that I didn't.

Of course the text SAYS something similar, but this is in the nature
of things. Structure, diction and focus are different. Also the
information transferred is different and gathered from various
articles, including the Debian wiki, the huge page docs of the kernel,
the Wikipedia and some old IBM and Oracle docs.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services





Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication

2014-02-26 Thread Christian Kruse
Hi Robert,

On 25/02/14 12:37, Robert Haas wrote:
 Committed, after fixing the regression test outputs.  I also changed
 the new fields to be NULL rather than 0 when they are invalid, because
 that way applying age() to that column does something useful instead
 of something lame.
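
As an illustration of why NULL is the friendlier choice (column names as added
by the committed patch, to the best of my knowledge):

    -- With backend_xid / backend_xmin NULL for idle backends, age() simply
    -- yields NULL instead of a bogus age computed from 0.
    SELECT pid, state, backend_xid, age(backend_xmin) AS xmin_age
    FROM   pg_stat_activity
    ORDER  BY xmin_age DESC NULLS LAST;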

Hm, thought I had done that. However, thanks for committing and
fixing!

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services





Re: [HACKERS] GiST support for inet datatypes

2014-02-26 Thread Emre Hasegeli
2014-02-24 02:44, Andreas Karlsson andr...@proxel.se:

 Note: The patches do not apply anymore due to changes to
 src/backend/utils/adt/Makefile.

I will fix it on the next version.

 I see, thanks for the explanation. But I am still not very fond of how that
 code is written since I find it hard to verify the correctness of it, but
 have no better suggestions.

We can split the selectivity estimation patch from the GiST support and add
it to the next commit fest for more review. In the meantime, I can
improve it with join selectivity estimation. Testing with different datasets
would help the most to reveal how accurate the algorithm is.

 Do you think the new index is useful even if you use the basic geo_selfuncs?
 Or should we wait with committing the patches until all selfuncs are
 implemented?

My motivation was to be able to use the cidr datatype with exclusion
constraints. I think the index would also be useful without proper
selectivity estimation functions. Range types also use geo_selfuncs for
join selectivity estimation.
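
A sketch of that exclusion-constraint use case (the operator class and the
overlap operator are assumptions here; they depend on what the final patch
installs):

    CREATE TABLE allocated_networks (
        net cidr NOT NULL,
        EXCLUDE USING gist (net inet_ops WITH &&)
    );
    INSERT INTO allocated_networks VALUES ('10.0.0.0/16');
    INSERT INTO allocated_networks VALUES ('10.0.1.0/24');  -- rejected: overlaps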




Re: [HACKERS] 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

2014-02-26 Thread Pavel Golub
Hello, Robert.

You wrote:

RH On Sat, Feb 22, 2014 at 7:02 PM, Rukh Meski rukh.me...@yahoo.ca wrote:
 Sorry, I wanted to minimize the attention my message attracts.  I mostly 
 posted it to let people know I plan on working on this for 9.5 to avoid 
 duplicated effort.  I will post more documentation and my reasons for 
 wanting this feature in Postgres later, if that's all right.

RH I've wanted this more than once.  I suspect it's a pretty hard project, 
though.

+1 from me. This is exciting functionality. There was even a poll on
my blog a year ago:
http://pgolub.wordpress.com/2012/11/23/do-we-need-limit-clause-in-update-and-delete-statements-for-postgresql/

So the results were (for those who don't want to check the post):
Yes, for functionality: 194 (61.4%)
No way! 78 (24.7%)
Do not care 44 (13.9%)
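
For anyone who has not seen the proposal, the syntax under discussion would
look roughly like this (not supported at the time of writing; the ctid
subquery shows how it is typically emulated today):

    -- Proposed: delete a bounded batch in a well-defined order.
    DELETE FROM job_queue
    WHERE  status = 'done'
    ORDER  BY finished_at
    LIMIT  1000;

    -- Current workaround:
    DELETE FROM job_queue
    WHERE  ctid IN (SELECT ctid FROM job_queue
                    WHERE  status = 'done'
                    ORDER  BY finished_at
                    LIMIT  1000);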

RH -- 
RH Robert Haas
RH EnterpriseDB: http://www.enterprisedb.com
RH The Enterprise PostgreSQL Company





-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com





Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-26 Thread Heikki Linnakangas

On 02/26/2014 10:35 AM, Christian Kruse wrote:

On 25/02/14 19:28, Andres Freund wrote:

I think all that's needed is to cut the first paragraphs that generally
explain what huge pages are in some detail from the text and make sure
the later paragraphs don't refer to the earlier ones.


Attached you will find a new version of the patch.


Thanks!


huge_tlb_pages (enum)

Enables/disables the use of huge TLB pages. Valid values are try (the 
default), on, and off.

At present, this feature is supported only on Linux. The setting is ignored 
on other systems.

The use of huge TLB pages results in smaller page tables and less CPU time 
spent on memory management, increasing performance. For more details, see 
Section 17.4.4.

With huge_tlb_pages set to try, the server will try to use huge pages, but 
fall back to using normal allocation if that fails. With on, failure to use 
huge pages will prevent the server from starting up. With off, huge pages will 
not be used.


That still says "The setting is ignored on other systems." That's not
quite true: as explained later in the section, if you set
huge_tlb_pages=on and the platform doesn't support it, the server will 
refuse to start.



17.4.4. Linux huge TLB pages


This section looks good to me. I'm OK with the level of detail, although 
maybe just a sentence or two about what huge TLB pages are and what 
benefits they have would still be in order. How about adding something 
like this as the first sentence:


Using huge TLB pages reduces overhead when using large contiguous 
chunks of memory, like PostgreSQL does.



To enable this feature in PostgreSQL you need a kernel with CONFIG_HUGETLBFS=y 
and CONFIG_HUGETLB_PAGE=y. You also have to tune the system setting 
vm.nr_hugepages. To calculate the number of necessary huge pages start 
PostgreSQL without huge pages enabled and check the VmPeak value from the proc 
filesystem:

$ head -1 /path/to/data/directory/postmaster.pid
4170
$ grep ^VmPeak /proc/4170/status
VmPeak:  6490428 kB

6490428 / 2048 (PAGE_SIZE is 2MB in this case) are roughly 3169.154 huge pages, 
so you will need at least 3170 huge pages:

$ sysctl -w vm.nr_hugepages=3170


That's good advice, but perhaps s/calculate/estimate/. It's just an 
approximation, after all.


- Heikki




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-02-26 Thread Andres Freund
Hi,

On 2014-02-26 07:32:45 +, Simon Riggs wrote:
  * This definitely should include isolationtester tests actually
performing concurrent ALTER TABLEs. All that's currently there is
tests that the locklevel isn't too high, but not that it actually works.
 
 There is no concurrent behaviour here, hence no code that would be
 exercised by concurrent tests.

Huh? There's most definitely new concurrent behaviour. Previously no
other backends could have a relation open (and locked) while it got
altered (which then sends out relcache invalidations). That's something
that should be tested.

  * Why does ChangeOwner need AEL?
 
 Ownership affects privileges, which includes SELECTs, hence AEL.

So?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-26 Thread Christian Kruse
Hi,

On 26/02/14 14:34, Heikki Linnakangas wrote:
 That still says The setting is ignored on other systems. That's not quite
 true: as explained later in the section, if you set huge_tlb_pages=on and
 the platform doesn't support it, the server will refuse to start.

I added a sentence about it.

 Using huge TLB pages reduces overhead when using large contiguous chunks of
 memory, like PostgreSQL does.

Sentence added.

 That's good advice, but perhaps s/calculate/estimate/. It's just an
 approximation, after all.

Fixed.

New patch version is attached.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4dc1277..c5c2d8b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1120,18 +1120,17 @@ include 'filename'
/para
 
para
-At present, this feature is supported only on Linux. The setting
-is ignored on other systems.
+At present, this feature is supported only on Linux. The setting is
+ignored on other systems when set to literaltry/literal.
+productnamePostgreSQL/productname will
+refuse to start when set to literalon/literal.
/para
 
para
 The use of huge TLB pages results in smaller page tables and
 less CPU time spent on memory management, increasing performance. For
 more details, see
-ulink url=https://wiki.debian.org/Hugepages;the Debian wiki/ulink.
-Remember that you will need at least shared_buffers / huge page size +
-1 huge TLB pages. So for example for a system with 6GB shared buffers
-and a hugepage size of 2kb of you will need at least 3156 huge pages.
+xref linkend=linux-huge-tlb-pages.
/para
 
para
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index bbb808f..5f9fa61 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1307,6 +1307,57 @@ echo -1000  /proc/self/oom_score_adj
/para
/note
   /sect2
+
+  sect2 id=linux-huge-tlb-pages
+   titleLinux huge TLB pages/title
+
+   para
+Using huge TLB pages reduces overhead when using large contiguous chunks
+of memory, like PostgreSQL does. To enable this feature
+in productnamePostgreSQL/productname you need a kernel
+with varnameCONFIG_HUGETLBFS=y/varname and
+varnameCONFIG_HUGETLB_PAGE=y/varname. You also have to tune the system
+setting varnamevm.nr_hugepages/varname. To estimate the number of
+necessary huge pages start productnamePostgreSQL/productname without
+huge pages enabled and check the varnameVmPeak/varname value from the
+proc filesystem:
+programlisting
+$ userinputhead -1 /path/to/data/directory/postmaster.pid/userinput
+4170
+$ userinputgrep ^VmPeak /proc/4170/status/userinput
+VmPeak:  6490428 kB
+/programlisting
+ literal6490428/literal / literal2048/literal
+ (varnamePAGE_SIZE/varname is literal2MB/literal in this case) are
+ roughly literal3169.154/literal huge pages, so you will need at
+ least literal3170/literal huge pages:
+programlisting
+$ userinputsysctl -w vm.nr_hugepages=3170/userinput
+/programlisting
+Sometimes the kernel is not able to allocate the desired number of huge
+pages, so it might be necessary to repeat that command or to reboot. Don't
+forget to add an entry to filename/etc/sysctl.conf/filename to persist
+this setting through reboots.
+   /para
+
+   para
+The default behavior for huge pages in
+productnamePostgreSQL/productname is to use them when possible and
+to fallback to normal pages when failing. To enforce the use of huge
+pages, you can set
+link linkend=guc-huge-tlb-pagesvarnamehuge_tlb_pages/varname/link
+to literalon/literal. Note that in this case
+productnamePostgreSQL/productname will fail to start if not enough huge
+pages are available.
+   /para
+
+   para
+For a detailed description of the productnameLinux/productname huge
+pages feature have a look
+at ulink url=https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt;https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt/ulink.
+   /para
+
+  /sect2
  /sect1
 
 




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Merlin Moncure
On Tue, Feb 25, 2014 at 10:07 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 02/26/2014 06:21 AM, Merlin Moncure wrote:
 On Tue, Feb 25, 2014 at 4:03 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/25/2014 12:12 PM, Robert Haas wrote:
 I don't agree that jsonb should be preferred in all but a handful of
 situations.  Nor do I agree that partisanship belongs in our
 documentation.  Therefore, -1 for your proposal to recommend that, and
 +1 for Merlin's proposal to present a comparison which fairly
 illustrates the situations in which each will outperform the other.

 Awaiting doc patch from Merlin, then.  It will need to be clear enough
 that an ordinary user can distinguish which type they want.

 Sure.

 Please also highlight that any change will require a full table rewrite
 with an exclusive lock, so data type choices on larger tables may be
 hard to change later.

Yeah.  Good idea.  Also gonna make a table of what happens when you
cast from A to B (via text, json, jsonb, hstore).
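
A few of the conversions such a table would cover (behaviour as expected with
the jsonb patch under discussion; the hstore conversion uses the hstore
extension's hstore_to_json function):

    SELECT '{"b": 1, "a": 2}'::json::jsonb;       -- reparsed into the binary form
    SELECT '{"b": 1, "a": 2}'::jsonb::text;       -- normalized text output
    SELECT hstore_to_json('a=>1, b=>2'::hstore);  -- keys/values become JSON strings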

merlin




Re: [HACKERS] Uninterruptable regexp_replace in 9.3.1 ?

2014-02-26 Thread Sandro Santilli
On Fri, Feb 21, 2014 at 08:17:36PM -0500, Tom Lane wrote:
 I wrote:
  Craig Ringer cr...@2ndquadrant.com writes:
  So I'd like to confirm that this issue doesn't affect 9.1.
 
  It doesn't.  I suspect it has something to do with 173e29aa5 or one
  of the nearby commits in backend/regex/.
 
 Indeed, git bisect fingers that commit as introducing the problem.

Is there anything I can do to keep this from fading into oblivion?
Will a mail to pgsql-bugs help?

--strk;

 ()  ASCII ribbon campaign  --  Keep it simple !
 /\  http://strk.keybit.net/rants/ascii_mails.txt  




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Merlin Moncure
On Tue, Feb 25, 2014 at 3:57 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 On 02/25/2014 08:54 PM, Josh Berkus wrote:
 That's called a straw man argument, Robert.
 Me: We should recommend that people use jsonb unless they have a
 specific reason for using json.
 We could also make the opposite argument - people use json unless they
 have a specific reason for using jsonb.

 btw, there is one more thing about JSON which I recently learned - a lot of
 JavaScript people actually expect the JSON binary form to retain field order

 It is not in any specs, but nevertheless all major implementations do it and
 some code depends on it.
 IIRC, this behaviour is currently also met only by json and not by jsonb.

Yes: This was the agreement that was struck and is the main reason why
there are two json types, not one.  JSON does not guarantee field
ordering as I read the spec and for the binary form ordering is not
maintained as a concession to using the hstore implementation.

You can always use the standard text json type for storage and cast
into the index for searching; what you give up there is some
performance and the ability to manipulate the json over the hstore
API.  I think that will have to do for now and field ordering for
hstore/jsonb can be reserved as a research item.
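
Concretely, the difference looks like this (output as expected from the jsonb
patch under discussion):

    SELECT '{"b": 1, "a": 2}'::json;    -- {"b": 1, "a": 2}  (input text kept as-is)
    SELECT '{"b": 1, "a": 2}'::jsonb;   -- {"a": 2, "b": 1}  (parsed form, order not kept)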

merlin




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-02-26 Thread Simon Riggs
On 26 February 2014 13:38, Andres Freund and...@2ndquadrant.com wrote:
 Hi,

 On 2014-02-26 07:32:45 +, Simon Riggs wrote:
  * This definitely should include isolationtester tests actually
performing concurrent ALTER TABLEs. All that's currently there is
tests that the locklevel isn't too high, but not that it actually works.

 There is no concurrent behaviour here, hence no code that would be
 exercised by concurrent tests.

 Huh? There's most definitely new concurrent behaviour. Previously no
 other backends could have a relation open (and locked) while it got
 altered (which then sends out relcache invalidations). That's something
 that should be tested.

It has been. High volume concurrent testing has been performed, per
Tom's original discussion upthread, but that's not part of the test
suite.
For other tests I have no guide as to how to write a set of automated
regression tests. Anything could cause a failure, so I'd need to write
an infinite set of tests to prove there is no bug *somewhere*. How
many tests are required? 0, 1, 3, 30?

  * Why does ChangeOwner need AEL?

 Ownership affects privileges, which includes SELECTs, hence AEL.

 So?

That reply could be added to any post. Please explain your concern.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-26 Thread Stephen Frost
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
 IIUC, his approach was to integrate join push-down within the FDW APIs;
 however, that does not mean the idea of remote joins was rejected.

For my part, trying to consider doing remote joins *without* going
through FDWs is just nonsensical.  What are you joining remotely if not
two foreign tables?  With regard to the GPU approach, if that model
works whereby the normal PG tuples are read off disk, fed over to the
GPU, processed, then returned back to the user through PG, then I
wouldn't consider it really a 'remote' join but rather simply a new
execution node inside of PG which is planned and costed just like the
others.  We've been over the discussion already about trying to make
that a pluggable system but the, very reasonable, push-back on that has
been if it's really possible and really makes sense to be pluggable.  It
certainly doesn't *have* to be- PostgreSQL is written in C, as we all
know, and plenty of C code talks to GPUs and shuffles memory around- and
that's almost exactly what Robert is working on supporting with regular
CPUs and PG backends already.

In many ways, trying to conflate this idea of using-GPUs-to-do-work with
the idea of remote-FDW-joins has really disillusioned me with regard to
the CustomScan approach.

  Then perhaps they should be exposed more directly?  I can understand
  generally useful functionality being exposed in a way that anyone can use
  it, but we need to avoid interfaces which can't be stable due to normal
  / ongoing changes to the backend code.
  
 The functions my patches want to expose are:
  - get_restriction_qual_cost()
  - fix_expr_common()

I'll try and find time to go look at these in more detail later this
week.  I have reservations about exposing the current estimates on costs
as we may want to adjust them in the future- but such adjustments may
need to be made in balance with other changes throughout the system and
an external module which depends on one result from the qual costing
might end up having problems with the costing changes because the
extension author wasn't aware of the other changes happening in other
areas of the costing.

I'm talking about this from a beyond-just-the-GUCs point of view, I
realize that the extension author could go look at the GUC settings, but
it's entirely reasonable to believe we'll make changes to the default
GUC settings along with how they're used in the future.

 And, the functions my patches newly want are:
  - bms_to_string()
  - bms_from_string()

Offhand, these look fine, if there's really an external use for them.
Will try to look at them in more detail later.

  That's fine, if we can get data to and from those co-processors efficiently
  enough that it's worth doing so.  If moving the data to the GPU's memory
  will take longer than running the actual aggregation, then it doesn't make
  any sense for regular tables because then we'd have to cache the data in
  the GPU's memory in some way across multiple queries, which isn't something
  we're set up to do.
  
 When I made a prototype implementation on top of FDW using CUDA, it ran
 sequential scans 10 times faster than SeqScan on regular tables, provided
 the qualifiers are complex enough.
 The libraries used to communicate with the GPU (OpenCL/CUDA) have an
 asynchronous data transfer mode using hardware DMA. It allows the cost of
 data transfer to be hidden by pipelining, if there are enough records to be
 transferred.

That sounds very interesting and certainly figuring out the costing to
support that model will be tricky.  Also, shuffling the data around in
that way will also be interesting.  It strikes me that it'll be made
more difficult if we're trying to do it through the limitations of a
pre-defined API between the core code and an extension.

 Also, the recent trend in semiconductor devices is GPU integration with the
 CPU, sharing a common memory space. See Intel's Haswell, AMD's Kaveri, or
 NVIDIA's Tegra K1. All of them share the same memory, so there is no need to
 transfer the data to be processed. This trend is driven by physical law,
 because of the energy consumption of semiconductors. So I'm optimistic about
 my idea.

And this just makes me wonder why the focus isn't on the background
worker approach instead of trying to do this all in an extension.

 The need was found by the contrib module, which wants to call static
 functions or wants a way to translate an existing data structure to/from
 a cstring. But, anyway, would a separate patch make sense?

I haven't had a chance to go back and look into the functions in detail,
but offhand I'd say the bms ones are probably fine while the others
would need more research as to if they make sense to expose to an
extension.

 Hmm... It seems to me we should follow the existing manner of constructing
 join paths, rather than special handling. Even if a query contains three or
 more foreign tables managed by the same server, it shall be consolidated into
 one remote join as long as its cost is less than 

Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-02-26 Thread Andres Freund
On 2014-02-26 15:15:00 +, Simon Riggs wrote:
 On 26 February 2014 13:38, Andres Freund and...@2ndquadrant.com wrote:
  Hi,
 
  On 2014-02-26 07:32:45 +, Simon Riggs wrote:
   * This definitely should include isolationtester tests actually
 performing concurrent ALTER TABLEs. All that's currently there is
 tests that the locklevel isn't too high, but not that it actually 
   works.
 
  There is no concurrent behaviour here, hence no code that would be
  exercised by concurrent tests.
 
  Huh? There's most definitely new concurrent behaviour. Previously no
  other backends could have a relation open (and locked) while it got
  altered (which then sends out relcache invalidations). That's something
  that should be tested.
 
 It has been. High volume concurrent testing has been performed, per
 Tom's original discussion upthread, but that's not part of the test
 suite.

Yea, that's not what I am looking for.

 For other tests I have no guide as to how to write a set of automated
 regression tests. Anything could cause a failure, so I'd need to write
 an infinite set of tests to prove there is no bug *somewhere*. How
 many tests are required? 0, 1, 3, 30?

I think some isolationtester tests for the most important changes in
lock levels are appropriate. Say, creating a PRIMARY KEY, DROP INHERIT,
...  while a query is in progress in another session.
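
A sketch of such a scenario, written here as two plain SQL sessions rather
than an actual isolationtester spec (the subcommand is only an example of one
whose lock level the patch lowers):

    -- Session 1: keep the relation open and locked across the concurrent DDL.
    BEGIN;
    SELECT count(*) FROM t;                  -- takes AccessShareLock and holds it

    -- Session 2: should not block behind session 1's AccessShareLock.
    ALTER TABLE t VALIDATE CONSTRAINT t_check;

    -- Session 1: must still behave sanely after the relcache invalidation.
    SELECT count(*) FROM t;
    COMMIT;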

   * Why does ChangeOwner need AEL?
 
  Ownership affects privileges, which includes SELECTs, hence AEL.
 
  So?
 
 That reply could be added to any post. Please explain your concern.

I don't understand why that means it needs an AEL. After all,
e.g. changes in table inheritance do *not* require an AEL. I think it's
perfectly ok not to go for the minimally required lock level for all
subcommands, but then it should be commented as such, and not with
"change visible to SELECT" when other operations that are likewise
visible to SELECT require lower lock levels.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Andrew Dunstan


On 02/26/2014 02:17 AM, Christophe Pettus wrote:

On Feb 25, 2014, at 1:57 PM, Hannu Krosing ha...@2ndquadrant.com wrote:


 It is not in any specs, but nevertheless all major implementations do it and
some code depends on it.

I have no doubt that some code depends on it, but all major implementations 
is too strong a statement.  BSON, in particular, does not have stable field order.





Not only is it not in any specs, it's counter to the spec I have been 
following https://www.ietf.org/rfc/rfc4627.txt, which quite 
categorically states that an object is an UNORDERED collection. Any 
application which relies on the ordering of object fields being 
preserved is broken IMNSHO, and I would not feel the least guilt about 
exposing their breakage.


cheers

andrew




[HACKERS] ruleutils.c fails to deparse whole-row references

2014-02-26 Thread Alvaro Herrera
Hi,

While testing event triggers support for CREATE RULE, I noticed that
existing code does not deparse whole-row references correctly:

postgres=# create table f (a int);
CREATE TABLE
postgres=# create table g (other f);
CREATE TABLE
postgres=# create rule f as on insert to f do also (insert into g values (new));
CREATE RULE
postgres=# \d f
 Tabla «public.f»
 Columna |  Tipo   | Modificadores 
-+-+---
 a   | integer | 
Reglas:
f AS
ON INSERT TO f DO  INSERT INTO g (other)
  VALUES (new.*)


Note that my rule had values (new) but the deparsed version uses
VALUES (new.*) which doesn't work on input: pg_dump will fail with
this output:

$ pg_dump postgres -t f -t g | LC_ALL=C PGOPTIONS="-c lc_messages=C" psql postgres2
SET
SET
SET
SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
ERROR:  column other is of type f but expression is of type integer
LINE 3:   VALUES (new.*);
  ^
HINT:  You will need to rewrite or cast the expression.


My sample session with event triggers is a neat example of that
functionality at work BTW (or so I think anyway):

alvherre=# CREATE  RULE F AS ON INSERT TO public.f  DO also (insert into g 
values (new); insert into h values (new); insert into i values (new));
NOTICE:  Trigger has been executed. ddl_command_end -- CREATE RULE  
NOTICE:  JSON blob: {
    "actions": [
        "INSERT INTO public.g (duh) VALUES (new.*)",
        "INSERT INTO public.h (duh) VALUES (new.*)",
        "INSERT INTO public.i (duh) VALUES (new.*)"
    ],
    "event": "INSERT",
    "fmt": "CREATE %{or_replace}s RULE %{identity}I AS ON %{event}s TO %{table}D %{where_clause}s DO %{instead}s (%{actions:; }s)",
    "identity": "F",
    "instead": "ALSO",
    "or_replace": "",
    "table": {
        "objname": "f",
        "schemaname": "public"
    },
    "where_clause": {
        "clause": null,
        "fmt": "WHERE %{clause}s",
        "present": false
    }
}
NOTICE:  expanded: CREATE  RULE F AS ON INSERT TO public.f  DO ALSO (INSERT 
INTO public.g (duh) VALUES (new.*); INSERT INTO public.h (duh) VALUES (new.*); 
INSERT INTO public.i (duh) VALUES (new.*))
CREATE RULE
alvherre=# \d f
       Table "public.f"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
Rules:
F AS
ON INSERT TO f DO ( INSERT INTO g (duh)
  VALUES (new.*);
 INSERT INTO h (duh)
  VALUES (new.*);
 INSERT INTO i (duh)
  VALUES (new.*);
)


I'm switching to the multixact bug now, but I wanted to report this
before I forgot it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-26 Thread Stephen Frost
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
 I (plan to) use custom-scan of course. Once a relation is referenced
 and the optimizer decides GPU acceleration is cheaper, the associated
 custom-scan node reads the data from the underlying relation (or an
 in-memory cache if one exists), then moves it to the shared memory
 buffer to deliver it to the GPU management background worker, which
 launches asynchronous DMA transfers one by one. After that, the
 custom-scan node receives filtered records via the shared-memory
 buffer, so it can construct tuples to be returned to the upper node.

Alright- but have you discussed this with Robert?  We're going to be
whacking things around for parallel support with new nodes and more
built-in helper functionality for doing this work and I'm not anxious to
have CustomScan end up being a legacy interface that we're required to
pull forward because we accepted it before things had settled.

  I'm worried about your ideas to try and cache things on the GPU though,
  if you're not prepared to deal with locks happening in shared memory on
  the rows you've got cached out on the GPU, or hint bits, or the visibility
  map being updated, etc...
  
 It does not retain any state/information on the GPU side. Things related
 to PG internals are the job of the CPU.

Right, good, I'm glad to hear that this approach is for doing things at
only an individual statement level, and it's good to know that it can be
performant at that level now.

  Well, we're going to need to expand that a bit for aggregates, I'm afraid,
  but we should be able to define the API for those aggregates very tightly
  based on what PG does today and require that any FDW purporting to provides
  those aggregates do it the way PG does.  Note that this doesn't solve all
  the problems- we've got other issues with regard to pushing aggregates down
  into FDWs that need to be solved.
  
 I see. It probably needs more detailed investigation.

These issues will hopefully not be a problem (or at least, one that can
be worked around) for non-FDW implementations which are part of core and
implemented in a similar way to the existing aggregates, where the
scan node could continue to be a simple SeqScan as it is today.

 The custom-scan API is a thin abstraction over the plan node interface,
 not tightly coupled to a particular use case, like GPU, remote-join
 and so on. So, I'm quite optimistic about its future maintainability.

I don't see how you can be when there hasn't been any discussion that
I've seen about how parallel query execution is going to change things
for us.

 Also, please recall the discussion at the last developer meeting.
 The purpose of custom-scan (we didn't name it at that time) is to avoid
 unnecessary project forks for people who want to implement their own
 special feature when no facilities to enhance the optimizer/executor are
 available.
 Even though we will have an in-core parallel execution feature using CPUs,
 it also makes sense to provide some unique implementation that may be
 suitable for a specific region.

The issue here is that we're going to be expected to maintain an
interface once we provide it and so that isn't something we should be
doing lightly.  Particularly when it's as involved as this kind of
change is with what's going on in the backend where we are nearly 100%
sure to be changing things in the next release or two.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-26 Thread Alvaro Herrera

There's one thing that rubs me the wrong way about all this
functionality, which is that we've named it "huge TLB pages".  That is
wrong -- the TLB pages are not huge.  In fact, as far as I understand,
the TLB doesn't have pages at all.  It's the pages that are huge, but
those pages are not TLB pages, they are just memory pages.

I think we have named it this way only because Linux named the mmap()
flag MAP_HUGETLB for some reason.  The TLB is not huge
either (in fact you can't alter the size of the TLB at all; it's a
hardware thing.)  I think this flag means "use the TLB entries reserved
for huge pages for the memory I'm requesting".

Since we haven't released any of this, should we discuss renaming it to
just "huge pages"?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.7

2014-02-26 Thread Andres Freund
On 2014-02-24 17:06:53 -0500, Robert Haas wrote:
 -   heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
 +   if (IsSystemRelation(scan->rs_rd)
 +   || RelationIsAccessibleInLogicalDecoding(scan->rs_rd))
 +   heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
 +   else
 +   heap_page_prune_opt(scan->rs_rd, buffer, 
 RecentGlobalDataXmin);
 
 Instead of changing the callers of heap_page_prune_opt() in this way,
 I think it might be better to change heap_page_prune_opt() to take
 only the first two of its current three parameters; everybody's just
 passing RecentGlobalXmin right now anyway.

I've changed stuff this way, and it indeed looks better.

I am wondering about the related situation of GetOldestXmin()
callers. There's a fair bit of duplicated logic in the callers, before
but especially after this patchset. What about adding 'Relation rel'
parameter instead of `allDbs' and `systable'? That keeps the logic
centralized and there's been a fair amount of talk about vacuum
optimizations that could also use it.
It's a bit sad that that requires including rel.h from procarray.h...

What do you think? Isolated patch attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From b23fe717c699ad5e7cac217c3a5725b57c722ff2 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 26 Feb 2014 18:23:55 +0100
Subject: [PATCH] fixup! wal_decoding: Introduce logical changeset extraction.

Centralize knowledge from GetOldestXmin() callers into GetOldestXmin()
itself.
---
 src/backend/access/transam/xlog.c |  4 +--
 src/backend/catalog/index.c   | 12 +
 src/backend/commands/analyze.c|  4 +--
 src/backend/commands/cluster.c|  4 +--
 src/backend/commands/vacuum.c |  9 +++
 src/backend/commands/vacuumlazy.c |  6 ++---
 src/backend/replication/walreceiver.c |  2 +-
 src/backend/storage/ipc/procarray.c   | 50 +++
 src/include/commands/vacuum.h |  4 +--
 src/include/storage/procarray.h   |  3 ++-
 10 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b6b2cd4..9f48fa5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8619,7 +8619,7 @@ CreateCheckPoint(int flags)
 	 * StartupSUBTRANS hasn't been called yet.
 	 */
 	if (!RecoveryInProgress())
-		TruncateSUBTRANS(GetOldestXmin(true, false, true));
+		TruncateSUBTRANS(GetOldestXmin(NULL, false));
 
 	/* Real work is done, but log and update stats before releasing lock. */
 	LogCheckpointEnd(false);
@@ -8997,7 +8997,7 @@ CreateRestartPoint(int flags)
 	 * this because StartupSUBTRANS hasn't been called yet.
 	 */
 	if (EnableHotStandby)
-		TruncateSUBTRANS(GetOldestXmin(true, false, true));
+		TruncateSUBTRANS(GetOldestXmin(NULL, false));
 
 	/* Real work is done, but log and update before releasing lock. */
 	LogCheckpointEnd(true);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8c1803e..877d767 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2154,19 +2154,9 @@ IndexBuildHeapScan(Relation heapRelation,
 	}
 	else
 	{
-		/*
-		 * We can ignore a) pegged xmins b) shared relations if we don't scan
-		 * something acting as a catalog.
-		 */
-		bool include_systables =
-			IsCatalogRelation(heapRelation) ||
-			RelationIsAccessibleInLogicalDecoding(heapRelation);
-
 		snapshot = SnapshotAny;
 		/* okay to ignore lazy VACUUMs here */
-		OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared,
-   true,
-   include_systables);
+		OldestXmin = GetOldestXmin(heapRelation, true);
 	}
 
 	scan = heap_beginscan_strat(heapRelation,	/* relation */
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fe569f5..a04adea 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1082,9 +1082,7 @@ acquire_sample_rows(Relation onerel, int elevel,
 	totalblocks = RelationGetNumberOfBlocks(onerel);
 
 	/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
-	OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true,
-			   IsCatalogRelation(onerel) ||
-			   RelationIsAccessibleInLogicalDecoding(onerel));
+	OldestXmin = GetOldestXmin(onerel, true);
 
 	/* Prepare for sampling block numbers */
 	BlockSampler_Init(bs, totalblocks, targrows);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index fc5c013..b6b40e7 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -850,9 +850,7 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	 * Since we're going to rewrite the whole table anyway, there's no reason
 	 * not to be aggressive about this.
 	 */
-	vacuum_set_xid_limits(0, 0, 

Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Josh Berkus
On 02/26/2014 07:02 AM, Merlin Moncure wrote:
 On Tue, Feb 25, 2014 at 3:57 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 It is not in any specs, but nevertheless all major implementations do it and
 some code depends on it.
 IIRC, this behaviour is currently also met only by json and not by jsonb.
 
 Yes: This was the agreement that was struck and is the main reason why
 there are two json types, not one.  JSON does not guarantee field
 ordering as I read the spec and for the binary form ordering is not
 maintained as a concession to using the hstore implementation.

Actually, that's not true; neither Mongo/BSON nor CouchDB preserve field
ordering.  So users who are familiar with JSONish data *storage* should
be aware that field ordering is not preserved.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Josh Berkus
On 02/25/2014 08:07 PM, Craig Ringer wrote:
 On 02/26/2014 06:21 AM, Merlin Moncure wrote:
 On Tue, Feb 25, 2014 at 4:03 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/25/2014 12:12 PM, Robert Haas wrote:
 I don't agree that jsonb should be preferred in all but a handful of
 situations.  Nor do I agree that partisanship belongs in our
 documentation.  Therefore, -1 for your proposal to recommend that, and
 +1 for Merlin's proposal to present a comparison which fairly
 illustrates the situations in which each will outperform the other.

 Awaiting doc patch from Merlin, then.  It will need to be clear enough
 that an ordinary user can distinguish which type they want.

 Sure.
 
 Please also highlight that any change will require a full table rewrite
 with an exclusive lock, so data type choices on larger tables may be
 hard to change later.

Oh, point.  I'll add that text if Merlin doesn't.
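
Something along these lines, say (a sketch with hypothetical table/column names):
converting an existing column later means rewriting the whole table under an
ACCESS EXCLUSIVE lock for the duration,

    ALTER TABLE docs ALTER COLUMN body TYPE jsonb USING body::jsonb;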

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-26 Thread Heikki Linnakangas

On 02/26/2014 06:13 PM, Alvaro Herrera wrote:


There's one thing that rubs me the wrong way about all this
functionality, which is that we've named it "huge TLB pages".  That is
wrong -- the TLB pages are not huge.  In fact, as far as I understand,
the TLB doesn't have pages at all.  It's the pages that are huge, but
those pages are not TLB pages, they are just memory pages.

I think we have named it this way only because Linux named the mmap()
flag MAP_HUGETLB for some reason.  The TLB is not huge
either (in fact you can't alter the size of the TLB at all; it's a
hardware thing.)  I think this flag means "use the TLB entries reserved
for huge pages for the memory I'm requesting".

Since we haven't released any of this, should we discuss renaming it to
just "huge pages"?


Linux calls it huge tlb pages in many places, not just MAP_HUGETLB. 
Like in CONFIG_HUGETLB_PAGES and hugetlbfs. I agree it's a bit weird. 
Linux also calls it just huge pages in many other places, like in 
/proc/meminfo output.


FreeBSD calls them superpages and Windows calls them large pages. 
Yeah, it would seem better to call them just huge pages, so that it's 
more reminiscent of those names, if we ever implement support for 
super/huge/large pages on other platforms.


- Heikki




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Merlin Moncure
On Wed, Feb 26, 2014 at 11:41 AM, Josh Berkus j...@agliodbs.com wrote:
 On 02/26/2014 07:02 AM, Merlin Moncure wrote:
 On Tue, Feb 25, 2014 at 3:57 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 It is not in any specs, but nevertheless all major implementations do it and
 some code depends on it.
 IIRC, this behaviour is currently also met only by json and not by jsonb.

 Yes: This was the agreement that was struck and is the main reason why
 there are two json types, not one.  JSON does not guarantee field
 ordering as I read the spec and for the binary form ordering is not
 maintained as a concession to using the hstore implementation.

 Actually, that's not true; neither Mongo/BSON nor CouchDB preserve field
 ordering.  So users who are familiar with JSONish data *storage* should
 be aware that field ordering is not preserved.

right (although I'm not sure what wasn't true there).  I think the
status quo is fine; If you have to have the document precisely
preserved for whatever reason you can do that -- you just have to be
prepared to give up some things.  As noted in the other thread
serialization is more interesting but that also works fine.  The
breakdown in terms of usage between json/jsonb to me is very clear
(json will handle serialization/deserializaton heavy patterns and a
few edge cases for storage).   The split between json and jsonb in
hindsight made a lot of sense.

What is not going to be so clear for users (particularly without good
supporting documentation) is how things break down in terms of usage
between hstore and jsonb.

merlin




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Josh Berkus
On 02/26/2014 09:57 AM, Merlin Moncure wrote:
 On Wed, Feb 26, 2014 at 11:41 AM, Josh Berkus j...@agliodbs.com wrote:
 On 02/26/2014 07:02 AM, Merlin Moncure wrote:
 On Tue, Feb 25, 2014 at 3:57 PM, Hannu Krosing ha...@2ndquadrant.com 
 wrote:
 It is not in any specs, but nevertheless all major implementations do it and
 some code depends on it.
 IIRC, this behaviour is currently also met only by json and not by jsonb.

 Yes: This was the agreement that was struck and is the main reason why
 there are two json types, not one.  JSON does not guarantee field
 ordering as I read the spec and for the binary form ordering is not
 maintained as a concession to using the hstore implementation.

 Actually, that's not true; neither Mongo/BSON nor CouchDB preserve field
 ordering.  So users who are familiar with JSONish data *storage* should
 be aware that field ordering is not preserved.
 
 right (although I'm not sure what wasn't true there).  I think the

Sorry, I was referring to Hannu's statement that all major
implementations preserve order, which simply isn't true.

 status quo is fine; If you have to have the document precisely
 preserved for whatever reason you can do that -- you just have to be
 prepared to give up some things.  As noted in the other thread
 serialization is more interesting but that also works fine.  The
 breakdown in terms of usage between json/jsonb to me is very clear
 (json will handle serialization/deserializaton heavy patterns and a
 few edge cases for storage).   The split between json and jsonb in
 hindsight made a lot of sense.
 
 What is not going to be so clear for users (particularly without good
 supporting documentation) is how things break down in terms of usage
 between hstore and jsonb.

Realistically?  Once we get done with mapping the indexes and operators,
users who are used to Hstore1 use Hstore2, and everyone else uses jsonb.
 jsonb is nothing other than a standardized syntax interface to hstore2,
and most users will choose the syntax similar to what they already know
over learning new stuff.

A real, full comparison chart would include text, json, jsonb and
hstore, I guess.  Although I'm wondering if that's way too complex for
the main docs.  Seems like more of a wiki item.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] Function signature with default parameter

2014-02-26 Thread salah jubeh
Hello,

I find default values confusing when a function is overloaded, below is an 
example. 


CREATE OR REPLACE FUNCTION default_test (a INT DEFAULT 1, b INT DEFAULT 1, C 
INT DEFAULT 1) RETURNS INT AS
$$
    BEGIN
        RETURN a+b+c;
    END;
$$
LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION default_test (a INT DEFAULT 1, b INT DEFAULT 1) 
RETURNS INT AS
$$
    BEGIN
        RETURN a+b;
    END;
$$
LANGUAGE 'plpgsql';


-- this will fail 

--SELECT default_test(1,3);
--SELECT default_test(1);

test=# \df default_test 
                                            List of functions
 Schema |     Name     | Result data type |                      Argument data types                       |  Type  
--------+--------------+------------------+-----------------------------------------------------------------+--------
 public | default_test | integer          | a integer DEFAULT 1, b integer DEFAULT 1                        | normal
 public | default_test | integer          | a integer DEFAULT 1, b integer DEFAULT 1, c integer DEFAULT 1   | normal
(2 rows)

I think, there is a difference between optional parameters and default 
parameter values. So, my suggestion would be something like this.

SELECT default_test(1,3, DEFAULT); -- match function number 1

SELECT default_test(1,3); -- match the function number 2 

SELECT default_test(1); -- ERROR
Regards

Re: [HACKERS] Changeset Extraction v7.7

2014-02-26 Thread Alvaro Herrera
Andres Freund escribió:

 I am wondering about the related situation of GetOldestXmin()
 callers. There's a fair bit of duplicated logic in the callers, before
 but especially after this patchset. What about adding 'Relation rel'
 parameter instead of `allDbs' and `systable'? That keeps the logic
 centralized and there's been a fair amount of talk about vacuum
 optimizations that could also use it.
 It's a bit sad that that requires including rel.h from procarray.h...

relcache.h, not rel.h.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Function signature with default parameter

2014-02-26 Thread David Johnston
salah jubeh wrote
 Hello,
 
 I find default values confusing when a function is overloaded, below is an
 example. 
 
 
 CREATE OR REPLACE FUNCTION default_test (a INT DEFAULT 1, b INT DEFAULT 1,
 C INT DEFAULT 1) RETURNS INT AS
 $$
     BEGIN
         RETURN a+b+c;
     END;
 $$
 LANGUAGE 'plpgsql';

Provide a real use-case for this need and maybe someone will consider it
worthwhile to implement.  

In the meantime use VARIADIC or define only the longest-default signature
and provide reasonable default values for all optional arguments.  In this
case the defaults should be zero, not one.
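
For example, a sketch of that single-signature shape (assuming the OP's two
overloads are dropped first, so the calls stay unambiguous):

    CREATE OR REPLACE FUNCTION default_test (a INT DEFAULT 0, b INT DEFAULT 0, c INT DEFAULT 0)
    RETURNS INT AS
    $$
        BEGIN
            RETURN a + b + c;
        END;
    $$
    LANGUAGE plpgsql;

    SELECT default_test(1, 3);  -- 4
    SELECT default_test(1);     -- 1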

Or don't make any of the arguments DEFAULT and provide as many overloads as
you use.  The whole point of DEFAULT was to avoid having to do this for
simple cases - you are not compelled to use default values if your use-case
is too complex for them.

Or just avoid overloading and use different names that describe better what
the different versions accomplish.

Specifying DEFAULT in the function call seems to defeat the point.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Function-sugnature-with-default-parameter-tp5793737p5793739.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] Changeset Extraction v7.7

2014-02-26 Thread Andres Freund
On 2014-02-26 15:30:55 -0300, Alvaro Herrera wrote:
 Andres Freund escribió:
 
  I am wondering about the related situation of GetOldestXmin()
  callers. There's a fair bit of duplicated logic in the callers, before
  but especially after this patchset. What about adding 'Relation rel'
  parameter instead of `allDbs' and `systable'? That keeps the logic
  centralized and there's been a fair amount of talk about vacuum
  optimizations that could also use it.
  It's a bit sad that that requires including rel.h from procarray.h...
 
 relcache.h, not rel.h.

RelationData is declared in rel.h, not relcache.h, no?

Alternatively we could just forward declare it in the header...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Simplified VALUES parameters

2014-02-26 Thread Leon Smith
Hi,  I'm the maintainer and a primary author of a postgresql client library
for Haskell,  called postgresql-simple,  and I recently investigated
improving support for VALUES expressions in this library.  As a result, I'd
like to suggest two changes to postgresql:

1.   Allow type specifications inside AS clauses,  for example

(VALUES (1,'hello'),(2,'world')) AS update(x int, y text)

2.  Have an explicit syntax for representing VALUES expressions which
contain no rows,  such as VALUES ().   (although the precise syntax isn't
important to me.)

My claim is that these changes would make it simpler for client libraries
to properly support parameterized VALUES expressions.  If you care,  I've
included a postscript including a brief background,  and a link to my
analysis and motivations.
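
To make the first suggestion concrete: one way to pin down the types of a
parameterized VALUES list today is to cast the values of its first row, roughly
like this sketch (reusing the tbl/upd names from the postscript below):

    UPDATE tbl SET y = upd.y
      FROM (VALUES (1::int, 'hello'::text), (2, 'world')) AS upd(x, y)
     WHERE tbl.x = upd.x;

whereas suggestion 1 would let the types ride along with the alias instead:

    FROM (VALUES (1,'hello'), (2,'world')) AS upd(x int, y text)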

Best,
Leon


P.S.

https://github.com/lpsmith/postgresql-simple/issues/61

Not entirely unlike many other client libraries, such as psycopg2,
postgresql-simple generates queries
by expanding values of particular Haskell types into fragments of SQL
syntax.   So for example,  you can currently write:

executeMany conn [sql|
    UPDATE tbl SET tbl.y = upd.y
      FROM (VALUES (?,?)) AS upd(x,y)
     WHERE tbl.x = upd.x
  |] [(1,"hello"),(2,"world")]


Which will issue the query:

UPDATE tbl SET tbl.y = upd.y

  FROM (VALUES (1,'hello'),(2,'world')) AS upd(x,y)
 WHERE tbl.x = upd.x

The issue however is that postgresql-simple cannot currently parameterize
more complex queries that have multiple VALUES expressions,  or a VALUES
expression alongside other parameters,  as might occur with a Writable CTE
or complex query.

Also, when presented with an empty list of arguments,  executeMany does not
issue a query at all and simply returns 0,  which is (usually?) the right
thing to do given its intended use cases,  but is not the right thing to
do in more general settings.

So,  what I'd like to do is to be able to write something like:


execute conn [sql|
    UPDATE tbl SET tbl.y = upd.y
      FROM ? AS upd(x,y)
     WHERE tbl.x = upd.x
       AND tbl.z = ?
  |] ( Values [(1,"hello"),(2,"world")], False )


and issue a similar query.   However, the problems with this approach are
specifying the postgresql types and handling the zero-row case properly.


Re: [HACKERS] Function signature with default parameter

2014-02-26 Thread Josh Berkus
On 02/26/2014 10:15 AM, salah jubeh wrote:
 I think, there is a difference between optional parameters and default 
 parameter values. So, my suggestion would be something like this.

 SELECT default_test(1,3, DEFAULT); -- match function number 1
 
 SELECT default_test(1,3); -- match the function number 2 
 
 SELECT default_test(1); -- ERROR
 Regards

This would break at least 4 major applications which I personally have
worked on, and the benefit to users is unclear at best.

One of the main reasons to *have* default parameters is to allow adding
new parameters to existing functions without breaking old application
code.  So, -1 from me.
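
That pattern looks roughly like this sketch (hypothetical function, not from the
thread; note the old signature is dropped rather than overloaded, which avoids
exactly the ambiguity shown upthread):

    DROP FUNCTION IF EXISTS get_report(date);
    CREATE FUNCTION get_report(since date, fmt text DEFAULT 'csv')
    RETURNS int LANGUAGE sql AS $$ SELECT 1 $$;

    SELECT get_report('2014-01-01');          -- existing call sites keep working
    SELECT get_report('2014-01-01', 'json');  -- new callers can pass the extra argument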

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Simplified VALUES parameters

2014-02-26 Thread Josh Berkus
On 02/26/2014 10:47 AM, Leon Smith wrote:
 Hi,  I'm the maintainer and a primary author of a postgresql client library
 for Haskell,  called postgresql-simple,  and I recently investigated
 improving support for VALUES expressions in this library.  As a result, I'd
 like to suggest two changes to postgresql:

And thank you for writing that driver!

I have no opinion about your request for VALUES() stuff, though.  It
looks fairly complex as far as grammar and libpq is concerned.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Simplified VALUES parameters

2014-02-26 Thread Leon Smith
On Wed, Feb 26, 2014 at 1:54 PM, Josh Berkus j...@agliodbs.com wrote:

 And thank you for writing that driver!


You are welcome!

I have no opinion about your request for VALUES() stuff, though.  It
 looks fairly complex as far as grammar and libpq is concerned.


Actually,  my suggestions wouldn't necessarily impact libpq at all.   For
better and worse,  postgresql-simple does not currently support
protocol-level parameters at all.   While it's clear to me that I do
eventually need to work on supporting protocol-level parameters and support
for the binary formats,  it's also become clear to me since I first wrote
it that protocol-level parameters are not a total replacement either,  and
that postgresql-simple will still need to support direct parameter
expansion in some cases.   (e.g. for values parameters,  for identifier
parameters (which aren't yet supported due to the need to drop support for
libpq 8.4),  etc.)

Best,
Leon


Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-26 Thread Stephen Frost
Christian,

Thanks for working on all of this and dealing with the requests for
updates and changes, as well as for dealing very professionally with an
inappropriate and incorrect remark.  Unfortunately, mailing lists can
make communication difficult and someone's knee-jerk reaction (not
referring to your reaction here) can end up causing much frustration.

Remind me when we're at a conference somewhere and I'll gladly buy you a
beer (or whatever your choice is).  Seriously, thanks for working on the
'huge pages' changes and documentation- it's often a thankless job and
clearly one which can be extremely frustrating.

Thanks again,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Function signature with default parameter

2014-02-26 Thread Andrew Dunstan


On 02/26/2014 01:51 PM, Josh Berkus wrote:

On 02/26/2014 10:15 AM, salah jubeh wrote:

I think, there is a difference between optional parameters and default 
parameter values. So, my suggestion would be something like this.
SELECT default_test(1,3, DEFAULT); -- match function number 1

SELECT default_test(1,3); -- match the function number 2

SELECT default_test(1); -- ERROR
Regards

This would break at least 4 major applications which I personally have
worked on, and the benefit to users is unclear at best.

One of the main reasons to *have* default parameters is to allow adding
new parameters to existing functions without breaking old application
code.  So, -1 from me.



me too.

The OP's statement that there is a difference between default params and 
optional params doesn't compute. The only way params are optional is 
that they have a default. Treating these as somehow independent is a 
nonsense.


Furthermore, if we dropped function 2, then

select default_test(1,3)

would be a call to function 1. Then, if we followed this proposal, 
creating function 2 would magically steer that call to function 2 
instead. Talk about a footgun, and possibly a security risk too.



cheers

andrew




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Merlin Moncure
On Wed, Feb 26, 2014 at 12:05 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/26/2014 09:57 AM, Merlin Moncure wrote:
 What is not going to be so clear for users (particularly without good
 supporting documentation) is how things break down in terms of usage
 between hstore and jsonb.

 Realistically?  Once we get done with mapping the indexes and operators,
 users who are used to Hstore1 use Hstore2, and everyone else uses jsonb.
  jsonb is nothing other than a standardized syntax interface to hstore2,
 and most users will choose the syntax similar to what they already know
 over learning new stuff.

The problem is that as of today, they are not done and AFAICT will not
be for 9.4.  Developers wanting to utilize the nosql pattern are going
to have to lean heavily on hstore API and that's a simple
fact...people reading about all the great new feature of postgres are
going to want to learn how to do things and it's reasonable to want to
anticipate the things they want to do and explain how to use them.  I
would like to extend that case coverage to include the json type as
well as its documentation is pretty lousy for that (I should know: I
wrote most of it).

merlin




Re: [HACKERS] Function signature with default parameter

2014-02-26 Thread Pavel Stehule
2014-02-26 20:36 GMT+01:00 Andrew Dunstan and...@dunslane.net:


 On 02/26/2014 01:51 PM, Josh Berkus wrote:

 On 02/26/2014 10:15 AM, salah jubeh wrote:

 I think, there is a difference between optional parameters and default
 parameter values. So, my suggestion would be something like this.
 SELECT default_test(1,3, DEFAULT); -- match function number 1

 SELECT default_test(1,3); -- match the function number 2

 SELECT default_test(1); -- ERROR
 Regards

 This would break at least 4 major applications which I personally have
 worked on, and the benefit to users is unclear at best.

 One of the main reasons to *have* default parameters is to allow adding
 new parameters to existing functions without breaking old application
 code.  So, -1 from me.


 me too.

 The OP's statement that there is a difference between default params and
 optional params doesn't compute. The only way params are optional is that
 they have a default. Treating these as somehow independent is a nonsense.

 Furthermore, if we dropped function 2, then

 select default_test(1,3)

 would be a call to function 1. Then, if we followed this proposal,
 creating function 2 would magically steer that call to function 2 instead.
 Talk about a footgun, and possibly a security risk too.


Multiple overloaded functions with a similar type signature are just dangerous
and bad practice.

We ought to disallow it.

Regards

Pavel



 cheers

 andrew






Re: [HACKERS] Simplified VALUES parameters

2014-02-26 Thread David Johnston
Leon Smith wrote
 Hi,  I'm the maintainer and a primary author of a postgresql client
 library
 for Haskell,  called postgresql-simple,  and I recently investigated
 improving support for VALUES expressions in this library.  As a result,
 I'd
 like to suggest two changes to postgresql:
 
 1.   Allow type specifications inside AS clauses,  for example
 
 (VALUES (1,'hello'),(2,'world')) AS update(x int, y text)
 
 2.  Have an explicit syntax for representing VALUES expressions which
 contain no rows,  such as VALUES ().   (although the precise syntax isn't
 important to me.)
 
 My claim is that these changes would make it simpler for client libraries
 to properly support parameterized VALUES expressions.  If you care,  I've
 included a postscript including a brief background,  and a link to my
 analysis and motivations.

At a high-level I don't see how the nature of SQL would allow for either of
these things to work.  The only reason there even is (col type, col2 type)
syntax is because record-returning functions have to have their return type
defined during query construction.  The result of processing a VALUES clause
has to be a normal relation - the subsequent presence of AS simply provides
column name aliases because in the common form each column is assigned a
generic name during execution.

Defining a generic empty-values expression has the same problem in that you
have to define how many, with type and name, columns the VALUES expression
needs to generate.

From what I can see SQL is not going to readily allow for the construction
of virtual tables via parameters.  You either need to make those tables
non-virtual (even if temporary) or consolidate them into an ARRAY.  In short
you - the client library - probably can solve the virtual table problem but
you will have to accommodate user-specified typing somehow in order to
supply valid SQL to the server.
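
As a sketch of the ARRAY consolidation (this relies on 9.4's multi-argument
unnest; the table and column names are just illustrative):

    UPDATE tbl SET y = upd.y
      FROM unnest('{1,2}'::int[], '{hello,world}'::text[]) AS upd(x, y)
     WHERE tbl.x = upd.x;

so the whole value list travels as two ordinary array parameters rather than as
a virtual table.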

The two common solutions for your specified use-case are either the user
creates the needed temporary table and writes the update query to join
against that OR they write the generic single-record update statement and
then loop over all desired input values - ideally all done within a
transaction.  In your situation you should automate that by taking your
desired syntax and construct a complete script that can then been sent to
PostgreSQL.

I don't imagine that the need for dynamically specified virtual tables is
going to be strong enough for people to dedicate the amount of resources it
would take to implement such a capability.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Simplified-VALUES-parameters-tp5793744p5793756.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-26 Thread Jeremy Harris

On 26/02/14 03:08, Robert Haas wrote:

On Tue, Feb 25, 2014 at 2:55 PM, Jeremy Harris j...@wizmail.org wrote:

On 24/02/14 17:38, Robert Haas wrote:


On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris j...@wizmail.org wrote:


Run under cachegrind, it takes about N/10 last-level cache misses,
all for the new item being introduced to the heap.  The existing
code takes none at all.



Can you explain this further?  This seems like an important clue that
could be useful when trying to optimize this code, but I'm a little
unclear which part of the operation has more cache misses with your
changes and why.



In the patched version, for the heapify operation the outer loop
starts at the last heap-parent tuple and works backward to the
start of the tuples array.  A copy is taken of the SortTuple being
operated on for the inner loop to use.  This copy suffers cache misses.

(The inner loop operates on elements between the copy source and the
end of the array).


In the original, the outer loop runs the array in increasing index
order.  Again a copy is taken of the SortTuple for the inner loop
to use.  This copy does not appear to take significant cache misses.

(The inner loop operates on elements between the copy source and
the start of the array).


Do you have any theory as to why this happens in one case but not the other?


Nope.  Only really wild stuff requiring the cpu memory channel
recognising the forward scan (despite the inner loop) and not
the backward one - and ignoring prefetch instructions, which I
experimented with.
--
Cheers,
   Jeremy






Re: [HACKERS] extension_control_path

2014-02-26 Thread Stephen Frost
Dimitri,

* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
 Peter Eisentraut pete...@gmx.net writes:
  The problem I see, however, is that most extensions, by recommendation,
  set module_pathname = '$libdir/pgfoo', and so relocating the control
  files will still end up pointing to a not relocated library file.
 
 It's kind of true. Is the phrasing “typically” followed by an example
 really a recommendation though? I thought it was more a detailed
 explanation of the way it works.

It's probably sufficient to say that nearly all the ones out there are
this way, so I don't know that it really matters if the userbase
considers it a recommendation or just documentation- that's what
they're doing.

I admit, I've not gone and looked through PGXN explicitly to check this,
but I'm pretty sure I'm right because every one I've ever seen has done
it this way.

 Also, editing the AS '$libdir/foo' occurrences in an SQL script is a
 quite easy thing to do programmatically.

Sure, but I tend to agree with Peter about it not being a terribly great
solution to the problem.

 I mainly see that as a distribution/distributor problem tho.

This comment doesn't really make any sense to me.  My hope is to have
some kind of automated process to build .debs for projects on PGXN but
people are still going to be able to download them directly.  If what's
in PGXN matches what the debs have, then people downloading directly
will have to modify everything.  Perhaps PGXN could be modified to make
it something different for people downloading directly and then the debs
could switch it to $libdir- except, what would they set it to..?  And
we'd be asking every single author to make a change to their module,
which is a pretty terrible thing to do.

  In order to address this properly, we need a new directory structure
  that keeps library files and control files together, similar to how
  Python, Ruby, etc. install things, and then just one path for everything.
 
 It might be true, be it reads to me like you're omiting the directory
 parameter from the control file: the scripts and auxilliary control
 files might be found anywhere else on the file system already.

This is true and Debian puts the control/sql files into a different
directory than the .so files, of course.  Still, the issue here is how
we find the .so files- the user *has* to tell us where the control file
is, if it isn't in the default location, and the assumption (default?)
is then that the .sql files are co-located with them.  It's at that
point when we get to the point of trying to figure out what $libdir is
and it strikes me that when users use this approach the expectation will
be that it's in the same directory as the control and .sql files.
Making that 'just work' has some definite advantages, though I also
share the concern of others (who I don't think have chimed in on this
particular topic thus far, so I'm really just guessing and it might be
just me) that we probably don't want to just start having the ability to
load .so files from anywhere- except, well, the superuser can already do
that anyway using C functions, so perhaps that isn't really a concern?
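
For concreteness, the kind of setup under discussion pairs the new GUC with the
existing library path, roughly like this (the directory values are purely
hypothetical, and the exact value syntax is whatever the patch defines):

    # control/SQL files picked up from an extra directory
    extension_control_path = '/opt/pgextras/share/extension'
    # while the shared libraries are still resolved via dynamic_library_path
    dynamic_library_path = '/opt/pgextras/lib:$libdir'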

I'm still worried about the *conflicts* issue, where you might have a
.so from a -contrib install in $libdir, with .sql/.control files from
downloading the extension itself, hence my suggestion to actually use a
different namespace for extensions which exist outside of the normal
directories.  I'm thinking that would also help people doing
dump/restore to realize when they forget to pull in the 'updated' module
from PGXN rather than using the shipped-with-distro version.  Of course,
you still run the risk that such a dump might not work upon restore
(same as we have today w/ C functions whose .so's have vanished- try to
call them and you get an error back), though perhaps in this case we'd
end up with a dump that won't restore, right..?  That's not ideal
either, of course, but going back to the C functions again, it's
possible you could have a functional index using a custom C functions
and it's the same issue all over again.

 Again, my view is that if you want to do things in a non-standard way
 then you need to tweak the control file and maybe the script files. It's
 a distribution problem, and I'm solving it in an extra software layer.

That's a pretty reasonable option for this specific issue, but it
doesn't address the duplication problem, which worries me.

 PostgreSQL is very flexible about where to organise extension files
 currently, *except* for the control file. This patch is providing the
 same level of flexibility to this part. Of course flexibility can be
 seen as creating a mess, but I don't think it's this patch nor
 PostgreSQL core to solve that mess.

Yeah, but we should be trying to avoid practices and configurations
which encourage mess creation. :)

  Also, the documentation states that this controls the location of the
  control file, but it of course controls the location of the 

Re: [HACKERS] Changeset Extraction v7.7

2014-02-26 Thread Alvaro Herrera
Andres Freund escribió:
 On 2014-02-26 15:30:55 -0300, Alvaro Herrera wrote:
  Andres Freund escribió:
  
   I am wondering about the related situation of GetOldestXmin()
   callers. There's a fair bit of duplicated logic in the callers, before
   but especially after this patchset. What about adding 'Relation rel'
   parameter instead of `allDbs' and `systable'? That keeps the logic
   centralized and there's been a fair amount of talk about vacuum
   optimizations that could also use it.
   It's a bit sad that that requires including rel.h from procarray.h...
  
  relcache.h, not rel.h.
 
 RelationData is declared in rel.h, not relcache.h, no?

Sure, but with your patch AFAICT procarray.h header only needs Relation,
which is declared in relcache.h.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Adding a copying type for array elements

2014-02-26 Thread AlexK
It would be nice to be able to declare something like this for a function
returning an unnested array:

RETURNS TABLE(some_value mytable.myarray_column%ELEMENT_TYPE, ...)

Does it make sense?




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Adding-a-copying-type-for-array-elements-tp5793762.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] Adding a copying type for array elements

2014-02-26 Thread Pavel Stehule
2014-02-26 21:12 GMT+01:00 AlexK alk...@gmail.com:

 It would be nice to be able to declare something like this for a function
 returning an unnested array:

 RETURNS TABLE(some_value mytable.myarray_column%ELEMENT_TYPE, ...)


It makes sense, but it is dangerous with the current implementation. There is no
persistent relation between the table and the type used in the signature.

So better to use raw types or polymorphic types there only.
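
For instance, a sketch of the polymorphic route (hypothetical function name):

    CREATE FUNCTION unnest_vals(arr anyarray)
    RETURNS TABLE (some_value anyelement)
    LANGUAGE sql AS $$ SELECT unnest(arr) $$;

    SELECT * FROM unnest_vals(ARRAY[1,2,3]);   -- some_value comes back as integer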

Pavel



 Does it make sense?




 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/Adding-a-copying-type-for-array-elements-tp5793762.html
 Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.





Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Josh Berkus
On 02/26/2014 11:39 AM, Merlin Moncure wrote:
 On Wed, Feb 26, 2014 at 12:05 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/26/2014 09:57 AM, Merlin Moncure wrote:
 What is not going to be so clear for users (particularly without good
 supporting documentation) is how things break down in terms of usage
 between hstore and jsonb.

 Realistically?  Once we get done with mapping the indexes and operators,
 users who are used to Hstore1 use Hstore2, and everyone else uses jsonb.
  jsonb is nothing other than a standardized syntax interface to hstore2,
 and most users will choose the syntax similar to what they already know
 over learning new stuff.
 
 The problem is that as of today, they are not done and AFAICT will not
 be for 9.4.  

Well, we plan to push to have the indexes and operators available as an
extension by the time that 9.4 comes out.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] extension_control_path

2014-02-26 Thread Dimitri Fontaine
Stephen Frost sfr...@snowman.net writes:
 This is true and Debian puts the control/sql files into a different
 directory than the .so files, of course.  Still, the issue here is how
 we find the .so files- the user *has* to tell us where the control file
 is, if it isn't in the default location, and the assumption (default?)
 is then that the .sql files are co-located with them.  It's at that
 point when we get to the point of trying to figure out what $libdir is

Ok, you're mighty confused.

The rules that PostgreSQL follows to know where to load the library from
are not changed *at all* by this patch. In my book, it makes the whole
topic irrelevant to the review.

Futhermore, the rules in question come from two different elements:

  - the object file name in the AS clause, available *separately* for
each and every function definition, to be found in the script files:

src/backend/commands/functioncmds.c:744

* For a dynamically linked C language object, the form of the clause is
*
*  AS <object file name> [, <link symbol name> ]

  - the dynamic_library_path GUC that helps interpreting the object file
name when it's not absolute or when it contains $libdir as its first
characters.

If you want to change the rules and provide a way to resolve the object
file name to use on a per-extension level, feel free to propose a patch.

 Yeah, but it seems to be pretty rarely used and the expectation is that
 the .sql files resides in the same directory.  I think what we're
 looking for here, in some ways, is for that default for .so's to work
 out the same- except that right now, the users seem to all default to
 sticking in $libdir.

It used to be a script.sql.in containing AS 'MODULE_PATHNAME', which
would then be replaced with $libdir by pgxs.mk (the rule is still here
in the file). Nowadays we have the replacement facility in the backend,
driven by the module_pathname property in the extension's control file.

Contrib modules are still using the AS 'MODULE_PATHNAME' spelling with
the extension control file spelling module_pathname = '$libdir/xxx'.

Nothing changes with this patch other than where to find the extension
control file. How to resolve the object file name on the file system
is still the distribution and local admin problem.

That the controlling of where to find the dynamic libs is convoluted and
involves other people than just the PostgreSQL backend packager might be
seen as a problem or as great flexibility; in any case, I don't see what
it has to do with reviewing the extension_control_path patch.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] extension_control_path

2014-02-26 Thread Stephen Frost
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
 The rules that PostgreSQL follows to know where to load the library from
 are not changed *at all* by this patch. In my book, it makes the whole
 topic irrelevant to the review.

I'm really quite tired of the constant dismissal of anything brought up
by anyone regarding any changes about anything.

I didn't suggest anywhere that the proposed patch changed the rules at
all- instead I was trying to point out that adding this functionality
and *not* changing the way that lookup is done *is going to cause
confusion*.

[... quotes from the docs which aren't relevant ...]

 If you want to change the rules and provide a way to resolve the object
 file name to use on a per-extension level, feel free to propose a patch.

Or, I could simply voice my opinion that this patch *should not go in*
without such a change, or at *least* some thought and discussion about
what the right answer is here.  I'm evidently not alone with this
concern either as it's exactly (as I understand it at least; I don't
mean to put words into his mouth) what Peter *just* brought up too.

I'd really appreciate it if you would stop trying to separate every
other possible thing to do with anything from this patch except the one
little thing you want.  This patch touches code related to extensions
and it's necessary for us to consider it in that broader light.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] GIN improvements part2: fast scan

2014-02-26 Thread Alexander Korotkov
On Thu, Feb 20, 2014 at 1:48 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 02/09/2014 12:11 PM, Alexander Korotkov wrote:

 I've rebased catalog changes with last master. Patch is attached. I've
 rerun my test suite with both last master ('committed') and attached
 patch ('ternary-consistent').


 Thanks!


           method          |      sum
  -------------------------+----------------
   committed               | 143491.71501
   fast-scan-11            | 126916.11199
   fast-scan-light         | 137321.211
   fast-scan-light-heikki  | 138168.02801
   master                  | 446976.288
   ternary-consistent      | 125923.514

 I explain regression in last master by change of MAX_MAYBE_ENTRIES from 8
 to 4.


 Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make
 sure we get similar behavior in Tomas' tests that used 6 search terms. But
 I always felt that it was too large for real queries, once we have the
 catalog changes, that's why I lowered to 4 when committing. If an opclass
 benefits greatly from fast scan, it should provide the ternary consistent
 function, and not rely on the shim implementation.


  I'm not sure about decision to reserve separate procedure number for
 ternary consistent. Probably, it would be better to add ginConfig method.
 It would be useful for post 9.4 improvements.


 Hmm, it might be useful for an opclass to provide both, a boolean and
 ternary consistent function, if the boolean version is significantly more
 efficient when all the arguments are TRUE/FALSE. OTOH, you could also do a
 quick check through the array to see if there are any MAYBE arguments,
 within the consistent function. But I'm inclined to keep the possibility to
 provide both versions. As long as we support the boolean version at all,
 there's not much difference in terms of the amount of code to support
 having them both for the same opclass.

 A ginConfig could be useful for many other things, but I don't think it's
 worth adding it now.


 What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck? We
 discussed that earlier, but didn't reach any conclusion. That needs to be
 clarified in the docs. One possibility is to document that they're
 equivalent. Another is to forbid one of them. Yet another is to assign a
 different meaning to each.

 I've been thinking that it might be useful to define them so that a MAYBE
 result from the tri-consistent function means that it cannot decide if you
 have a match or not, because some of the inputs were MAYBE. And
 TRUE+recheck means that even if all the MAYBE inputs were passed as TRUE or
 FALSE, the result would be the same, TRUE+recheck. The practical difference
 would be that if the tri-consistent function returns TRUE+recheck, ginget.c
 wouldn't need to bother fetching the other entries, it could just return
 the entry with recheck=true immediately. While with MAYBE result, it would
 fetch the other entries and call tri-consistent again. ginget.c doesn't
 currently use the tri-consistent function that way - it always fetches all
 the entries for a potential match before calling tri-consistent, but it
 could. I had it do that in some of the patch versions, but Tomas' testing
 showed that it was a big loss on some queries, because the consistent
 function was called much more often. Still, something like that might be
 sensible in the future, so it might be good to distinguish those cases in
 the API now. Note that ginarrayproc is already using the return values like
 that: in GinContainedStrategy, it always returns TRUE+recheck regardless of
 the inputs, but in other cases it uses GIN_MAYBE.


Next revision of patch is attached.

In this version an opclass should provide at least one consistent function:
binary or ternary. It's expected to achieve the best performance when an opclass
provides both of them. However, the tests show the opposite :( I have to recheck
it carefully.

I've removed the recheck flag from the ternary consistent function. It has to
return GIN_MAYBE instead.

--
With best regards,
Alexander Korotkov.


gin-ternary-consistent-2.patch
Description: Binary data



Re: [HACKERS] extension_control_path

2014-02-26 Thread Dimitri Fontaine
Stephen Frost sfr...@snowman.net writes:
 I didn't suggest anywhere that the proposed patch changed the rules at
 all- instead I was trying to point out that by adding this functionality
 and *not* changing the way that lookup is done *is going to cause
 confusion*.

I don't see any confusion about dynamic library name resolving added
from the extension_control_path, I'm sorry. Simply because I don't
expect people to use the facility without third-party software
designed to fill in the gap.

You're saying that the backend should fill the gap, I'm saying that it
should not. Or maybe within another patch entirely.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] extension_control_path

2014-02-26 Thread Stephen Frost
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
 I don't see any confusion about dynamic library name resolving added
 from the extension_control_path, I'm sorry. Simply because I don't
 expect people to use the facility without third-party software
 designed to fill in the gap.
 
 You're saying that the backend should fill the gap, I'm saying that it
 should not. Or maybe within another patch entirely.

I find this role reversal to be quite bizarre.

Is this third-party software going to be modifying postgresql.conf too?

Thanks,

Stephen




Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-26 Thread Alvaro Herrera
Andres Freund wrote:

 static void
 heap_xlog_lock(XLogRecPtr lsn, XLogRecord *record)
 {
 ...
 HeapTupleHeaderClearHotUpdated(htup);
 HeapTupleHeaderSetXmax(htup, xlrec->locking_xid);
 HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
 /* Make sure there is no forward chain link in t_ctid */
 htup->t_ctid = xlrec->target.tid;
 ...
 }

I think the fix is to reset HOT_UPDATED and t_ctid only if the infomask
says the tuple is LOCKED_ONLY, per the attached patch.  This matches
what heap_lock_tuple is doing originally:

if (HEAP_XMAX_IS_LOCKED_ONLY(new_infomask))
HeapTupleHeaderClearHotUpdated(tuple->t_data);
HeapTupleHeaderSetXmax(tuple->t_data, xid);

/*
 * Make sure there is no forward chain link in t_ctid.  Note that in the
 * cases where the tuple has been updated, we must not overwrite t_ctid,
 * because it was set by the updater.  Moreover, if the tuple has been
 * updated, we need to follow the update chain to lock the new versions of
 * the tuple as well.
 */
if (HEAP_XMAX_IS_LOCKED_ONLY(new_infomask))
tuple->t_data->t_ctid = *tid;

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a771ccb..6d1f9e0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8016,11 +8016,14 @@ heap_xlog_lock(XLogRecPtr lsn, XLogRecord *record)
 
 	fix_infomask_from_infobits(xlrec->infobits_set, &htup->t_infomask,
 			   &htup->t_infomask2);
-	HeapTupleHeaderClearHotUpdated(htup);
+	if (HEAP_XMAX_IS_LOCKED_ONLY(htup->t_infomask))
+	{
+		HeapTupleHeaderClearHotUpdated(htup);
+		/* Make sure there is no forward chain link in t_ctid */
+		htup->t_ctid = xlrec->target.tid;
+	}
 	HeapTupleHeaderSetXmax(htup, xlrec->locking_xid);
 	HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
-	/* Make sure there is no forward chain link in t_ctid */
-	htup->t_ctid = xlrec->target.tid;
 	PageSetLSN(page, lsn);
 	MarkBufferDirty(buffer);
 	UnlockReleaseBuffer(buffer);



Re: [HACKERS] GIN improvements part2: fast scan

2014-02-26 Thread Alexander Korotkov
On Thu, Feb 27, 2014 at 1:07 AM, Alexander Korotkov aekorot...@gmail.com wrote:

 On Thu, Feb 20, 2014 at 1:48 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 02/09/2014 12:11 PM, Alexander Korotkov wrote:

 I've rebased catalog changes with last master. Patch is attached. I've
 rerun my test suite with both last master ('committed') and attached
 patch ('ternary-consistent').


 Thanks!


           method          |     sum
  -------------------------+---------------
   committed               | 143491.71501
   fast-scan-11            | 126916.11199
   fast-scan-light         |   137321.211
   fast-scan-light-heikki  | 138168.02801
   master                  |   446976.288
   ternary-consistent      |   125923.514

 I explain the regression in the last master by the change of MAX_MAYBE_ENTRIES from 8
 to 4.


 Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make
 sure we get similar behavior in Tomas' tests that used 6 search terms. But
 I always felt that it was too large for real queries, once we have the
 catalog changes, that's why I lowered to 4 when committing. If an opclass
 benefits greatly from fast scan, it should provide the ternary consistent
 function, and not rely on the shim implementation.
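
To recap what that shim does (a rough sketch with assumed names and a simplified calling convention, not the actual patch code): it emulates the ternary function by running the opclass's boolean consistent function over every TRUE/FALSE assignment of the MAYBE inputs, and gives up once there are more than MAX_MAYBE_ENTRIES of them.

#define MAX_MAYBE_ENTRIES 4		/* value chosen at commit time, per above */

static GinTernaryValue
shim_triconsistent(bool (*bool_consistent) (bool *check, void *arg),
				   void *arg, GinTernaryValue *tcheck, int nkeys)
{
	bool		check[64];		/* sketch: assume a small, bounded key count */
	int			maybe[64];
	int			nmaybe = 0;
	int			i;
	bool		first;
	uint32		mask;

	for (i = 0; i < nkeys; i++)
	{
		check[i] = (tcheck[i] == GIN_TRUE);
		if (tcheck[i] == GIN_MAYBE)
			maybe[nmaybe++] = i;
	}

	if (nmaybe == 0)
		return bool_consistent(check, arg) ? GIN_TRUE : GIN_FALSE;
	if (nmaybe > MAX_MAYBE_ENTRIES)
		return GIN_MAYBE;		/* too many combinations to enumerate */

	/* try every TRUE/FALSE assignment of the MAYBE inputs */
	first = bool_consistent(check, arg);
	for (mask = 1; mask < ((uint32) 1 << nmaybe); mask++)
	{
		for (i = 0; i < nmaybe; i++)
			check[maybe[i]] = ((mask >> i) & 1) != 0;
		if (bool_consistent(check, arg) != first)
			return GIN_MAYBE;	/* answer depends on the unknown inputs */
	}
	return first ? GIN_TRUE : GIN_FALSE;	/* same answer every time: definite */
}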


   I'm not sure about the decision to reserve a separate procedure number for
  the ternary consistent function. Probably, it would be better to add a ginConfig method.
  It would be useful for post-9.4 improvements.


 Hmm, it might be useful for an opclass to provide both, a boolean and
 ternary consistent function, if the boolean version is significantly more
 efficient when all the arguments are TRUE/FALSE. OTOH, you could also do a
 quick check through the array to see if there are any MAYBE arguments,
 within the consistent function. But I'm inclined to keep the possibility to
 provide both versions. As long as we support the boolean version at all,
 there's not much difference in terms of the amount of code to support
 having them both for the same opclass.

 A ginConfig could be useful for many other things, but I don't think it's
 worth adding it now.


 What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck?
 We discussed that earlier, but didn't reach any conclusion. That needs to
 be clarified in the docs. One possibility is to document that they're
 equivalent. Another is to forbid one of them. Yet another is to assign a
 different meaning to each.

 I've been thinking that it might be useful to define them so that a MAYBE
 result from the tri-consistent function means that it cannot decide if you
 have a match or not, because some of the inputs were MAYBE. And
 TRUE+recheck means that even if all the MAYBE inputs were passed as TRUE or
 FALSE, the result would be the same, TRUE+recheck. The practical difference
 would be that if the tri-consistent function returns TRUE+recheck, ginget.c
 wouldn't need to bother fetching the other entries, it could just return
 the entry with recheck=true immediately. While with MAYBE result, it would
 fetch the other entries and call tri-consistent again. ginget.c doesn't
 currently use the tri-consistent function that way - it always fetches all
 the entries for a potential match before calling tri-consistent, but it
 could. I had it do that in some of the patch versions, but Tomas' testing
 showed that it was a big loss on some queries, because the consistent
 function was called much more often. Still, something like that might be
 sensible in the future, so it might be good to distinguish those cases in
 the API now. Note that ginarrayproc is already using the return values like
 that: in GinContainedStrategy, it always returns TRUE+recheck regardless of
 the inputs, but in other cases it uses GIN_MAYBE.


 The next revision of the patch is attached.

 In this version the opclass should provide at least one consistent function:
 binary or ternary. It's expected to achieve the best performance when the opclass
 provides both of them. However, tests show the opposite :( I have to recheck it
 carefully.


However, it's not!
This revision of the patch completes my test case in 123330 ms. This is
slightly faster than the previous revision.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] extension_control_path

2014-02-26 Thread Dimitri Fontaine
Stephen Frost sfr...@snowman.net writes:
 I find this role reversal to be quite bizarre.

Who do you think should have a say about where to load the dynamic
libraries from?  hackers, packagers, system admins, dbas or users?

Who do you think is currently editing the setup that decides where to
load the dynamic libraries from, which is spread into SQL scripts,
extension control file, postgresql.conf and pg_config --pkglibdir?

What exactly are you calling bizarre in the idea that the PostgreSQL
source code is maybe not the best place to solve that problem?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] extension_control_path

2014-02-26 Thread Stephen Frost
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
 Who do you think should have a say about where to load the dynamic
 libraries from?  hackers, packagers, system admins, dbas or users?

My gut feeling on this is packagers and sysadmins.  Do you see it
differently?  I generally *don't* think DBAs-who-aren't-sysadmins should
be doing it as they may not have any access to the underlying OS (or
perhaps have access only to things like /home), which goes back to my
concerns over ALTER SYSTEM, but that ship has obviously sailed.

 Who do you think is currently editing the setup that decides where to
 load the dynamic libraries from, which is spread into SQL scripts,
 extension control file, postgresql.conf and pg_config --pkglibdir?

I agree that packagers and sysadmins will be setting this up initially,
but it strikes me as a bit silly to ask the sysadmins to go modify the
control file path and then also have to modify the dynamic library load
path when they're setting them to the same thing.

Related to this, as I've asked repeatedly on this thread- what is the
plan for dealing with namespace overlaps?  As in, the admin happily goes
in and sets dynamic_library_path to '$libdir:/path/to/new/hstore' and
then tries to CREATE EXTENSION hstore; with the -contrib packages
installed?  They're going to get cryptic and bizarre error messages, at
best during the CREATE EXTENSION and at worst when they actually try to
run that one function whose interface changed, perhaps months after the
initial install.

Part of the reason that I'm pushing for a change here is to try and
address that problem.  I'd appreciate some feedback on it.

 What exactly are you calling bizarre in the idea that the PostgreSQL
 source code is maybe not the best place to solve that problem?

I was referring to the apparent role reversal between us, with me trying
to get PG to do more and you pushing to have more in an external tool.
It wasn't that long ago that our positions were swapped.

Thanks,

Stephen




Re: [HACKERS] extension_control_path

2014-02-26 Thread Dimitri Fontaine
Stephen Frost sfr...@snowman.net writes:
 * Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
 Who do you think should have a say about where to load the dynamic
 libraries from?  hackers, packagers, system admins, dbas or users?

 My gut feeling on this is packagers and sysadmins.  Do you see it

+1

 Who do you think is currently editing the setup that decides where to
 load the dynamic libraries from, which is spread into SQL scripts,
 extension control file, postgresql.conf and pg_config --pkglibdir?

 I agree that packagers and sysadmins will be setting this up initially,

Not quite, because of the ability to ship absolute object file names in
the SQL script and the extension control files, edited by hackers.

The third party tool I'm talking about will have to edit those files at
packaging time in order to get the control back to where you want it.

 but it strikes me as a bit silly to ask the sysadmins to go modify the
 control file path and then also have to modify the dynamic library load
 path when they're setting them to the same thing.

Well, the point is that if you edit the control file, then you don't
have to care about the dynamic_library_path at all, because you're going
to set up absolute object file names (or locations).

 Related to this, as I've asked repeatedly on this thread- what is the
 plan for dealing with namespace overlaps?  As in, the admin happily goes
 in and sets dynamic_library_path to '$libdir:/path/to/new/hstore' and
 then tries to CREATE EXTENSION hstore; with the -contrib packages
 installed?

My proposal is to edit the control file module_pathname property to the
right absolute location within the new hstore binary packaging. That
responsibility is then given to the new third party tool, aimed at both
packagers and system admins.
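
As a concrete illustration of that proposal, the tool would rewrite the shipped control file along these lines (the paths and version are hypothetical; module_pathname is the property named above):

# /path/to/new/hstore/hstore.control, after repackaging (hypothetical)
comment = 'data type for storing sets of (key, value) pairs'
default_version = '2.0'
relocatable = true
module_pathname = '/path/to/new/hstore/hstore.so'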

 Part of the reason that I'm pushing for a change here is to try and
 address that problem.  I'd appreciate some feedback on it.

Within the way I see things, this problem just doesn't exist, by design.

 I was referring to the apparent role reversal between us, with me trying
 to get PG to do more and you pushing to have more in an external tool.
 It wasn't that long ago that our positions were swapped.

Well you know, I actually read my emails and learn from them.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Peter Geoghegan
On Wed, Feb 26, 2014 at 1:23 PM, Andrew Dunstan and...@dunslane.net wrote:
 +   if (va->string.len == vb->string.len)
 +   {
 +   res = memcmp(va->string.val, vb->string.val,
 va->string.len);
 +   if (res == 0 && arg)
 +   *(bool *) arg = true;

 Should be NULL, not 0.


 No, the compiler doesn't like that for int values.

I'm confused. I just pulled from feodor/jsonb_and_hstore, and I do see
a compiler warning (because the code reads res == NULL, unlike
above). It appears to have been that way in Git since last year. So,
maybe Andres meant that it *should* look like this?


-- 
Peter Geoghegan




Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-26 Thread Alvaro Herrera
I forgot to mention that the bug can be reproduced in a hot-standby
setup with the attached isolation spec.  Note that full_page_writes must
be turned off (otherwise, the updates use full-page writes and then the
bogus code is not run).  Once the spec is executed, in the replica run

SET enable_seqscan TO off;
SELECT * FROM f WHERE a=1;

and you get no rows back.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

setup
{
DROP TABLE IF EXISTS f;
CREATE TABLE f (a int primary key, b int);
INSERT INTO f values (1, 42);
}

session "s1"
step "begin"  { BEGIN; }
step "update" { UPDATE f SET b = b + 1; }
step "commit" { COMMIT; }

session "s2"
step "lock"   { SELECT * FROM f FOR KEY SHARE; }

permutation "begin" "update" "update" "lock" "commit"




Re: [HACKERS] extension_control_path

2014-02-26 Thread Stephen Frost
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
 Stephen Frost sfr...@snowman.net writes:
  * Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
  Who do you think should have a say about where to load the dynamic
  libraries from?  hackers, packagers, system admins, dbas or users?
 
  My gut feeling on this is packagers and sysadmins.  Do you see it
 
 +1
 
  Who do you think is currently editing the setup that decides where to
  load the dynamic libraries from, which is spread into SQL scripts,
  extension control file, postgresql.conf and pg_config --pkglibdir?
 
  I agree that packagers and sysadmins will be setting this up initially,
 
 Not quite, because of the ability to ship absolute object file names in
 the SQL script and the extension control files, edited by hackers.
 
 The third party tool I'm talking about will have to edit those files at
 packaging time in order to get the control back to where you want it.

I'm a bit confused here- above you '+1'd packagers/sysadmins, but then
here you are saying that hackers will be setting it?  Also, it strikes
me as a terrible idea to ship absolute object file names (which I assume
you mean to include path, given you say 'absolute') unless you're an
actual OS packaging system.  I know OSX has some packaging system which
lets you install things that aren't-quite-the-OS, can a package
completely depend on the install path that the hacker decided upon for
the package initially?  It's not relocatable at all on a given system?
Is there a 'shared' directory for all those packages into which each
package can drop files?  How is the naming for those files handled?  Is
there more than one such directory?

Presumably, that's what you'd want to set both the control path and the
dynamic extension path to- a directory of control files and a directory
of .so's, or perhaps one combined directory of both, for the simplest
setup.  If you're working with a directory-per-package, then wouldn't
you want to have everything for that package in that package's directory
and then only have to add all those directories to one place in
postgresql.conf?
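
To sketch what that simplest setup might look like in postgresql.conf (directory names are hypothetical, and the syntax of the proposed extension_control_path GUC is assumed to mirror dynamic_library_path):

# one directory holding both the .control files and the .so files
extension_control_path = '/opt/pgext/extensions'
dynamic_library_path = '$libdir:/opt/pgext/extensions'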

I've been trying to follow this thread pretty closely but perhaps I've
missed it (or forgotten it) and, if so, my apologies, but how exactly
are you envisioning PG, these GUCs, whichever OSX packaging system, and
your external tool working together?

 Well, the point is that if you edit the control file, then you don't
 have to care about the dynamic_library_path at all, because you're going
 to set up absolute object file names (or locations).

My questions about this are mostly covered above, but I did want to get
clarification- is this going to be on a per-system basis, as in, when
the package is installed through your tool, it's going to go figure out
where the package got installed to and rewrite the control file?  Seems
like a lot of work if you're going to have to add that directory to the
postgresql.conf path for the control file anyway to then *also* have to
hack up the control file itself.

  Related to this, as I've asked repeatedly on this thread- what is the
  plan for dealing with namespace overlaps?  As in, the admin happily goes
  in and sets dynamic_library_path to '$libdir:/path/to/new/hstore' and
  then tries to CREATE EXTENSION hstore; with the -contrib packages
  installed?
 
 My proposal is to edit the control file module_pathname property to the
 right absolute location within the new hstore binary packaging. That
 responsibility is then given to the new third party tool, aimed at both
 packagers and system admins.

I can see some of the simplicity in that, though it strikes me as being
more-or-less equivalent to just searching the directory where the
control file exists first, if it isn't in the PG default, with the added
benefit that moving the base-install location for the modules would only
require updating the postgresql.conf rather than having to update it and
then also go modify every control file.

  Part of the reason that I'm pushing for a change here is to try and
  address that problem.  I'd appreciate some feedback on it.
 
 Within the way I see things, this problem just doesn't exist, by design.

I understand your proposal better now that I understand how you're
planning to use it, but I'm still of the opinion that we might be able
to do better by our users by not hard-coding paths into every control
file.

 Well you know, I actually read my emails and learn from them.

I think we all aspire to do that. :)

Thanks,

Stephen




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Andrew Dunstan


On 02/26/2014 04:59 PM, Peter Geoghegan wrote:

On Wed, Feb 26, 2014 at 1:23 PM, Andrew Dunstan and...@dunslane.net wrote:

+   if (va->string.len == vb->string.len)
+   {
+   res = memcmp(va->string.val, vb->string.val,
va->string.len);
+   if (res == 0 && arg)
+   *(bool *) arg = true;

Should be NULL, not 0.


No, the compiler doesn't like that for int values.

I'm confused. I just pulled from feodor/jsonb_and_hstore, and I do see
a compiler warning (because the code reads res == NULL, unlike
above). It appears to have been that way in Git since last year. So,
maybe Andres meant that it *should* look like this?





argh!

I forgot to save a file.

Here's what I get if it's NULL:

   gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -I../../../../src/include
   -D_GNU_SOURCE -I/usr/include/libxml2   -c -o jsonb_support.o
   jsonb_support.c -MMD -MP -MF .deps/jsonb_support.Po
   jsonb_support.c: In function ‘compareJsonbStringValue’:
   jsonb_support.c:137:11: warning: comparison between pointer and
   integer [enabled by default]

With 0 there is no complaint.

new patch attached, change pushed to github.

cheers

andrew




jsonb-13.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-26 Thread Andres Freund
On 2014-02-26 18:18:05 -0300, Alvaro Herrera wrote:
 Andres Freund wrote:
 
  static void
  heap_xlog_lock(XLogRecPtr lsn, XLogRecord *record)
  {
  ...
  HeapTupleHeaderClearHotUpdated(htup);
  HeapTupleHeaderSetXmax(htup, xlrec-locking_xid);
  HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
  /* Make sure there is no forward chain link in t_ctid */
  htup-t_ctid = xlrec-target.tid;
  ...
  }
 
 I think the fix is to reset HOT_UPDATED and t_ctid only if the infomask
 says the tuple is LOCKED_ONLY, per the attached patch.

Looks good to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Andres Freund
On 2014-02-26 16:23:12 -0500, Andrew Dunstan wrote:
 On 02/10/2014 09:11 PM, Andres Freund wrote:

 Is it just me or is jsonapi.h not very well documented?
 
 
 What about it do you think is missing? In any case, it's hardly relevant to
 this patch, so I'll take that as obiter dicta.

It's relevant insofar as I tried to understand it, to understand
whether this patch's usage is sensible.

On a quick reread of the header, what I am missing is:
* what's semstate in JsonSemAction? Private data?
* what's object_start and object_field_start? Presumably object vs
  keypair? Why not use element, as for the array?
* scalar_action is called for which types of tokens?
* what's exactly the meaning of the isnull parameter for ofield_action
  and aelem_action?
* How is one supposed to actually access data in the callbacks, not
  obvious for all the callbacks.
* are scalar callbacks triggered for object keys, object/array values?
...

 +static void
 +putEscapedValue(StringInfo out, JsonbValue *v)
 +{
 +   switch (v->type)
 +   {
 +   case jbvNull:
 +   appendBinaryStringInfo(out, "null", 4);
 +   break;
 +   case jbvString:
 +   escape_json(out, pnstrdup(v->string.val, 
 v->string.len));
 +   break;
 +   case jbvBool:
 +   if (v->boolean)
 +   appendBinaryStringInfo(out, "true", 4);
 +   else
 +   appendBinaryStringInfo(out, "false", 5);
 +   break;
 +   case jbvNumeric:
 +   appendStringInfoString(out, 
 DatumGetCString(DirectFunctionCall1(numeric_out, 
 PointerGetDatum(v->numeric))));
 +   break;
 +   default:
 +   elog(ERROR, "unknown jsonb scalar type");
 +   }
 +}
 Hm, will the jbvNumeric always result in correct quoting?
 datum_to_json() does extra hangups for that case, any reason we don't
 need that here?

 Yes, there is a reason we don't need it here. datum_to_json is converting
 SQL numerics to json, and these might be strings such as 'Nan'. But we never
 store something in a jsonb numeric field unless it came in as a json numeric
 format, which never needs quoting. The json parser will never parse 'NaN' as
 a numeric value.

Ah, yuck. Makes sense. Not your fault at all, but I do dislike json's
definition of numeric values.
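
A quick illustration of the difference (expected behaviour of the existing json conversion functions, not a verbatim transcript):

-- 'NaN' is a perfectly good SQL numeric but not a legal JSON number,
-- so the SQL-to-json conversion has to emit it as a quoted string:
SELECT row_to_json(t) FROM (SELECT 'NaN'::numeric AS x) t;
--  {"x":"NaN"}
-- A value that arrived through the json parser can never be 'NaN',
-- so jsonb's numeric output needs no such quoting.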

 +char *
 +JsonbToCString(StringInfo out, char *in, int estimated_len)
 +{
 ...
 +   while (redo_switch || ((type = JsonbIteratorGet(&it, &v, false)) != 0))
 +   {
 +   redo_switch = false;
 Not sure if I see the advantage over the goto here. A comment explaining
 what the reason for the goto is would have sufficed.
 
 I think you're being pretty damn picky here. You whined about the goto, I
 removed it, now you don't like that either. Personally I think this is
 cleaner.

Sorry, should perhaps have been a bit more precise in my disagreement
abou the goto version. I didn't dislike the goto itself, but that it was
a undocumented and unobvious change in control flow.

It's the reviewer's job to be picky; I pretty damn sure don't expect you
to agree with all my points and I am perfectly fine if you disregard
several of them. I've just read through the patch quickly, so it's not
surprising if I misidentify some.


 
 +   case WJB_KEY:
 +   if (first == false)
 +   appendBinaryStringInfo(out, ", ", 2);
 +   first = true;
 +
 +   putEscapedValue(out, &v);
 +   appendBinaryStringInfo(out, ": ", 2);
 putEscapedValue doesn't guarantee only strings are output, but
 datum_to_json does extra hangups for that case.
 
 But the key here will always be a string. It's enforced by the JSON rules. I
 suppose we could call escape_json directly here and save a function call,
 but I don't agree that there is any problem here.

Ah, yes, it will already have been converted to a string during the
initial conversion, right. /* json rules guarantee this is a string */
or something?

 
 +   type = JsonbIteratorGet(&it, &v, false);
 +   if (type == WJB_VALUE)
 +   {
 +   first = false;
 +   putEscapedValue(out, v);
 +   }
 +   else
 +   {
 +   Assert(type == WJB_BEGIN_OBJECT || type 
 == WJB_BEGIN_ARRAY);
 +   /*
 +* We need to rerun current switch() 
 due to put
 +* in current place object which we 
 just got
 +* from iterator.
 +*/
 due to put?
 
 
 I think that's due to the author not being a native English speaker. I've
 

Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Erik Rijkers
On Wed, February 26, 2014 23:10, Andrew Dunstan wrote:

 new patch attached, change pushed to github.

 [jsonb-13.patch.gz]


This does not apply, see attached: src/backend/utils/adt/jsonfuncs.c.rej

Please ignore if this was not supposed to work together with the earlier 
nested-hstore-11.patch

github branch jsonb_and_hstore built fine; I'll use that instead.
  ( https://github.com/feodor/postgres.git )


thanks,

Erik Rijkers



patch -b -l -F 25 -p 1 <
/home/aardvark/download/pgpatches/0094/nested_hstore/20140225/jsonb-13.patch
patching file doc/src/sgml/datatype.sgml
patching file doc/src/sgml/func.sgml
patching file src/backend/catalog/system_views.sql
Hunk #1 succeeded at 822 (offset 3 lines).
patching file src/backend/utils/adt/Makefile
patching file src/backend/utils/adt/json.c
patching file src/backend/utils/adt/jsonb.c
patching file src/backend/utils/adt/jsonb_support.c
patching file src/backend/utils/adt/jsonfuncs.c
Hunk #20 succeeded at 1419 with fuzz 2 (offset 4 lines).
Hunk #21 succeeded at 1691 (offset 4 lines).
Hunk #22 succeeded at 1843 (offset 4 lines).
Hunk #23 succeeded at 1940 (offset 4 lines).
Hunk #24 succeeded at 2013 (offset 4 lines).
Hunk #25 succeeded at 2035 (offset 4 lines).
Hunk #26 succeeded at 2054 (offset 4 lines).
Hunk #27 FAILED at 2090.
Hunk #28 succeeded at 2129 (offset 10 lines).
Hunk #29 succeeded at 2200 (offset 10 lines).
Hunk #30 succeeded at 2211 (offset 10 lines).
Hunk #31 succeeded at 2236 (offset 10 lines).
Hunk #32 succeeded at 2252 (offset 10 lines).
Hunk #33 succeeded at 2263 (offset 10 lines).
Hunk #34 succeeded at 2461 (offset 10 lines).
Hunk #35 succeeded at 2606 (offset 10 lines).
Hunk #36 succeeded at 2619 (offset 10 lines).
Hunk #37 FAILED at 2644.
Hunk #38 succeeded at 2692 (offset 14 lines).
Hunk #39 succeeded at 2702 (offset 14 lines).
Hunk #40 succeeded at 2730 (offset 14 lines).
2 out of 40 hunks FAILED -- saving rejects to file 
src/backend/utils/adt/jsonfuncs.c.rej
patching file src/include/catalog/pg_cast.h
patching file src/include/catalog/pg_operator.h
patching file src/include/catalog/pg_proc.h
patching file src/include/catalog/pg_type.h
patching file src/include/utils/json.h
patching file src/include/utils/jsonapi.h
patching file src/include/utils/jsonb.h
patching file src/test/regress/expected/jsonb.out
patching file src/test/regress/expected/jsonb_1.out
patching file src/test/regress/parallel_schedule
patching file src/test/regress/serial_schedule
patching file src/test/regress/sql/jsonb.sql






jsonfuncs.c.rej
Description: application/reject



Re: [HACKERS] Function sugnature with default parameter

2014-02-26 Thread salah jubeh
Hello Josh,

This would break at least 4 major applications which I personally have

I clearly see this issue, and in my opinion this is the greatest challenge
for this proposal. So, I will say -1 when I consider the refactoring that would need to be
done in old code. My main concern in this post is not to push this proposal,


worked on, and the benefit to users is unclear at best

Simply put, overloading functions with the same parameter types but different
numbers of parameters can be tricky. In the example above it is obvious that this will
not only generate garbage (function number 2), but also restrict the use of function
number 1.


One of the main reasons to *have* default parameters is to allow adding
new parameters to existing functions without breaking old application
code.  So, -1 from me.
I think this could be handled by using overloading.


I see the benefits of the way default parameters are defined, but they also
affect overloading. Maybe a warning message when creating a
function like the one above would be nice.
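
To make the overloading problem concrete, a small sketch (the function name is made up, and the exact error text may differ):

CREATE FUNCTION default_demo(a int, b int)
  RETURNS int LANGUAGE sql AS 'SELECT $1 + $2';
CREATE FUNCTION default_demo(a int, b int, c int DEFAULT 0)
  RETURNS int LANGUAGE sql AS 'SELECT $1 + $2 + $3';

SELECT default_demo(1, 3);
-- ERROR:  function default_demo(integer, integer) is not unique
-- The two-argument call matches both definitions, so the version with the
-- default parameter effectively blocks the two-argument one.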


Regards




On Wednesday, February 26, 2014 7:51 PM, Josh Berkus j...@agliodbs.com wrote:
 
On 02/26/2014 10:15 AM, salah jubeh wrote:
 I think, there is a difference between optional parameters and default 
 parameter values. So, my suggestion would be something like this.

 SELECT default_test(1,3, DEFAULT); -- match function number 1
 
 SELECT default_test(1,3); -- match the function number 2 
 
 SELECT default_test(1); -- ERROR
 Regards

This would break at least 4 major applications which I personally have
worked on, and the benefit to users is unclear at best.

One of the main reasons to *have* default parameters is to allow adding
new parameters to existing functions without breaking old application
code.  So, -1 from me.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: [HACKERS] Avoiding deeply nested AND/OR trees in the parser

2014-02-26 Thread Noah Misch
On Tue, Feb 25, 2014 at 01:15:09PM -0500, Tom Lane wrote:
 Over in
 http://www.postgresql.org/message-id/bay176-w382a9de827ebc8e602b7bbc5...@phx.gbl
 there's a complaint about getting stack depth limit exceeded from a
 query of the form
 
 select count(*) from gui_die_summary where (x_coord, y_coord) in
 ((25,5),(41,13),(25,7),(28,3),(25,8),(34,7),(26,6),(21,10), ...);
 
 when there's a few thousand pairs in the IN list.  The reason for this
 is that transformAExprIn() generates a tree of nested OR expressions,
 and then recursion in assign_collations_walker() runs out of stack space.

 Really if we wanted to fix
 this issue we'd need to hack up transformAExprAnd/transformAExprOr so that
 they recognized nested ANDs/ORs and flattened them on the spot.  This
 might not be a bad idea, but it's starting to look like more than a quick
 hack patch.

Reminds me of this work:
http://www.postgresql.org/message-id/flat/CABwTF4XJKN1smMjHv_O-QzTpokqSjHBouMWVw-E8kyb2bC=_...@mail.gmail.com
http://www.postgresql.org/message-id/flat/cafj8prdd9qtyoy0cbpoodr-hfj1xambuxwoxazg0kyvtvau...@mail.gmail.com

 Does this seem worth pursuing, or shall we leave it alone?

+1 for fixing.  Extrapolating from your figure of 20s and 20 GiB for a million
coordinate pairs, we'd have tolerable performance at 2 pairs instead of
just failing as we do today.  That's a nice win all by itself.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Andrew Dunstan


On 02/26/2014 05:48 PM, Erik Rijkers wrote:

On Wed, February 26, 2014 23:10, Andrew Dunstan wrote:

new patch attached, change pushed to github.

[jsonb-13.patch.gz]


This does not apply, see attached: src/backend/utils/adt/jsonfuncs.c.rej

Please ignore if this was not supposed to work together with the earlier 
nested-hstore-11.patch

github branch jsonb_and_hstore built fine; I'll use that instead.
   ( https://github.com/feodor/postgres.git )






Ugh, my master repo was 24 hours behind.

New patch attached..




jsonb-14.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Peter Geoghegan
On Wed, Feb 26, 2014 at 2:10 PM, Andrew Dunstan and...@dunslane.net wrote:
 new patch attached, change pushed to github.

 + /* GUC variables */
 + static bool pretty_print_var = false;
 + #define SET_PRETTY_PRINT_VAR(x) ((pretty_print_var) ? \
 +  ((x) | 
 PrettyPrint) : (x))

I think that this is not a great idea. I think that we should do away
with the GUC, but keep the function hstore_print() so we can pretty
print that way. I don't believe that this falls afoul of the usual
obvious reasons for not varying the behavior of IO routines with a
GUC, since it only varies whitespace, but it is surely pretty
questionable to have this GUC's setting vary the output of hstore_out,
an IMMUTABLE function.


-- 
Peter Geoghegan




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-26 Thread Kouhei Kaigai
 * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
  IIUC, his approach was integration of join-pushdown within FDW APIs,
  however, it does not mean the idea of remote-join is rejected.
 
 For my part, trying to consider doing remote joins *without* going through
 FDWs is just nonsensical.  What are you joining remotely if not two foreign
 tables?

It is a case to be joined locally. If a query has two foreign tables managed
by the same server, this pair shall be found while the optimizer tries
various possible combinations.

 With regard to the GPU approach, if that model works whereby the
 normal PG tuples are read off disk, fed over to the GPU, processed, then
 returned back to the user through PG, then I wouldn't consider it really
 a 'remote' join but rather simply a new execution node inside of PG which
 is planned and costed just like the others.  We've been over the discussion
 already about trying to make that a pluggable system but the, very reasonable,
 push-back on that has been if it's really possible and really makes sense
 to be pluggable.  It certainly doesn't *have* to be- PostgreSQL is written
 in C, as we all know, and plenty of C code talks to GPUs and shuffles memory
 around- and that's almost exactly what Robert is working on supporting with
 regular CPUs and PG backends already.
 
 In many ways, trying to conflate this idea of using-GPUs-to-do-work with
 the idea of remote-FDW-joins has really disillusioned me with regard to
 the CustomScan approach.
 
Are you suggesting that I focus on the GPU stuff, rather than killing two birds
with one stone? It may be an approach; however, these have a common part, because
the plan node for a remote join will pop tuples up towards its upper node.
From the viewpoint of the upper node, it looks like a black box that returns tuples
joined from two underlying relations. On the other hand, here is another black
box that returns tuples from scanning or joining underlying relations with GPU
assistance.
Neither implementation detail is visible to the upper node, but the external
interface is common. The custom-scan node can provide a pluggable way for both
use cases.
Anyway, I'm not more motivated by the remote-join feature than by the GPU-acceleration
stuff. If it is better to drop the FDW remote-join stuff from the custom-scan
scope, I won't insist on it.

   Then perhaps they should be exposed more directly?  I can understand
   generally useful functionality being exposed in a way that anyone
   can use it, but we need to avoid interfaces which can't be stable
   due to normal / ongoing changes to the backend code.
  
  The functions my patches want to expose are:
   - get_restriction_qual_cost()
   - fix_expr_common()
 
 I'll try and find time to go look at these in more detail later this week.
 I have reservations about exposing the current estimates on costs as we
 may want to adjust them in the future- but such adjustments may need to
 be made in balance with other changes throughout the system and an external
 module which depends on one result from the qual costing might end up having
 problems with the costing changes because the extension author wasn't aware
 of the other changes happening in other areas of the costing.
 
That is my point as well. If the cost estimation logic is revised in
the future, it becomes a problem if an extension has cut and pasted the code.

 I'm talking about this from a beyond-just-the-GUCs point of view, I
 realize that the extension author could go look at the GUC settings, but
 it's entirely reasonable to believe we'll make changes to the default GUC
 settings along with how they're used in the future.
 
Is the GUC something like a Boolean that shows whether the new costing model
is applied or not? If so, the extension needs to keep two cost estimation logics
within its code, doesn't it?
If the GUC is something like a weight, I also think it makes sense.

  And, the functions my patches newly want are:
   - bms_to_string()
   - bms_from_string()
 
 Offhand, these look fine, if there's really an external use for them.
 Will try to look at them in more detail later.
 
At least, it makes sense to carry a bitmap data structure in the private
field of a custom-scan node, because every plan node has to be safe for
copyObject().
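
For what it's worth, the intended usage is presumably along these lines (the field and variable names are assumptions, and the signatures of the proposed functions are inferred from their names):

/* Sketch: round-trip a Bitmapset through a string so it can sit in a plan
 * node's private List and survive copyObject()/nodeToString(). */
static List *
pack_relids(Bitmapset *relids)
{
	return list_make1(makeString(bms_to_string(relids)));
}

static Bitmapset *
unpack_relids(List *custom_private)
{
	return bms_from_string(strVal(linitial(custom_private)));
}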

   That's fine, if we can get data to and from those co-processors
   efficiently enough that it's worth doing so.  If moving the data to
   the GPU's memory will take longer than running the actual
   aggregation, then it doesn't make any sense for regular tables
   because then we'd have to cache the data in the GPU's memory in some
   way across multiple queries, which isn't something we're set up to do.
  
  When I made a prototype implementation on top of FDW, using CUDA, it
  was able to run a sequential scan 10 times faster than SeqScan on regular
  tables, if the qualifiers are complex enough.
  The library used to communicate with the GPU (OpenCL/CUDA) has an asynchronous data
  transfer mode using hardware DMA. It allows hiding the cost of data
  transfer 

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-02-26 Thread Kouhei Kaigai
 * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
  I (plan to) use custom-scan of course. Once a relation is referenced
  and optimizer decided GPU acceleration is cheaper, associated custom-
  scan node read the data from underlying relation (or in-memory cache
  if exists) then move to the shared memory buffer to deliver GPU
  management background worker that launches asynchronous DMA one by one.
  After that, custom-scan node receives filtered records via shared-
  memory buffer, so it can construct tuples to be returned to the upper
  node.
 
 Alright- but have you discussed this with Robert?  We're going to be
 whacking things around for parallel support with new nodes and more built-in
 helper functionality for doing this work and I'm not anxious to have
 CustomScan end up being a legacy interface that we're required to pull
 forward because we accepted it before things had settled.
 
I briefly introduced my idea of using GPUs to him at Ottawa last year,
even though I'm not certain he remembers it.
At least, the idea of the custom-scan node came from the discussion at that
time.

  The custom-scan API is a thin abstraction over the plan node
  interface, not tightly coupled to a particular use case, like GPU,
  remote-join and so on. So, I'm quite optimistic about the future
 maintainability.
 
 I don't see how you can be when there hasn't been any discussion that I've
 seen about how parallel query execution is going to change things for us.
 
If parallel query execution changes the whole structure of plan nodes,
it will also affect the interface of custom-scan, because it is a thin
abstraction of a plan node. However, if the parallel execution feature is
implemented as a new plan node in addition to the existing ones, I cannot
imagine a scenario that affects the structure of another plan node.

  Also, please recall the discussion at the last developer meeting.
  The purpose of custom-scan (we didn't name it at that time) is to
  avoid unnecessary project branches for people who want to implement
  their own special feature when no facilities to enhance the
  optimizer/executor are available.
  Even though we have an in-core parallel execution feature using CPUs, it also
  makes sense to provide some unique implementation that may be suitable
  for a specific region.
 
 The issue here is that we're going to be expected to maintain an interface
 once we provide it and so that isn't something we should be doing lightly.
 Particularly when it's as involved as this kind of change is with what's
 going on in the backend where we are nearly 100% sure to be changing things
 in the next release or two.
 
The FDW APIs have also been revised several times in recent releases. Being able to
design the perfect interface from the beginning would be best, but that is usually hard
to do.
Also, the custom-scan interface is almost symmetric with the existing plan node
structures, so its stability is relatively high, I think.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-02-26 16:23:12 -0500, Andrew Dunstan wrote:

  + if (va->string.len == vb->string.len)
  + {
  + res = memcmp(va->string.val, vb->string.val, va->string.len);
  + if (res == 0 && arg)
  + *(bool *) arg = true;
  Should be NULL, not 0.
  
  No, the compiler doesn't like that for int values.
 
 Yes, please disregard, I misread. I think I wanted actually to say that
 the test for arg should be arg != NULL, because we don't usually do
 pointer truth tests (which I personally find odd, but well).

Pointer validity tests seem to be mostly a matter of personal
preference.  I know I sometimes use just if (foo) and other times if
(foo != NULL).  Both idioms are used inconsistently all over the place.
We even have a PointerIsValid() macro.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Alvaro Herrera
Peter Geoghegan wrote:
 On Wed, Feb 26, 2014 at 2:10 PM, Andrew Dunstan and...@dunslane.net wrote:
  new patch attached, change pushed to github.
 
  + /* GUC variables */
  + static bool   pretty_print_var = false;
  + #define SET_PRETTY_PRINT_VAR(x)   ((pretty_print_var) ? \
  +((x) | 
  PrettyPrint) : (x))
 
 I think that this is not a great idea. I think that we should do away
 with the GUC, but keep the function hstore_print() so we can pretty
 print that way. I don't believe that this falls afoul of the usual
 obvious reasons for not varying the behavior of IO routines with a
 GUC, since it only varies whitespace, but it is surely pretty
 questionable to have this GUC's setting vary the output of hstore_out,
 an IMMUTABLE function.

I don't see this in the submitted patch.  What's going on?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Peter Geoghegan
On Wed, Feb 26, 2014 at 5:06 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 I think that this is not a great idea. I think that we should do away
 with the GUC, but keep the function hstore_print() so we can pretty
 print that way. I don't believe that this falls afoul of the usual
 obvious reasons for not varying the behavior of IO routines with a
 GUC, since it only varies whitespace, but it is surely pretty
 questionable to have this GUC's setting vary the output of hstore_out,
 an IMMUTABLE function.

 I don't see this in the submitted patch.  What's going on?

I'm working off the Github branch here, as of an hour ago, since I was
under the impression that the patches submitted are merely snapshots
of that (plus I happen to strongly prefer not dealing with patch files
for something this big). Which submitted patch?


-- 
Peter Geoghegan




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Andrew Dunstan


On 02/26/2014 08:09 PM, Peter Geoghegan wrote:

On Wed, Feb 26, 2014 at 5:06 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

I think that this is not a great idea. I think that we should do away
with the GUC, but keep the function hstore_print() so we can pretty
print that way. I don't believe that this falls afoul of the usual
obvious reasons for not varying the behavior of IO routines with a
GUC, since it only varies whitespace, but it is surely pretty
questionable to have this GUC's setting vary the output of hstore_out,
an IMMUTABLE function.

I don't see this in the submitted patch.  What's going on?

I'm working off the Github branch here, as of an hour ago, since I was
under the impression that the patches submitted are merely snapshots
of that (plus I happen to strongly prefer not dealing with patch files
for something this big). Which submitted patch?





It's in the nested hstore patch. I've been splitting this into two 
pieces. See 
http://www.postgresql.org/message-id/530d0646.8020...@dunslane.net for 
the latest hstore piece.


cheers

andrew






Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Robert Haas
On Wed, Feb 26, 2014 at 3:45 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/26/2014 11:39 AM, Merlin Moncure wrote:
 On Wed, Feb 26, 2014 at 12:05 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/26/2014 09:57 AM, Merlin Moncure wrote:
 What is not going to be so clear for users (particularly without good
 supporting documentation) is how things break down in terms of usage
 between hstore and jsonb.

 Realistically?  Once we get done with mapping the indexes and operators,
 users who are used to Hstore1 use Hstore2, and everyone else uses jsonb.
  jsonb is nothing other than a standardized syntax interface to hstore2,
 and most users will choose the syntax similar to what they already know
 over learning new stuff.

 The problem is that as of today, they are not done and AFAICT will not
 be for 9.4.

 Well, we plan to push to have the indexes and operators available as an
 extension by the time that 9.4 comes out.

Why can't this whole thing be shipped as an extension?   It might well
be more convenient to have the whole thing packaged as an extension
than to have parts of it in core and parts of it not in core.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Changeset Extraction v7.7

2014-02-26 Thread Robert Haas
On Wed, Feb 26, 2014 at 12:29 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-24 17:06:53 -0500, Robert Haas wrote:
 -   heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
 +   if (IsSystemRelation(scan->rs_rd)
 +   || RelationIsAccessibleInLogicalDecoding(scan->rs_rd))
 +   heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
 +   else
 +   heap_page_prune_opt(scan->rs_rd, buffer, 
 RecentGlobalDataXmin);

 Instead of changing the callers of heap_page_prune_opt() in this way,
 I think it might be better to change heap_page_prune_opt() to take
 only the first two of its current three parameters; everybody's just
 passing RecentGlobalXmin right now anyway.

 I've changed stuff this way, and it indeed looks better.

 I am wondering about the related situation of GetOldestXmin()
 callers. There's a fair bit of duplicated logic in the callers, before
 but especially after this patchset. What about adding 'Relation rel'
 parameter instead of `allDbs' and `systable'? That keeps the logic
 centralized and there's been a fair amount of talk about vacuum
 optimizations that could also use it.
 It's a bit sad that that requires including rel.h from procarray.h...

 What do you think? Isolated patch attached.

Seems reasonable to me.

+ * considered, but for non-shared non-shared relations that's not required,

Duplicate word.
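
The shape of the refactoring being discussed, as a rough sketch (the helper name is invented; the actual patch folds this choice into heap_page_prune_opt()/GetOldestXmin() rather than exposing a separate function):

/* Catalogs, and anything a logical decoding output plugin may need to see,
 * must be pruned against the stricter RecentGlobalXmin; ordinary user data
 * can use RecentGlobalDataXmin. */
static TransactionId
prune_xid_horizon(Relation rel)
{
	if (IsSystemRelation(rel) || RelationIsAccessibleInLogicalDecoding(rel))
		return RecentGlobalXmin;
	return RecentGlobalDataXmin;
}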

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Peter Geoghegan
On Wed, Feb 26, 2014 at 6:29 PM, Robert Haas robertmh...@gmail.com wrote:
 Why can't this whole thing be shipped as an extension?   It might well
 be more convenient to have the whole thing packaged as an extension
 than to have parts of it in core and parts of it not in core.

That's a good question. I think having everything in contrib would
make it easier to resolve the disconnect between jsonb and hstore. As
things stand, there is a parallel set of functions and operators for
hstore and jsonb, with the former set much larger than the latter. I'm
not terribly happy with that.


-- 
Peter Geoghegan




Re: [HACKERS] Avoiding deeply nested AND/OR trees in the parser

2014-02-26 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Tue, Feb 25, 2014 at 01:15:09PM -0500, Tom Lane wrote:
 Really if we wanted to fix
 this issue we'd need to hack up transformAExprAnd/transformAExprOr so that
 they recognized nested ANDs/ORs and flattened them on the spot.  This
 might not be a bad idea, but it's starting to look like more than a quick
 hack patch.

 Reminds me of this work:
 http://www.postgresql.org/message-id/flat/CABwTF4XJKN1smMjHv_O-QzTpokqSjHBouMWVw-E8kyb2bC=_...@mail.gmail.com
 http://www.postgresql.org/message-id/flat/cafj8prdd9qtyoy0cbpoodr-hfj1xambuxwoxazg0kyvtvau...@mail.gmail.com

Oh, I'd forgotten about that thread.  I never particularly liked the patch
as presented: like Robert, I thought it far too complicated.  My
inclination would just be to tweak the parser enough so that a simple list
of ANDs or ORs (ie, a left-deep raw parse tree) gets flattened.

The most likely bet for making that happen in an uncomplicated way would
be to alter gram.y's processing: if we had the productions for AND/OR
notice whether their left inputs were already AND/OR clauses, they could
extend the argument lists instead of building nested clauses.  The reason
the proposed patch is so complicated is it's trying to avoid recursing
while handling a fundamentally recursive data structure, and that's just
the hard way to do it.
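
Roughly what that gram.y tweak could look like, sketched as a helper the a_expr AND production would call (names are illustrative):

/* Build "lexpr AND rexpr"; when the left input is already an AND clause,
 * append to its argument list so a long chain becomes one N-argument
 * BoolExpr instead of a deeply nested tree. */
static Node *
makeAndExpr(Node *lexpr, Node *rexpr, int location)
{
	if (IsA(lexpr, BoolExpr) &&
		((BoolExpr *) lexpr)->boolop == AND_EXPR)
	{
		BoolExpr   *b = (BoolExpr *) lexpr;

		b->args = lappend(b->args, rexpr);
		return (Node *) b;
	}
	return (Node *) makeBoolExpr(AND_EXPR, list_make2(lexpr, rexpr), location);
}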

We do need to look at whether there are any implications for ruleutils
and other places, though.

regards, tom lane




Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Andrew Dunstan


On 02/26/2014 09:43 PM, Peter Geoghegan wrote:

On Wed, Feb 26, 2014 at 6:29 PM, Robert Haas robertmh...@gmail.com wrote:

Why can't this whole thing be shipped as an extension?   It might well
be more convenient to have the whole thing packaged as an extension
than to have parts of it in core and parts of it not in core.

That's a good question. I think having everything in contrib would
make it easier to resolve the disconnect between jsonb and hstore. As
things stand, there is a parallel set of functions and operators for
hstore and jsonb, with the former set much larger than the latter. I'm
not terribly happy with that.




The jsonb set will get larger as time goes on. I don't think either of 
you are thinking very clearly about how we would do this. Extensions 
can't call each other's code. So the whole notion we have here of 
sharing the tree-ish data representation and a lot of the C API would go 
out the window, unless you want to shoehorn jsonb into hstore. Frankly, 
we'll look silly with json as a core type and the more capable jsonb not.


Not to mention that if at this stage people suddenly decide we should 
change direction on a course that has been very publicly discussed over 
quite a considerable period, and for which Teodor and I and others have 
put in a great deal of work, I at least am going to be extremely annoyed 
(note the characteristic Australian use of massive understatement.)


cheers

andrew





Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Stephen Frost
* Andrew Dunstan (and...@dunslane.net) wrote:
 The jsonb set will get larger as time goes on. I don't think either
 of you are thinking very clearly about how we would do this.
 Extensions can't call each other's code.

Yeah, that was puzzling me too.

Agree with the rest of your comments as well.

Thanks,

Stephen




Re: [HACKERS] define type_transform to new user defined type

2014-02-26 Thread Kyotaro HORIGUCHI
Hello,

 Mohsen SM mohsensoodk...@gmail.com writes:
  I want to create a new type that is similar to varchar.
 Its size is variable.
 I use the CREATE TYPE query.
 I define these functions for that type:
 1-typein
 2-typeoute
 3-typemodify_input
 4-typemodify_output
 5-type_transform
 I can define 1 to 4 functions in CREATE TYPE
 but I can't define type_transform for that type. How can I define
 type_transform for that type?

A type cast came to my mind on seeing type_transform among
them. You could do that by using 'CREATE CAST', if that is what you mean.

http://www.postgresql.org/docs/current/static/sql-createcast.html

At Tue, 25 Feb 2014 16:11:46 -0500, Tom Lane wrote
 There's no such thing as a type transform.  There are transforms
 associated with functions ... unfortunately, there's not currently
 any provision for defining those at the SQL level.  You could poke
 an entry into pg_proc.protransform if you're desperate enough.

Just out of curiosity, could I have some examples of transforms
associated with functions? I saw a lot of transforms for
clauses, exprs, plan nodes, etc., but found nothing else when grepping
src/backend/* for 'transform'.
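
One way to hunt for them, assuming the 9.4 catalogs where
pg_proc.protransform is a regproc column, would be a query along these
lines, listing the functions that have a transform attached:

SELECT oid::regprocedure AS function, protransform
  FROM pg_proc
 WHERE protransform <> 0;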

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] UNION ALL on partitioned tables won't use indices.

2014-02-26 Thread Noah Misch
On Mon, Feb 03, 2014 at 07:36:22PM +0900, Kyotaro HORIGUCHI wrote:
   create table parent (a int, b int);
   create table child () inherits (parent);
   insert into parent values (1,10);
   insert into child values (2,20);
   select a, b from parent union all select a, b from child;
  
  Mmm. I had the same result. Please let me have a bit more time.
 
 This turned out to be a correct result. The two tables have
 the following records after the two INSERTs.
 
 | =# select * from only parent;
 |  1 | 10
 | (1 row)
 | 
 | =# select * from child;
 |  2 | 20
 | (1 row)
 
 Then it is natural that the parent side of the UNION ALL returns the
 following results.
 
 | =# select * from parent;
 |  a | b  
 | ---+
 |  1 | 10
 |  2 | 20
 | (2 rows)
 
 Finally, the result we got proved not to be a problem.

The first union branch should return two rows, and the second union branch
should return one row, for a total of three.  In any case, I see later in your
mail that you fixed this.  The larger point is that this patch has no business
changing the output rows of *any* query.  Its goal is to pick a more-efficient
plan for arriving at the same answer.  If there's a bug in our current output
for some query, that's a separate discussion from the topic of this thread.
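
For context, here is a sketch (made-up table names, not taken from the patch
or its regression tests) of the sort of case the patch is after: with
per-child indexes, the UNION ALL branches should be able to feed a
MergeAppend of index scans instead of sorting the whole appended result.

create table ptab (a int, b int);
create table ctab () inherits (ptab);
create index on ptab (a);
create index on ctab (a);
insert into ptab select g, g from generate_series(1, 10000) g;
insert into ctab select g, g from generate_series(1, 10000) g;
analyze ptab;
analyze ctab;

explain (costs off)
  select a, b from ptab
  union all
  select a, b from ctab
  order by a limit 10;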

 Second, about the crash in this sql,
 
  select parent from parent union all select parent from parent;
 
 It ignored the whole-row reference (varattno = 0), which makes the index
 into the child's translated_vars list invalid (-1).  I completely
 overlooked it, even though I myself had referred to it before.
 
 Unfortunately, ConvertRowtypeExpr currently prevents appendrels from being
 removed, and such a case doesn't seem to occur very often, so I decided
 to exclude the case.

 + /*
 +  * Appendrels which does whole-row-var conversion cannot be
 +  * removed. ConvertRowtypeExpr can convert only RELs which can
 +  * be referred to using relid.

We have parent and child relids, so it is not clear to me how imposing that
restriction helps us.  I replaced transvars_merge_mutator() with a call to
adjust_appendrel_attrs().  This reduces code duplication, and it handles
whole-row references.  (I don't think the other nodes adjust_appendrel_attrs()
can handle matter to this caller.  translated_vars will never contain join
tree nodes, and I doubt it could contain a PlaceHolderVar with phrels
requiring translation.)

The central design question for this patch seems to be how to represent, in
the range table, the fact that we expanded an inheritance parent despite its
children ending up as appendrel children of a freestanding UNION ALL.  The v6
patch detaches the original RTE from the join tree and clears its inh flag.
This breaks sepgsql_dml_privileges(), which looks for RTE_RELATION with inh =
true and consults selinux concerning every child table.  We could certainly
change the way sepgsql discovers inheritance hierarchies, but nothing clearly
better comes to mind.  I selected the approach of preserving the RTE's inh
flag, removing the AppendRelInfo connecting that RTE to its enclosing UNION
ALL, and creating no AppendRelInfo children for that RTE.  An alternative was
to introduce a new RTE flag, say append.  An inheritance parent under a
UNION ALL would have append = false, inh = true; other inheritance parent RTEs
would have append = true, inh = true; an RTE for UNION ALL itself would have
append = true, inh = false.

  The attached two patches are rebased onto current 9.4dev HEAD, and
  make check at the top directory and in src/test/isolation passes
  without error. One bug was found and fixed along the way. It was an
  assertion failure caused by a probably-unexpected type conversion
  introduced by collapse_appendrels(), which leads to an implicit
  whole-row cast from some valid reltype to an invalid reltype (0)
  inside adjust_appendrel_attrs_mutator().
 
 What query demonstrated this bug in the previous type 2/3 patches?
   
   I would still like to know the answer to the above question.
 
 I remembered after some struggle. It failed during 'make check',
 on the following query in inherit.sql.

 [details]

Interesting.  Today, the parent_reltype and child_reltype fields of
AppendRelInfo are either both valid or both invalid.  Your v6 patch allowed us
to have a valid child_reltype with an invalid parent_reltype.  At the moment,
we can't benefit from just one valid reltype.  I restored the old invariant.

If the attached patch version looks reasonable, I will commit it.


Incidentally, I tried adding an assertion that append_rel_list does not show
one appendrel as a direct child of another.  The following query, off-topic
for the patch at hand, triggered that assertion:

SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0
UNION ALL
SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0;

-- 
Noah Misch
EnterpriseDB 

Re: [HACKERS] Avoiding deeply nested AND/OR trees in the parser

2014-02-26 Thread Ashutosh Bapat
On Thu, Feb 27, 2014 at 8:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Noah Misch n...@leadboat.com writes:
  On Tue, Feb 25, 2014 at 01:15:09PM -0500, Tom Lane wrote:
  Really if we wanted to fix
  this issue we'd need to hack up transformAExprAnd/transformAExprOr so that
  they recognized nested ANDs/ORs and flattened them on the spot.  This
  might not be a bad idea, but it's starting to look like more than a quick
  hack patch.

  Reminds me of this work:
 
 http://www.postgresql.org/message-id/flat/CABwTF4XJKN1smMjHv_O-QzTpokqSjHBouMWVw-E8kyb2bC=_...@mail.gmail.com
 
 http://www.postgresql.org/message-id/flat/cafj8prdd9qtyoy0cbpoodr-hfj1xambuxwoxazg0kyvtvau...@mail.gmail.com

 Oh, I'd forgotten about that thread.  I never particularly liked the patch
 as presented: like Robert, I thought it far too complicated.  My
 inclination would just be to tweak the parser enough so that a simple list
 of ANDs or ORs (ie, a left-deep raw parse tree) gets flattened.

 The most likely bet for making that happen in an uncomplicated way would
 be to alter gram.y's processing: if we had the productions for AND/OR
 notice whether their left inputs were already AND/OR clauses, they could
 extend the argument lists instead of building nested clauses.  The reason
 the proposed patch is so complicated is it's trying to avoid recursing
 while handling a fundamentally recursive data structure, and that's just
 the hard way to do it.


 We do need to look at whether there are any implications for ruleutils
 and other places, though.


ruleutils should be fine. See the code below, from ruleutils.c around line 6615:

            case AND_EXPR:
                if (!PRETTY_PAREN(context))
                    appendStringInfoChar(buf, '(');
                get_rule_expr_paren(first_arg, context,
                                    false, node);
                while (arg)
                {
                    appendStringInfoString(buf, " AND ");
                    get_rule_expr_paren((Node *) lfirst(arg),
                                        context,
                                        false, node);
                    arg = lnext(arg);
                }
                if (!PRETTY_PAREN(context))
                    appendStringInfoChar(buf, ')');
                break;

Similar code exists for OR_EXPR.

Within the planner, I have seen that quals are always implicitly-ANDed
lists, with all the ANDs put into a single list. The same may be true of OR.

As a side note, the code blocks for AND_EXPR and OR_EXPR are almost the
same except for the words AND and OR, so there is a chance to get rid of
some code duplication here.
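
As a quick sanity check (a sketch, not part of any patch), deparsing a view
whose qual is a chain of ANDs exercises exactly that loop; a flattened
N-ary BoolExpr should deparse the same way as today's nested form:

create table t (a int, b int, c int, d int);
create view v as
  select * from t where a = 1 and b = 2 and c = 3 and d = 4;
select pg_get_viewdef('v'::regclass, true);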


 regards, tom lane






-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] jsonb and nested hstore

2014-02-26 Thread Hannu Krosing
On 02/26/2014 07:41 PM, Josh Berkus wrote:
 On 02/26/2014 07:02 AM, Merlin Moncure wrote:
 On Tue, Feb 25, 2014 at 3:57 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 It is not in any specs, but nevertheless all major imlementations do it and
 some code depends on it.
 IIRC, this behaviour is currently also met only by json and not by jsonb.
 Yes: This was the agreement that was struck and is the main reason why
 there are two json types, not one.  JSON does not guarantee field
 ordering as I read the spec and for the binary form ordering is not
 maintained as a concession to using the hstore implementation.
 Actually, that's not true; neither Mongo/BSON nor CouchDB preserve field
 ordering.  
That is strange, at least for BSON, as it does not have anything nearly as
sophisticated an internal format as hstore - no hash tables or anything,
just a binary serialisation.
It would take extra effort to *not* keep the order there :)

http://bsonspec.org/#/specification
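
For the json/jsonb side of the ordering question, a quick illustration
(assuming 9.4's jsonb, which stores keys in its own normalised order):

select '{"b": 1, "a": 2}'::json  as json_val,
       '{"b": 1, "a": 2}'::jsonb as jsonb_val;
-- json keeps the input text verbatim, so key order is preserved;
-- jsonb normalises it, yielding {"a": 2, "b": 1}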


Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-26 Thread Christian Kruse
Hi,

On 26/02/14 13:13, Alvaro Herrera wrote:
 
 There's one thing that rubs me the wrong way about all this
 functionality, which is that we've named it huge TLB pages.  That is
 wrong -- the TLB pages are not huge.  In fact, as far as I understand,
 the TLB doesn't have pages at all.  It's the pages that are huge, but
 those pages are not TLB pages, they are just memory pages.

I didn't think about this, yet, but you are totally right.

 Since we haven't released any of this, should we discuss renaming it to
 just huge pages?

Attached is a patch with updated documentation (it now consistently says
"huge pages"), a renamed GUC, consistent wording (always "huge pages"),
and renamed variables.

Should I create a new commit fest entry for this and delete the old
one? Or should this be done in two patches? Locally in my repo this is
done with two commits, so it would be easy to split that.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cf11306..77c778f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1166,35 +1166,33 @@ include 'filename'
   /listitem
  /varlistentry
 
- varlistentry id=guc-huge-tlb-pages xreflabel=huge_tlb_pages
-  termvarnamehuge_tlb_pages/varname (typeenum/type)/term
+ varlistentry id=guc-huge-pages xreflabel=huge_pages
+  termvarnamehuge_pages/varname (typeenum/type)/term
   indexterm
-   primaryvarnamehuge_tlb_pages/ configuration parameter/primary
+   primaryvarnamehuge_pages/ configuration parameter/primary
   /indexterm
   listitem
para
-Enables/disables the use of huge TLB pages. Valid values are
+Enables/disables the use of huge pages. Valid values are
 literaltry/literal (the default), literalon/literal,
 and literaloff/literal.
/para
 
para
-At present, this feature is supported only on Linux. The setting
-is ignored on other systems.
+At present, this feature is supported only on Linux. The setting is
+ignored on other systems when set to literaltry/literal.
+productnamePostgreSQL/productname will
+refuse to start when set to literalon/literal.
/para
 
para
-The use of huge TLB pages results in smaller page tables and
-less CPU time spent on memory management, increasing performance. For
-more details, see
-ulink url=https://wiki.debian.org/Hugepages;the Debian wiki/ulink.
-Remember that you will need at least shared_buffers / huge page size +
-1 huge TLB pages. So for example for a system with 6GB shared buffers
-and a hugepage size of 2kb of you will need at least 3156 huge pages.
+The use of huge pages results in smaller page tables and less CPU time
+spent on memory management, increasing performance. For more details,
+see xref linkend=linux-huge-pages.
/para
 
para
-With varnamehuge_tlb_pages/varname set to literaltry/literal,
+With varnamehuge_pages/varname set to literaltry/literal,
 the server will try to use huge pages, but fall back to using
 normal allocation if that fails. With literalon/literal, failure
 to use huge pages will prevent the server from starting up. With
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index bbb808f..7f4a235 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1307,6 +1307,57 @@ echo -1000  /proc/self/oom_score_adj
/para
/note
   /sect2
+
+  sect2 id=linux-huge-pages
+   titleLinux huge pages/title
+
+   para
+Using huge pages reduces overhead when using large contiguous chunks of
+memory, like productnamePostgreSQL/productname does. To enable this
+feature in productnamePostgreSQL/productname you need a kernel
+with varnameCONFIG_HUGETLBFS=y/varname and
+varnameCONFIG_HUGETLB_PAGE=y/varname. You also have to tune the system
+setting varnamevm.nr_hugepages/varname. To estimate the number of
+necessary huge pages start productnamePostgreSQL/productname without
+huge pages enabled and check the varnameVmPeak/varname value from the
+proc filesystem:
+programlisting
+$ userinputhead -1 /path/to/data/directory/postmaster.pid/userinput
+4170
+$ userinputgrep ^VmPeak /proc/4170/status/userinput
+VmPeak:  6490428 kB
+/programlisting
+ literal6490428/literal / literal2048/literal
+ (varnamePAGE_SIZE/varname is literal2MB/literal in this case) are
+ roughly literal3169.154/literal huge pages, so you will need at
+ least literal3170/literal huge pages:
+programlisting
+$ userinputsysctl -w vm.nr_hugepages=3170/userinput
+/programlisting
+Sometimes the kernel is not able to allocate the desired number of huge
+pages, so it might be necessary to 

Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-02-26 Thread Christian Kruse
Hi Peter,

thank you for your nice words, much appreciated. I'm sorry that I was
so whiny about this in the last post.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


