[HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core

2013-11-14 Thread Peter Geoghegan
Attached patch allows pg_stat_statements to store arbitrarily long
query texts, using an external, transparently managed lookaside file.

This is of great practical benefit to certain types of users who need
to understand the execution costs of queries whose texts are excessively
long, often because the query was generated by an ORM. This is a common
complaint of Heroku customers, though I'm sure they're not alone there.

As I mentioned on a previous occasion, since query fingerprinting was
added, the query text is nothing more than a way of displaying the
entries to interested users. As such, it seems perfectly reasonable to
store the texts externally, out of shared memory - the function
pg_stat_statements() itself (which the view is defined as a call to)
isn't particularly performance sensitive. Creating a brand new
pg_stat_statements entry is already costly, since it necessitates an
exclusive lock that blocks everything, so the additional cost of
storing the query text when that happens is assumed to be of marginal
concern. Better to have more entries available so that evictions don't
need to happen very frequently: using far less memory per entry helps
the user to increase pg_stat_statements.max, making excessive exclusive
locking less likely in the first place. Having said all that, the code
bends over backwards to make sure that physical I/O is as unlikely as
possible during exclusive locking. There are numerous tricks employed
where cache pressure is a concern that I won't go into, since they are
all well commented in the patch.
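
To illustrate the general shape of the lookaside-file idea in isolation
(the names and layout below are hypothetical and standalone, not the
patch's actual structures): each entry only needs to remember where its
text lives in the external file, and the text is fetched lazily when the
view is read.

/*
 * Hypothetical standalone sketch: append a query text to an external
 * file, remember only its (offset, length), and read it back on demand.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct TextRef
{
    long    offset;             /* where the text starts in the file */
    size_t  len;                /* text length, excluding the trailing NUL */
} TextRef;

/* Append "text" to fp and return its location. */
static TextRef
store_text(FILE *fp, const char *text)
{
    TextRef ref;

    fseek(fp, 0, SEEK_END);
    ref.offset = ftell(fp);
    ref.len = strlen(text);
    fwrite(text, 1, ref.len + 1, fp);   /* include the NUL terminator */
    return ref;
}

/* Fetch a previously stored text; the caller frees the result. */
static char *
load_text(FILE *fp, TextRef ref)
{
    char *buf = malloc(ref.len + 1);

    fseek(fp, ref.offset, SEEK_SET);
    fread(buf, 1, ref.len + 1, fp);
    return buf;
}

int
main(void)
{
    FILE    *fp = tmpfile();
    TextRef  ref = store_text(fp, "SELECT * FROM table_generated_by_orm WHERE ...");
    char    *text = load_text(fp, ref);

    printf("%s\n", text);       /* the text lives on disk, not in shared memory */
    free(text);
    fclose(fp);
    return 0;
}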

Maybe we should have a warning when eviction occurs too frequently? I
also think that we might want to take another look at making the
normalization logic less strict about differentiating queries that a
reasonable person might consider equivalent. This is obviously rather
fuzzy, but I've had complaints about things like pg_stat_statements
being overly fussy in seeing row and array comparisons as distinct from
each other. That is most probably a discussion for another thread, but
it is informed by one of the same concerns - making expensive cache
pressure less likely.

This incidentally furthers the case for pg_stat_statements in core
(making it similar to pg_stat_user_functions - not turned on by
default, but easily available, even on busy production systems that
cannot afford a restart and didn't know they needed it until
performance tanked 5 minutes ago). The amount of shared memory used -
and therefore arguably wasted by reserving a little shared memory just
in case pg_stat_statements becomes useful - is greatly reduced. That's
really a separate discussion, though, which is why I haven't done the
additional work of integrating pg_stat_statements into core here. Doing
a restart to enable pg_stat_statements is, in my experience, only
infrequently acceptable - restarts are generally to be avoided at all
costs.

I can see a day when the query hash is actually a general query cache
identifier, at which time the query text will also be stored outside
of the pgss hash table proper.

Refactoring
===========

I've decided to remove the encoding id from the pgss entry key (it's
now just in the main entry struct) - I don't think it makes sense there
anymore. It is clearly no longer true that it's a notational
convenience (and hasn't been since 9.1). Like the query texts
themselves, it exists for the sole purpose of displaying statistics to
users, and as such cannot possibly result in incorrectly failing to
differentiate, or spuriously differentiating, two entries. Now, I
suppose that since we're sometimes directly hashing query texts (e.g.
utility statements), it might matter if the encodings could vary. But
since the encoding invariably comes from the database encoding anyway,
and we always differentiate on databaseid to begin with, I fail to see
how that could possibly matter.

I've bumped PGSS_FILE_HEADER, just in case a pg_stat_statements temp
file survives a pg_upgrade. I think that some minor tweaks made by
Noah to pg_stat_statements (as part of commit b560ec1b) ought to have
necessitated doing so anyway, but no matter.

The role of time-series aggregation
===================================

Over on the min/max pg_stat_statements storage thread, I've stated
that I think the way forward is for third-party tools to aggregate
deltas fairly aggressively - maybe even as often as every 3 seconds or
so. So I'm slightly concerned that I may be hampering a future tool
that needs to aggregate statistics very aggressively. For that reason,
we should probably provide an alternative function/view that identifies
pg_stat_statements entries by hashid + databaseid + userid only, so
that any overhead from reading big query strings from disk with a
shared lock held is eliminated.

Thoughts?
-- 
Peter Geoghegan


pg_stat_statements_ext_text.v1.2013_11_14.patch.gz
Description: GNU Zip compressed data


Re: [HACKERS] nested hstore patch

2013-11-14 Thread Hannu Krosing
On 11/14/2013 01:32 AM, David E. Wheeler wrote:
 On Nov 13, 2013, at 3:59 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 I remember strong voices in support of *not* normalising json, so that
 things like

 {"a":1, "a":true, "a":"b", "a":"none"}

 would go through the system unaltered, for the claimed standard usage of
 json as processing instructions. That is, as source code which can
 possibly be converted to a JavaScript object, and not something that
 would come out of serialising an existing JavaScript object.
 My recollection from PGCon was that there was consensus to normalize on 
 the way in --
Great news! I remember advocating this approach in the mailing lists
but having been out-voted based on current real-world usage out there :)
  or at least, if we switched to a binary representation as proposed by 
 Oleg & Teodor, it was not worth the hassle to try to keep it.
Very much agree. For the source code approach I'd recommend a plain text
type, with maybe a check that it is possible to convert it to json.



-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] appendPQExpBufferVA vs appendStringInfoVA

2013-11-14 Thread David Rowley
On Sun, Nov 3, 2013 at 3:18 AM, David Rowley dgrowle...@gmail.com wrote:

 I'm low on ideas on how to improve things much around here for now, but
 for what it's worth, I did create a patch which changes unnecessary calls
 to appendPQExpBuffer() into calls to appendPQExpBufferStr() similar to the
 recent one for appendStringInfo and appendStringInfoString.



Attached is a re-based version of this.
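
For reference, the kind of change involved looks like this sketch (the
statements here are made up, not lines from the patch): calls whose
format string contains no % escapes can use the plain-string variant and
skip the printf machinery entirely.

#include "pqexpbuffer.h"        /* libpq's string-buffer API */

static void
build_sql(PQExpBuffer buf, const char *tablename)
{
    /*
     * Before: appendPQExpBuffer(buf, "SET client_min_messages = warning;\n");
     * went through the format-string machinery with nothing to format.
     */

    /* After: a plain string append, no format processing */
    appendPQExpBufferStr(buf, "SET client_min_messages = warning;\n");

    /* Calls that actually format something are left unchanged */
    appendPQExpBuffer(buf, "DROP TABLE IF EXISTS %s;\n", tablename);
}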

Regards

David Rowley


 Regards

 David Rowley





appendPQExpBufferStr_v0.2.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [PATCH] pg_basebackup: progress report max once per second

2013-11-14 Thread Mika Eloranta
On 13 Nov 2013, at 20:51, Mika Eloranta m...@ohmu.fi wrote:

 Prevent excessive progress reporting that can grow to gigabytes
 of output with large databases.

Same patch as an attachment.

-- 
Mika Eloranta
Ohmu Ltd.  http://www.ohmu.fi/



0001-pg_basebackup-progress-report-max-once-per-second.patch
Description: Binary data



Re: [HACKERS] GIN improvements part 1: additional information

2013-11-14 Thread Alexander Korotkov
On Tue, Nov 5, 2013 at 9:49 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

 On 04.11.2013 23:44, Alexander Korotkov wrote:

 On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov
 aekorot...@gmail.com wrote:

 Attached version of the patch is debugged. I would like to note that the
 number of bugs was low and it wasn't very hard to debug. I've rerun tests
 on it. You can see that the numbers are improved as a result of your
 refactoring.

          event          |      period
 ------------------------+------------------
  index_build            | 00:01:45.416822
  index_build_recovery   | 00:00:06
  index_update           | 00:05:17.263606
  index_update_recovery  | 00:01:22
  search_new             | 00:24:07.706482
  search_updated         | 00:26:25.364494
 (6 rows)

      label      | blocks_mark
 ----------------+-------------
  search_new     |   847587636
  search_updated |   881210486
 (2 rows)

      label      |   size
 ----------------+-----------
  new            | 419299328
  after_updates  | 715915264
 (2 rows)

 Besides simple bugs, there was a subtle bug in the incremental item
 index update. I've added some more comments, including an ASCII picture
 of how item indexes are shifted.

 Now, I'm trying to implement support of old page format. Then we can
 decide which approach to use.


 Attached version of patch has support of old page format. Patch still
 needs
 more documentation and probably refactoring, but I believe idea is clear
 and it can be reviewed. In the patch I have to revert change of null
 category placement for compatibility with old posting list format.


 Thanks, just glanced at this quickly.

 If I'm reading the patch correctly, old-style non-compressed posting tree
 leaf pages are not vacuumed at all; that's clearly not right.


Fixed. Now a separate function handles uncompressed posting lists and
compresses them if at least one TID is deleted.


 I argued earlier that we don't need to handle the case that compressing a
 page makes it larger, so that the existing items don't fit on the page
 anymore. I'm having some second thoughts on that; I didn't take into
 account the fact that the mini-index in the new page format takes up some
 space. I think it's still highly unlikely that there isn't enough space on
 a 8k page, but it's not totally crazy with a non-standard small page size.
 So at least that situation needs to be checked for with an ereport(),
 rather than Assert.


Now this situation is ereported before any change to the page.

To handle splitting a non-compressed page, it seems that you're relying on
 the fact that before splitting, we try to insert, which compresses the
 page. The problem with that is that it's not correctly WAL-logged. The
 split record that follows includes a full copy of both page halves, but if
 the split fails for some reason, e.g. you run out of disk space, there is no
 WAL record at all of the compression. I'd suggest doing the compression
 in the insert phase on a temporary copy of the page, leaving the original
 page untouched if there isn't enough space for the insertion to complete.
 (You could argue that this case can't happen because the compression must
 always create enough space to insert one more item. Maybe so, but at least
 there should be an explicit check for that.)


Good point. Yes, if we don't handle inserting item indexes specially, I see
no point in doing special handling for a single TID, which is much smaller. In
the attached patch dataCompressLeafPage just reserves space for a single TID.

Also, many changes in comments and README.

Unfortunately, I didn't understand what the FIXME in
ginVacuumEntryPage is about. So, it's not fixed :)

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-13.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] tcp_keepalives_idle

2013-11-14 Thread Marko Tiikkaja

On 11/14/13 7:08 AM, Tatsuo Ishii wrote:

It means the connection is idle except for keepalive packets.
We could perhaps just drop the word otherwise, if people find
it confusing.


Wah. I seem to have completely misunderstood what the phrase
says. Thanks for the clarification. I agree to drop "otherwise".


I had some problems interpreting these explanations as well: 
http://www.postgresql.org/message-id/527a21f1.2000...@joh.to


Compare that to the description in the libpq documentation: "Controls 
the number of seconds of inactivity after which TCP should send a 
keepalive message to the server."
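
For comparison, that client-side behaviour is just a connection
parameter; a minimal libpq example (connection values made up):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* keepalives_idle is the per-connection analogue of tcp_keepalives_idle */
    PGconn *conn = PQconnectdb(
        "host=localhost dbname=postgres "
        "keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=5");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
    else
        printf("connected; TCP keepalive probes start after 60s of inactivity\n");

    PQfinish(conn);
    return 0;
}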



Regards,
Marko Tiikkaja




Re: [HACKERS] [PATCH] pg_basebackup: progress report max once per second

2013-11-14 Thread Marko Tiikkaja

On 11/14/13 10:27 AM, Mika Eloranta wrote:

On 13 Nov 2013, at 20:51, Mika Eloranta m...@ohmu.fi wrote:


Prevent excessive progress reporting that can grow to gigabytes
of output with large databases.


Same patch as an attachment.


I can't comment on the usefulness of this patch, but the first hunk in 
progress_report does not conform to the code style guidelines of the 
project.



Regards,
Marko Tiikkaja




Re: [HACKERS] Extra functionality to createuser

2013-11-14 Thread Sameer Thakur
Hello,
Tried to test this patch. Did the following:
1. cloned from https://github.com/samthakur74/postgres
2. Applied the patch and ran make install
3. Created roles app_readonly_role, app2_writer_role
4. Tried createuser -D -S -l -g app_readonly_role,app2_writer_role
test_user. Got error: createuser: invalid option -- 'g'
5. Tried createuser -D -S -l --roles
app_readonly_role,app2_writer_role test_user. This does not give an
error.
6. Confirmed that test_user is created using \du, which shows:
postgres=# \du
                                        List of roles
     Role name     |                   Attributes                   |              Member of
-------------------+------------------------------------------------+--------------------------------------
 Sameer            | Superuser, Create role, Create DB, Replication | {}
 app2_writer_role  | Cannot login                                   | {}
 app_readonly_role | Cannot login                                   | {}
 my_new_user       |                                                | {app_readonly_role,app2_writer_role}
 test_user         |                                                | {app_readonly_role,app2_writer_role}

7. createuser --help does show: -g, --roles  roles to
associate with this new role

So I think the -g option is failing.

regards
Sameer




Re: [HACKERS] [GENERAL] Clang 3.3 Analyzer Results

2013-11-14 Thread Magnus Hagander
On Wednesday, November 13, 2013, Tom Lane wrote:

 Kevin Grittner kgri...@ymail.com writes:
  If nobody objects, I'll fix that small memory leak in the
  regression test driver.  Hopefully someone more familiar with
  pg_basebackup will fix the double-free (and related problems
  mentioned by Tom) in streamutil.c.

 Here's a less convoluted (IMHO) approach to the password management logic
 in streamutil.c.  One thing I really didn't care for about the existing
 coding is that the loop-for-password included all the rest of the
 function, even though there's no intention to retry for any purpose except
 collecting a password.  So I moved up the bottom of the loop.  For ease of
 review, I've not reindented the code below the new loop bottom, but would
 do so before committing.

 Any objections to this version?

 Nope, looks good to me.

That code was originally stolen from psql, and then whacked around a
number of times.  The part about looping and passwords, for example, is in
startup.c in psql as well. We probably want to fix it there as well (even
if it doesn't have the same problem, it has the same general design). Or
perhaps even put that function somewhere shared between the two?

It's also in pg_dump/pg_backup_db.c, there's a version of it in
pg_dumpall.c, etc. Which I think is a good argument for fixing them all by
sharing the code somewhere? In fact, we already have some in
script/common.c - but it's only used by the tools that are in script/.

//Magnus



-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [PATCH 1/2] SSL: GUC option to prefer server cipher order

2013-11-14 Thread Magnus Hagander
On Thursday, November 7, 2013, Marko Kreen wrote:

 On Wed, Nov 06, 2013 at 09:57:32PM -0300, Alvaro Herrera wrote:
  Marko Kreen escribió:
 
   By default OpenSSL (and SSL/TLS in general) lets client cipher
   order take priority.  This is OK for browsers where the ciphers
   were tuned, but few Postgres client libraries make cipher order
   configurable.  So it makes sense to make cipher order in
   postgresql.conf take priority over client defaults.
  
   This patch adds setting 'ssl_prefer_server_ciphers' which can be
   turned on so that server cipher order is preferred.
 
  Wouldn't it make more sense to have this enabled by default?

 Well, yes.  :)

 I would even drop the GUC setting, but hypothetically there could
 be some sort of backwards compatibility concern, so I added it
 to the patch and kept the old default.  But if no one has a strong need
 for it, the setting can be removed.


I think the default behaviour should be the one we recommend (which would
be to have the server one be preferred). But I do agree with the
requirement to have a GUC to be able to remove it - even though I don't
like the idea of more GUCs. But making it a compile-time option would make
it the same as not having one...
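
For reference, on the OpenSSL side such a GUC essentially boils down to
one option bit; a sketch (the variable and function names here are
illustrative, not the patch's code):

#include <stdbool.h>
#include <openssl/ssl.h>

/* Hypothetical stand-in for the proposed GUC */
static bool ssl_prefer_server_ciphers = true;

static void
configure_cipher_preference(SSL_CTX *context)
{
    /* When on, the server's cipher list order wins during the handshake. */
    if (ssl_prefer_server_ciphers)
        SSL_CTX_set_options(context, SSL_OP_CIPHER_SERVER_PREFERENCE);
}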

//Magnus



-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Logging WAL when updating hintbit

2013-11-14 Thread Florian Weimer

On 11/14/2013 07:02 AM, Sawada Masahiko wrote:


I attached a patch that adds a new wal_level, 'all'.


Shouldn't this be a separate setting?  It's useful for storage which 
requires rewriting a partially written sector before it can be read again.


--
Florian Weimer / Red Hat Product Security Team




Re: [HACKERS] Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions

2013-11-14 Thread Kyotaro HORIGUCHI
Hello,

 When I read it again and try to relate, I get your point. Actually true,
 hashes must always be performed as the last option (if that is what you
 too meant), and if there are a few other operations they must be the last
 ones to be performed, especially after sorting/grouping. Hashes must
 somehow make use of already sorted data (I think this is something even
 you indicated).

Yes, some 'hash'es could preserve order, by selecting such a function
as the hash function. But at least PostgreSQL's 'HashAggregation'
uses a non-order-preserving function as its hash function, so the output
cannot preserve the input ordering.

 I will do that if I get a DB2 system or Oracle system running. I will try
 to replicate the same 2 test cases and share the plan. One thing which I am
 sure of is that the below part of the plan
 
 QUERY PLAN | Subquery Scan on __unnamed_subquery_0
  (cost=12971.39..16964.99 rows=614 width=43) (actual
 time=2606.075..2953.937 rows=558 loops=1)
 
 would be generated as a RID scan in DB2 (which I have seen perform better
 than normal subquery scans in DB2).

DB2's documentation says it is used for 'index ORing', which corresponds to
OR or IN ops; that seems to be a relative of PostgreSQL's BitmapOr,
perhaps not of HashAggregate/SemiJoin.

I tried to imagine the plan for the group_by case with repeated
index scans and merging...

 select student_name
 from student_score
 where (course,score) in (select course,max(score)
  from student_score group by course);

Taking advantage of the fact that the cardinality of course is 8, this
query could be transformed into 8 index scans plus bitmapping.

With the hypothetical plan nodes LOOP and BitmapScanAdd, the plan
could be:

| BitmapHeapScan (rows = 155, loops = 1)
|  - LOOP 
| ON Subquery (select distinct course from student_course) as c0
| - BitmapScanAdd (loops = 8)
|BitmapCond: (student_score.score = x)
|   - Limit (rows = 1, loops = 8) AS x
| - Unique (rows = 1, loops = 8)
|   - IndexScan using idx_score on student_course (rows = 1, loops = 8)
|  Filter (student_course.course = c0)

I suppose this is one possibility of what DB2 is doing. If DB2
does the same optimization for ranking > 1 with the dense_rank()
case, the plan might look like this:

| BitmapHeapScan (rows = 133, loops = 1)
|  - LOOP 
| ON Subquery (select distinct course from student_course) as c0
| - BitmapScanAdd (loops = 8)
|BitmapCond: (student_score.score = x)
|   - Limit (rows = 1, loops = 8) AS x
| - Unique (rows = 2, loops = 8)
|   - IndexScan using idx_score on student_course (rows = 18,loops= 8)
|  Filter (student_course.course = c0)

Both plans surely seem to finish quickly for relatively small
n's and numbers of courses.

On the other hand, using a semijoin as PostgreSQL does, creating a
HashAggregate that stores the nth-place score for every course requires
some working memory for each course.

| Hash Semi Join
|   Hash Cond: (a.course = b.course and a.score = b.score)
|  - SeqScan on student_score as a
|  - Hash
|   - HashAggregatesFunc (rows = 8)
|  Key = course, func = rankn(dense_rank(), n, key, val)
|- SeqScan on student_score (rows = 122880)

Here, rankn() must keep scores down to the nth rank and emit the nth
score as the final value. I don't have a more generic form for this
mechanism right now, and the value of doing it in this specific manner
seems rather limited...


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Extra functionality to createuser

2013-11-14 Thread Sameer Thakur
 1. cloned from https://github.com/samthakur74/postgres
Sorry. cloned from https://github.com/postgres/postgres
regards
Sameer




Re: [HACKERS] GIN improvements part 1: additional information

2013-11-14 Thread Alexander Korotkov
On Thu, Nov 14, 2013 at 2:17 PM, Alexander Korotkov aekorot...@gmail.com wrote:

 On Tue, Nov 5, 2013 at 9:49 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 04.11.2013 23:44, Alexander Korotkov wrote:

 On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov
  aekorot...@gmail.com wrote:

 Attached version of the patch is debugged. I would like to note that the
 number of bugs was low and it wasn't very hard to debug. I've rerun tests
 on it. You can see that the numbers are improved as a result of your
 refactoring.

          event          |      period
 ------------------------+------------------
  index_build            | 00:01:45.416822
  index_build_recovery   | 00:00:06
  index_update           | 00:05:17.263606
  index_update_recovery  | 00:01:22
  search_new             | 00:24:07.706482
  search_updated         | 00:26:25.364494
 (6 rows)

      label      | blocks_mark
 ----------------+-------------
  search_new     |   847587636
  search_updated |   881210486
 (2 rows)

      label      |   size
 ----------------+-----------
  new            | 419299328
  after_updates  | 715915264
 (2 rows)

 Besides simple bugs, there was a subtle bug in the incremental item
 index update. I've added some more comments, including an ASCII picture
 of how item indexes are shifted.

 Now, I'm trying to implement support of old page format. Then we can
 decide which approach to use.


 Attached version of patch has support of old page format. Patch still
 needs
 more documentation and probably refactoring, but I believe idea is clear
 and it can be reviewed. In the patch I have to revert change of null
 category placement for compatibility with old posting list format.


 Thanks, just glanced at this quickly.

 If I'm reading the patch correctly, old-style non-compressed posting tree
 leaf pages are not vacuumed at all; that's clearly not right.


 Fixed. Now a separate function handles uncompressed posting lists and
 compresses them if at least one TID is deleted.


 I argued earlier that we don't need to handle the case that compressing a
 page makes it larger, so that the existing items don't fit on the page
 anymore. I'm having some second thoughts on that; I didn't take into
 account the fact that the mini-index in the new page format takes up some
 space. I think it's still highly unlikely that there isn't enough space on
 a 8k page, but it's not totally crazy with a non-standard small page size.
 So at least that situation needs to be checked for with an ereport(),
 rather than Assert.


 Now this situation is ereported before any change to the page.

 To handle splitting a non-compressed page, it seems that you're relying on
 the fact that before splitting, we try to insert, which compresses the
 page. The problem with that is that it's not correctly WAL-logged. The
 split record that follows includes a full copy of both page halves, but if
 the split fails for some reason, e.g. you run out of disk space, there is no
 WAL record at all of the compression. I'd suggest doing the compression
 in the insert phase on a temporary copy of the page, leaving the original
 page untouched if there isn't enough space for the insertion to complete.
 (You could argue that this case can't happen because the compression must
 always create enough space to insert one more item. Maybe so, but at least
 there should be an explicit check for that.)


 Good point. Yes, if we don't handle inserting item indexes specially, I
 see no point in doing special handling for a single TID, which is much
 smaller. In the attached patch dataCompressLeafPage just reserves space
 for a single TID.

 Also, many changes in comments and README.

 Unfortunately, I didn't understand what the FIXME in
 ginVacuumEntryPage is about. So, it's not fixed :)


Sorry, I posted a buggy version of the patch. The attached version is correct.

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-14.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] Improve code in tidbitmap.c

2013-11-14 Thread Etsuro Fujita
I'd like to give a complementary explanation of this.

In tbm_union_page() and tbm_intersect_page() in tidbitmap.c, WORDS_PER_PAGE
is used in the scan of a lossy chunk, instead of WORDS_PER_CHUNK, where
these macros are defined as:

/* number of active words for an exact page: */
#define WORDS_PER_PAGE  ((MAX_TUPLES_PER_PAGE - 1) / BITS_PER_BITMAPWORD + 1)
/* number of active words for a lossy chunk: */
#define WORDS_PER_CHUNK  ((PAGES_PER_CHUNK - 1) / BITS_PER_BITMAPWORD + 1)

Though the use of WORDS_PER_PAGE in the scan of a lossy chunk is logically
correct, since these macros implicitly satisfy WORDS_PER_CHUNK <
WORDS_PER_PAGE, I think WORDS_PER_CHUNK should be used in the scan of a
lossy chunk for code readability and maintenance.  So, I submitted a patch
working that way in an earlier email.

I think that, as a secondary effect of the patch, the scan would be
performed a bit more efficiently.
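
To make the size relationship concrete, here is a self-contained
calculation using the 8K-block defaults (291 is MaxHeapTuplesPerPage for
8K heap pages; the standard build configuration is assumed here):

#include <stdio.h>

#define BLCKSZ               8192
#define BITS_PER_BITMAPWORD  32
#define MAX_TUPLES_PER_PAGE  291             /* MaxHeapTuplesPerPage, 8K pages */
#define PAGES_PER_CHUNK      (BLCKSZ / 32)

/* number of active words for an exact page: */
#define WORDS_PER_PAGE  ((MAX_TUPLES_PER_PAGE - 1) / BITS_PER_BITMAPWORD + 1)
/* number of active words for a lossy chunk: */
#define WORDS_PER_CHUNK ((PAGES_PER_CHUNK - 1) / BITS_PER_BITMAPWORD + 1)

int
main(void)
{
    /*
     * Prints WORDS_PER_PAGE = 10, WORDS_PER_CHUNK = 8: scanning a lossy
     * chunk with WORDS_PER_PAGE is safe (the trailing words are zero) but
     * does two useless loop iterations per chunk.
     */
    printf("WORDS_PER_PAGE = %d, WORDS_PER_CHUNK = %d\n",
           WORDS_PER_PAGE, WORDS_PER_CHUNK);
    return 0;
}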

I'll add the patch to the 2013-11 CF.  Any comments are welcome.

Thanks,

Best regards,
Etsuro Fujita





[HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog

2013-11-14 Thread Haribabu kommi
Please find attached a patch that adds a new option to pg_basebackup to
specify a different directory for pg_xlog.


Design

A new option, --xlogdir, is added to the list of options for pg_basebackup. The 
new option does not have an equivalent short option letter.
This option will allow the user to specify a different directory for pg_xlog.

The format for specifying a different directory will be: 
--xlogdir=/path/to/xlog/directory

eg:
pg_basebackup --xlogdir=/home/pg/xlog -D ../dataBaseBackUp


When the user specifies an xlog directory, a symbolic link is created from the 
default location to the user-specified directory.
The user can give only an absolute path for the xlog directory. This option will 
work only if the format for the backup is 'plain'.
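
A standalone sketch of the symlinking step described above (the paths
and helper name are made up, not the patch's code):

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int
link_xlog_dir(const char *xlogdir, const char *basedir_pg_xlog)
{
    if (mkdir(xlogdir, 0700) != 0)          /* fails if it already exists */
    {
        perror("mkdir");
        return -1;
    }
    if (symlink(xlogdir, basedir_pg_xlog) != 0) /* pg_xlog -> user directory */
    {
        perror("symlink");
        return -1;
    }
    return 0;
}

int
main(void)
{
    /* e.g. pg_basebackup --xlogdir=/home/pg/xlog -D ../dataBaseBackUp */
    return link_xlog_dir("/home/pg/xlog", "../dataBaseBackUp/pg_xlog") ? 1 : 0;
}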

Please provide your feedback / suggestions

Regards,
Hari babu.



UserSpecifiedxlogDir.patch
Description: UserSpecifiedxlogDir.patch



Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-11-14 Thread KONDO Mitsumasa
(2013/10/21 20:17), KONDO Mitsumasa wrote:
 (2013/10/18 22:21), Andrew Dunstan wrote:
 If we're going to extend pg_stat_statements, even more than min and max
 I'd like to see the standard deviation in execution time.
 OK. I do! I am making some other patches, please wait more!
I've added stddev_time and fixed some sources.

The psql result of my latest patch is as follows.

userid  | 10
dbid| 16384
query   | UPDATE pgbench_tellers SET tbalance = tbalance + ? WHERE
tid = ?;
calls   | 74746
total_time  | 1094.291998
min_time| 0.007
max_time| 15.091
stddev_time | 0.100439187720684
rows| 74746
shared_blks_hit | 302346
shared_blks_read| 6
shared_blks_dirtied | 161
shared_blks_written | 0
local_blks_hit  | 0
local_blks_read | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read  | 0
temp_blks_written   | 0
blk_read_time   | 0
blk_write_time  | 0
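
For reference, one way a stddev_time column like this can be maintained
incrementally, without storing every duration (a generic sketch, not
necessarily how this patch computes it), is Welford's online update:

#include <math.h>
#include <stdio.h>

typedef struct TimeStats
{
    long   calls;
    double mean;        /* running mean of execution time (ms) */
    double sum_sq;      /* sum of squared differences from the mean */
    double min_time;
    double max_time;
} TimeStats;

static void
accum_time(TimeStats *s, double t)
{
    double delta;

    if (s->calls == 0 || t < s->min_time)
        s->min_time = t;
    if (s->calls == 0 || t > s->max_time)
        s->max_time = t;

    s->calls++;
    delta = t - s->mean;
    s->mean += delta / s->calls;
    s->sum_sq += delta * (t - s->mean);
}

static double
stddev_time(const TimeStats *s)
{
    /* population standard deviation */
    return (s->calls > 1) ? sqrt(s->sum_sq / s->calls) : 0.0;
}

int
main(void)
{
    TimeStats s = {0};
    double samples[] = {0.007, 0.012, 0.010, 15.091, 0.009};

    for (int i = 0; i < 5; i++)
        accum_time(&s, samples[i]);
    printf("min=%.3f max=%.3f stddev=%.6f\n",
           s.min_time, s.max_time, stddev_time(&s));
    return 0;
}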


I haven't thought much about the order of the columns in this table. If you have
any ideas, please send them to me. And thanks for the many comments and
discussion; I am going to refer to these not only for this patch but also for the
development of pg_statsinfo and pg_stats_reporter :-)

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-11-14 Thread KONDO Mitsumasa
Oh! Sorry...
I forgot to attach my latest patch.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
new file mode 100644
index 000..929d623
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
@@ -0,0 +1,51 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements();
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(
+OUT userid oid,
+OUT dbid oid,
+OUT query text,
+OUT calls int8,
+OUT total_time float8,
+OUT min_time float8,
+OUT max_time float8,
+OUT stddev_time float8,
+OUT rows int8,
+OUT shared_blks_hit int8,
+OUT shared_blks_read int8,
+OUT shared_blks_dirtied int8,
+OUT shared_blks_written int8,
+OUT local_blks_hit int8,
+OUT local_blks_read int8,
+OUT local_blks_dirtied int8,
+OUT local_blks_written int8,
+OUT temp_blks_read int8,
+OUT temp_blks_written int8,
+OUT blk_read_time float8,
+OUT blk_write_time float8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements();
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
+
+/* New Function */
+CREATE FUNCTION pg_stat_statements_reset_time()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
new file mode 100644
index 000..d590acc
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
@@ -0,0 +1,52 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
+
+-- Register functions.
+CREATE FUNCTION pg_stat_statements_reset()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE FUNCTION pg_stat_statements_reset_time()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE FUNCTION pg_stat_statements(
+OUT userid oid,
+OUT dbid oid,
+OUT query text,
+OUT calls int8,
+OUT total_time float8,
+OUT min_time float8,
+OUT max_time float8,
+OUT stddev_time float8,
+OUT rows int8,
+OUT shared_blks_hit int8,
+OUT shared_blks_read int8,
+OUT shared_blks_dirtied int8,
+OUT shared_blks_written int8,
+OUT local_blks_hit int8,
+OUT local_blks_read int8,
+OUT local_blks_dirtied int8,
+OUT local_blks_written int8,
+OUT temp_blks_read int8,
+OUT temp_blks_written int8,
+OUT blk_read_time float8,
+OUT blk_write_time float8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+-- Register a view on the function for ease of use.
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements();
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
+
+-- Don't want this to be available to non-superusers.
+REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
+REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC;
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index e8aed61..5c63940 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -4,8 +4,10 @@ MODULE_big = pg_stat_statements
 OBJS = pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
-	pg_stat_statements--unpackaged--1.0.sql
+DATA = pg_stat_statements--1.2.sql \
+   pg_stat_statements--1.0--1.1.sql \
+   pg_stat_statements--1.1--1.2.sql \
+   pg_stat_statements--unpackaged--1.0.sql
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql b/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql
index 5be281e..5662273 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql
@@ -1,7 +1,7 @@
 /* contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql */
 
 -- complain if script is sourced in psql, rather than via ALTER EXTENSION
-\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.1'" to load this file. \quit
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE" to load this file. \quit
 
 /* First we have to remove them from the extension */
 ALTER EXTENSION pg_stat_statements DROP VIEW 

Re: [HACKERS] nested hstore patch

2013-11-14 Thread Andrew Dunstan


On 11/14/2013 03:21 AM, Hannu Krosing wrote:

On 11/14/2013 01:32 AM, David E. Wheeler wrote:

On Nov 13, 2013, at 3:59 PM, Hannu Krosing ha...@2ndquadrant.com wrote:


I remember strong voices in support of *not* normalising json, so that
things like

{"a":1, "a":true, "a":"b", "a":"none"}

would go through the system unaltered, for the claimed standard usage of
json as processing instructions. That is, as source code which can
possibly be converted to a JavaScript object, and not something that
would come out of serialising an existing JavaScript object.

My recollection from PGCon was that there was consensus to normalize on
the way in --

Great news! I remember advocating this approach in the mailing lists
but having been out-voted based on current real-world usage out there :)

  or at least, if we switched to a binary representation as proposed by
Oleg & Teodor, it was not worth the hassle to try to keep it.

Very much agree. For the source code approach I'd recommend a plain text
type, with maybe a check that it is possible to convert it to json.




I don't think you and David are saying the same thing. AIUI he wants one 
JSON type and is prepared to discard text preservation (duplicate keys 
and key order). You want two json types, one of which would feature text 
preservation.


Correct me if I'm wrong.

cheers

andrew










Re: [HACKERS] Optimize kernel readahead using buffer access strategy

2013-11-14 Thread Claudio Freire
On Thu, Nov 14, 2013 at 9:09 AM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
 I've created a patch that improves disk reads and OS file caching. It can
 optimize the kernel readahead parameter using the buffer access strategy and
 posix_fadvise() in various disk-read situations.

 In a typical OS, the readahead parameter is decided dynamically based on the
 disk-read pattern. If long sequential reads happen, the readahead window
 becomes big. However, since this is based on an experience-driven or heuristic
 algorithm, it can cause wasted disk reads and throw out useful OS file cache
 in some cases, which is quite bad for disk-read performance.

It would be relevant to know which kernel you used for those tests.

@@ -677,6 +677,7 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 				 errmsg("could not seek to block %u in file \"%s\": %m",
 						blocknum, FilePathName(v->mdfd_vfd))));
 
+	BufferHintIOAdvise(v->mdfd_vfd, buffer, BLCKSZ, strategy);
 	nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ);
 
 	TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,

A while back, I tried to use posix_fadvise to prefetch index pages. I
ended up finding out that interleaving posix_fadvise with I/O like
that severely hinders (i.e. completely disables) the kernel's read-ahead
algorithm.

How exactly did you set up those benchmarks? pgbench defaults?

pgbench does not exercise heavy sequential access patterns, or long
index scans. It performs many single-page index lookups per
transaction and that's it. You may want to try your patch with more
real workloads, and maybe you'll confirm what I found out last time I
messed with posix_fadvise. If my experience is still relevant, those
patterns will have suffered a severe performance penalty with this
patch, because it will disable kernel read-ahead on sequential index
access. It may still work for sequential heap scans, because the
access strategy will tell the kernel to do read-ahead, but many other
access methods will suffer.

Try OLAP-style queries.
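
For reference, this is the kind of hint being discussed; a standalone
sketch (the policy and names are made up, not the patch's
BufferHintIOAdvise) of choosing a posix_fadvise advice value from the
intended access pattern:

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

typedef enum
{
    ACCESS_SEQUENTIAL,          /* e.g. sequential scan / bulk read */
    ACCESS_RANDOM               /* e.g. scattered index probes */
} AccessPattern;

static void
advise_read(int fd, off_t offset, size_t len, AccessPattern pattern)
{
    int advice = (pattern == ACCESS_SEQUENTIAL)
        ? POSIX_FADV_SEQUENTIAL /* let the kernel ramp up readahead */
        : POSIX_FADV_RANDOM;    /* suppress readahead for scattered reads */

    (void) posix_fadvise(fd, offset, (off_t) len, advice);
}

Note that POSIX_FADV_RANDOM is precisely the hint that switches kernel
readahead off, which is why a mismatch between the chosen strategy and
the actual access pattern (e.g. a long sequential index scan) could
regress badly.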




Re: [HACKERS] init_sequence spill to hash table

2013-11-14 Thread Heikki Linnakangas

On 14.11.2013 14:38, David Rowley wrote:

I've just completed some more benchmarking of this. I didn't try dropping
the threshold down to 2 or 0 but I did tests at the cut over point and
really don't see much difference in performance between the list at 32 and
the hashtable at 33 sequences. The hash table version excels in the 16000
sequence test in comparison to the unpatched version.

Times are in milliseconds of the time it took to call currval() 10
times for 1 sequence.
                   Patched    Unpatched   increased by
   1 in cache      1856.452   1844.11     -1%
   32 in cache     1841.84    1802.433    -2%
   33 in cache     1861.558   not tested  N/A
   16000 in cache  1963.711   10329.22    426%


If I understand those results correctly, the best case scenario with the 
current code takes about 1800 ms. There's practically no difference with 
N = 32, where N is the number of sequences touched. The hash table 
method also takes about 1800 ms when N=33. The performance of the hash 
table is O(1), so presumably we can extrapolate from that that it's the 
same for any N.


I think that means that we should just completely replace the list with 
the hash table. The difference with a small N is lost in noise, so 
there's no point in keeping the list as a fast path for small N. That'll 
make the patch somewhat simpler.
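
For what it's worth, a sketch of what the hash-only approach might look
like using the backend's dynahash routines (the entry layout and
initialization here are illustrative, not the actual patch):

#include "postgres.h"
#include "utils/hsearch.h"

typedef struct SeqHashEntry
{
	Oid			relid;			/* hash key: OID of the sequence */
	int64		last;			/* last value returned by nextval() */
	int64		cached;			/* last value preallocated in the cache */
	int64		increment;		/* increment_by, saved for cheap re-reads */
	bool		last_valid;		/* is currval() defined in this session? */
} SeqHashEntry;

static HTAB *seqhashtab = NULL;

static SeqHashEntry *
seq_hash_lookup(Oid relid)
{
	bool		found;
	SeqHashEntry *entry;

	if (seqhashtab == NULL)
	{
		HASHCTL		ctl;

		MemSet(&ctl, 0, sizeof(ctl));
		ctl.keysize = sizeof(Oid);
		ctl.entrysize = sizeof(SeqHashEntry);
		ctl.hash = oid_hash;	/* conventional choice for Oid keys */
		seqhashtab = hash_create("Sequence values", 16, &ctl,
								 HASH_ELEM | HASH_FUNCTION);
	}

	entry = (SeqHashEntry *) hash_search(seqhashtab, &relid,
										 HASH_ENTER, &found);
	if (!found)
		entry->last_valid = false;	/* fresh entry: no currval() yet */

	return entry;
}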

- Heikki




Re: [HACKERS] init_sequence spill to hash table

2013-11-14 Thread Andres Freund
Hi,

On 2013-11-13 22:55:43 +1300, David Rowley wrote:
 Here http://www.postgresql.org/message-id/24278.1352922...@sss.pgh.pa.us there
 was some talk about init_sequence being a bottleneck when many sequences
 are used in a single backend.
 
 The attached I think implements what was talked about in the above link
 which for me seems to double the speed of a currval() loop over 3
 sequences. It goes from about 7 seconds to 3.5 on my laptop.

I think it'd be a better idea to integrate the sequence caching logic
into the relcache. There's a comment about it:
 * (We can't
 * rely on the relcache, since it's only, well, a cache, and may decide to
 * discard entries.)
but that's not really accurate anymore. We have the infrastructure for
keeping values across resets and we don't discard entries.

Since we already do a relcache lookup for every sequence manipulation
(c.f. init_sequence()) relying on it won't increase, but rather decrease
the overhead.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] init_sequence spill to hash table

2013-11-14 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I think it'd be a better idea to integrate the sequence caching logic
 into the relcache. There's a comment about it:
  * (We can't
  * rely on the relcache, since it's only, well, a cache, and may decide to
  * discard entries.)
 but that's not really accurate anymore. We have the infrastructure for
 keeping values across resets and we don't discard entries.

We most certainly *do* discard entries, if they're not open when a cache
flush event comes along.

I suppose it'd be possible to mark a relcache entry for a sequence
as locked-in-core, but that doesn't attract me at all.  A relcache
entry is a whole lot larger than the amount of state we really need
to keep for a sequence.

One idea is to have a hashtable for the sequence-specific data,
but to add a link field to the relcache entry that points to the
non-flushable sequence hashtable entry.  That would save the second
hashtable lookup as long as the relcache entry hadn't been flushed
since last use, while not requiring any violence to the lifespan
semantics of relcache entries.  (Actually, if we did that, it might
not even be worth converting the list to a hashtable?  Searches would
become a lot less frequent.)
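
Roughly, the shape of that idea (a self-contained sketch; all names here
are made up, purely to illustrate the lifetime split):

#include <stddef.h>

typedef struct SeqState          /* session-lifetime sequence state */
{
    unsigned int relid;
    long long    last;
} SeqState;

typedef struct CacheEntry        /* stand-in for a relcache entry */
{
    unsigned int relid;
    SeqState    *seqstate;       /* reset to NULL whenever the entry is rebuilt */
} CacheEntry;

/* placeholder for the hashtable (or list) search */
extern SeqState *lookup_seq_state(unsigned int relid);

static SeqState *
seq_state_for(CacheEntry *rel)
{
    if (rel->seqstate == NULL)   /* only right after a cache flush/rebuild */
        rel->seqstate = lookup_seq_state(rel->relid);
    return rel->seqstate;
}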

regards, tom lane




Re: [HACKERS] init_sequence spill to hash table

2013-11-14 Thread Andres Freund
On 2013-11-14 09:23:20 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  I think it'd be a better idea to integrate the sequence caching logic
  into the relcache. There's a comment about it:
   * (We can't
   * rely on the relcache, since it's only, well, a cache, and may decide to
   * discard entries.)
  but that's not really accurate anymore. We have the infrastructure for
  keeping values across resets and we don't discard entries.
 
 We most certainly *do* discard entries, if they're not open when a cache
 flush event comes along.

What I was aiming at is that we don't discard them because of a limited
cache size. I don't think it means much that we flush the entry when
it's changed but not referenced.

 I suppose it'd be possible to mark a relcache entry for a sequence
 as locked-in-core, but that doesn't attract me at all.  A relcache
 entry is a whole lot larger than the amount of state we really need
 to keep for a sequence.

But effectively that's what already happens unless either somebody else
does an ALTER SEQUENCE or a sinval overflow happens, right? So in
production workloads we already will keep both around since hopefully
neither altering a sequence nor sinval overflows should be a frequent
thing.

 One idea is to have a hashtable for the sequence-specific data,
 but to add a link field to the relcache entry that points to the
 non-flushable sequence hashtable entry.  That would save the second
 hashtable lookup as long as the relcache entry hadn't been flushed
 since last use, while not requiring any violence to the lifespan
 semantics of relcache entries.  (Actually, if we did that, it might
 not even be worth converting the list to a hashtable?  Searches would
 become a lot less frequent.)

That's not a bad idea if we decide not to integrate them. And I agree,
there's not much need to have a separate hashtable in that case.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] [GENERAL] Clang 3.3 Analyzer Results

2013-11-14 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 That code was originally stolen from psql, and then whacked around a
 number of times.  The part about looping and passwords, for example, is in
 startup.c in psql as well. We probably want to fix it there as well (even
 if it doesn't have the same problem, it has the same general design). Or
 perhaps even put that function somewhere shared between the two?

 It's also in pg_dump/pg_backup_db.c, there's a version of it in
 pg_dumpall.c, etc. Which I think is a good argument for fixing them all by
 sharing the code somewhere? In fact, we already have some in
 script/common.c - but it's only used by the tools that are in script/.

Hm, maybe, but where?  It's inappropriate for libpgcommon (we don't
want that calling libpq), so I'm not real sure what to do with it.
Also it's not clear to me that all these tools would have the same
requirements for the non-password parameters for the connection request.

BTW, I realized while fooling with this that although the code looks like
it's intended to iterate till a correct password is obtained, actually it
cannot prompt more than once, because of the way PQconnectionNeedsPassword
is coded.  Therefore, the double free that clang is worried about can't
really happen.  It's still worth fixing IMO.
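
For illustration, the shared helper being discussed might look something
like this sketch (prompt_for_password() is a placeholder, and a real
implementation would pass the password as a separate keyword via
PQconnectdbParams() rather than splicing it into the conninfo string):

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

extern char *prompt_for_password(void);	/* placeholder for simple_prompt() etc. */

static PGconn *
connect_maybe_prompt(const char *conninfo)
{
	PGconn	   *conn = PQconnectdb(conninfo);

	/*
	 * PQconnectionNeedsPassword() reports true only when the server asked
	 * for a password and none was supplied at all, which is why (as noted
	 * above) this can never loop: a *wrong* password doesn't retrigger it.
	 */
	if (PQstatus(conn) == CONNECTION_BAD && PQconnectionNeedsPassword(conn))
	{
		char	   *password = prompt_for_password();
		char		buf[1024];

		PQfinish(conn);
		snprintf(buf, sizeof(buf), "%s password=%s", conninfo, password);
		conn = PQconnectdb(buf);
		free(password);
	}

	if (PQstatus(conn) == CONNECTION_BAD)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
	return conn;
}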

regards, tom lane




Re: [HACKERS] init_sequence spill to hash table

2013-11-14 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-11-14 09:23:20 -0500, Tom Lane wrote:
 We most certainly *do* discard entries, if they're not open when a cache
 flush event comes along.

 What I was aiming at is that we don't discard them because of a limited
 cache size. I don't think it means much that we flush the entry when
 it's changed but not referenced.

Well, I don't want non-user-significant events (such as an sinval queue
overrun) causing sequence state to get discarded.  We would get bug
reports about lost sequence values.

regards, tom lane




Re: [HACKERS] additional json functionality

2013-11-14 Thread Garick Hamlin
On Wed, Nov 13, 2013 at 04:50:49PM -0800, David E. Wheeler wrote:
 On Nov 13, 2013, at 4:45 PM, Andrew Dunstan and...@dunslane.net wrote:
 
  It should be a pretty-printing function option, perhaps, but not core to 
  the type itself, IMO.
  
  I don't in the least understand how it could be a pretty printing option.
  If we move to a binary rep using the hstore stuff order will be destroyed
  and not stored anywhere, and duplicate keys will be lost. Once that's done,
  how would a pretty print option restore the lost info?
 
 I meant ordering the keys, usually in lexicographic order. I agree that 
 preserving order is untenable.

There is a canonical form.

http://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00

A canonical form would be very useful.  That's a bit trickier than sorting the 
keys, and I don't know that there is an accepted canonical form for json yet that
can represent all json documents.  (The canonical form is not the pretty form, 
but I think the key ordering should be the same.)

It might be nice to have a more general canonical form if one emerges from 
somewhere that could encode any json.  Without something like this,
hashing can only be well specified for the 'sensible subset of json' used in
security protocols.

Garick




Re: [HACKERS] [PATCH] ecpg: Split off mmfatal() from mmerror()

2013-11-14 Thread Michael Meskes
On Tue, Nov 12, 2013 at 10:22:20PM -0500, Peter Eisentraut wrote:
 Similar to recent pg_upgrade changes
 (https://commitfest.postgresql.org/action/patch_view?id=1216), here is a
 patch to separate the terminating and nonterminating variants of
 mmerror() in ecpg.
 ...

Haven't tried it, but it sure looks good to me. Feel free to commit.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: [HACKERS] init_sequence spill to hash table

2013-11-14 Thread Andres Freund
On 2013-11-14 09:47:18 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-11-14 09:23:20 -0500, Tom Lane wrote:
  We most certainly *do* discard entries, if they're not open when a cache
  flush event comes along.
 
  What I was aiming at is that we don't discard them because of a limited
  cache size. I don't think it means much that we flush the entry when
  it's changed but not referenced.
 
 Well, I don't want non-user-significant events (such as an sinval queue
 overrun) causing sequence state to get discarded.  We would get bug
 reports about lost sequence values.

But we can easily do as you suggest and simply retain the entry in that
case. I am just not seeing the memory overhead argument as counting much
since we don't protect against it in normal operation.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




[HACKERS] Somebody broke \d on indexes

2013-11-14 Thread Tom Lane
In HEAD:

regression=# \d tenk1_thous_tenthous
ERROR:  column i.indisidentity does not exist
LINE 4: i.indisidentity,
^

This works fine in released versions.

regards, tom lane




Re: [HACKERS] additional json functionality

2013-11-14 Thread Merlin Moncure
On Wed, Nov 13, 2013 at 6:01 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
 On 11/14/2013 12:09 AM, Merlin Moncure wrote:
 On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/13/2013 06:45 AM, Merlin Moncure wrote: I'm not so sure we should
 require hstore to do things like build
 Also, json_object is pretty weird to me, I'm not sure I see the
 advantage of a new serialization format, and I don't agree with the
 statement "but it is the caller's responsibility to ensure that keys
 are not repeated".
 This is pretty standard in the programming languages I know of which use
 JSON.

 I think the caller should have no such
 responsibility.  Keys should be able to be repeated.
 Apparently your experience with using JSON in practice has been fairly
 different from mine; the projects I work on, the JSON is being
 constantly converted back and forth to hashes and dictionaries, which
 means that ordering is not preserved and keys have to be unique (or
 become unique within one conversion cycle).  I think, based on the
 language of the RFC and common practice, that it's completely valid for
 us to require unique keys within JSON-manipulation routines.
 Common practice?  The internet is littered with complaints about
 documents being spontaneously re-ordered and/or de-duplicated in
 various stacks.  Other stacks provide mechanisms for explicit key
 order handling (see here: http://docs.python.org/2/library/json.html).
   Why do you think they did that?

 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.
 I guess we should not replace the current JSON type with an hstore-based
 one, but add something json-like based on nested hstore instead.

 Maybe call it jsdoc or jdoc or jsobj or somesuch.

This is exactly what needs to be done, full stop (how about: hstore).
It really comes down to this: changing the serialization behaviors
that have been in production for 2 releases (three if you count the
extension) is bad enough, but making impossible some legal json
constructions which are currently possible is an unacceptable
compatibility break.  It's going to break applications I've currently
put into production with no clear workaround.  This is quite frankly
not ok and I'm calling foul.  The RFC may claim that these
constructions are dubious but that's irrelevant.  It's up to the
parser to decide that and when serializing you are not in control of
the parser.

Had the json type been stuffed into an extension, there would be a
clearer path to get to where you want to go since we could have walled
off the old functionality and introduced side by side API calls.  As
things stand now, I don't see a clean path to do that.

 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.  I guess that's a done deal based on
 'performance'.  I'm clearly not the only one to have complained about
 this though.

It's not just a matter of performance.  It's the basic conflict of
JSON as document format vs. JSON as data storage.  For the latter,
unique, unordered keys are required, or certain functionality isn't
remotely possible: indexing, in-place key update, transformations, etc.

On Wed, Nov 13, 2013 at 5:20 PM, Josh Berkus j...@agliodbs.com wrote:
 It's not just a matter of performance.  It's the basic conflict of
 JSON as document format vs. JSON as data storage.  For the latter,
 unique, unordered keys are required, or certain functionality isn't
 remotely possible: indexing, in-place key update, transformations, etc.

That's not very convincing.  What *exactly* is impossible and why do
you think it justifies breaking compatibility with current
applications?   The way forward seems pretty straightforward: given
that hstore is getting nesting power and is moving closer to the json
way of doing things it is essentially 'binary mode json'.  I'm ok with
de-duplication and key ordering when moving into that structure since
it's opt in and doesn't break the serialization behaviors we have
today.  If you want to go further and unify the types then you have to
go through the design work to maintain compatibility.

Furthermore, I bet the performance argument isn't so clear cut either.
 The current json type is probably faster at bulk serialization
precisely because you *don't* need to deduplicate and reorder keys: the
serialization operates without context.  It will certainly be much
better for in place manipulations but it's not nearly as simple as you
are making it out to be.

merlin



Re: [HACKERS] Somebody broke \d on indexes

2013-11-14 Thread Andres Freund
On 2013-11-14 09:52:11 -0500, Tom Lane wrote:
 In HEAD:
 
 regression=# \d tenk1_thous_tenthous
 ERROR:  column i.indisidentity does not exist
 LINE 4: i.indisidentity,
 ^

That's me. At some point indisidentity was renamed to indisreplident.

Patch attached (also renaming a variable that didn't cause problems but
wasn't named consistently anymore).

Shouldn't we have at least one \d of an index in the regression tests
somewhere? Not that that excuses stupid mistakes, but it'd be helpful
nonetheless.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 25fec2b..ceda13e 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1611,9 +1611,9 @@ describeOneTableDetails(const char *schemaname,
 		  "false AS condeferrable, false AS condeferred,\n");
 
 		if (pset.sversion >= 90400)
-			appendPQExpBuffer(&buf, "i.indisidentity,\n");
+			appendPQExpBuffer(&buf, "i.indisreplident,\n");
 		else
-			appendPQExpBuffer(&buf, "false AS indisidentity,\n");
+			appendPQExpBuffer(&buf, "false AS indisreplident,\n");
 
 		appendPQExpBuffer(&buf, "  a.amname, c2.relname, "
 	  "pg_catalog.pg_get_expr(i.indpred, i.indrelid, true)\n"
@@ -1638,7 +1638,7 @@ describeOneTableDetails(const char *schemaname,
 			char	   *indisvalid = PQgetvalue(result, 0, 3);
 			char	   *deferrable = PQgetvalue(result, 0, 4);
 			char	   *deferred = PQgetvalue(result, 0, 5);
-			char	   *indisidentity = PQgetvalue(result, 0, 6);
+			char	   *indisreplident = PQgetvalue(result, 0, 6);
 			char	   *indamname = PQgetvalue(result, 0, 7);
 			char	   *indtable = PQgetvalue(result, 0, 8);
 			char	   *indpred = PQgetvalue(result, 0, 9);
@@ -1670,7 +1670,7 @@ describeOneTableDetails(const char *schemaname,
 			if (strcmp(deferred, "t") == 0)
 				appendPQExpBuffer(&tmpbuf, _(", initially deferred"));
 
-			if (strcmp(indisidentity, "t") == 0)
+			if (strcmp(indisreplident, "t") == 0)
 				appendPQExpBuffer(&tmpbuf, _(", replica identity"));

 			printTableAddFooter(&cont, tmpbuf.data);



Re: [HACKERS] Somebody broke \d on indexes

2013-11-14 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-11-14 09:52:11 -0500, Tom Lane wrote:
 In HEAD:
 
 regression=# \d tenk1_thous_tenthous
 ERROR:  column i.indisidentity does not exist
 LINE 4: i.indisidentity,
 ^

 That's me. At some point indisidentity was renamed to indisreplident.

 Patch attached (also renaming a variable that didn't cause problems but
 wasn't named consistently anymore).

Ah, thanks, will commit.

 Shouldn't we have at least one \d of an index in the regression tests
 somewhere? Not that that excuses stupid mistakes, but it'd be helpful
 nonetheless.

Seems like a good idea, will add one of those too.

regards, tom lane




Re: [HACKERS] additional json functionality

2013-11-14 Thread Hannu Krosing
On 11/14/2013 12:20 AM, Josh Berkus wrote:
 Merlin,


 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.  I guess that's a done deal based on
 'performance'.  I'm clearly not the only one to have complained about
 this though.
 It's not just a matter of performance.  It's the basic conflict of
 JSON as document format vs. JSON as data storage.  For the latter,
 unique, unordered keys are required, or certain functionality isn't
 remotely possible: indexing, in-place key update, transformations, etc.

 XML went through the same thing, which is part of how we got a bunch of
 incompatible dialects of XML.

 Now, your use case does show us that there's a case to be made for still
 having text JSON even after we have binary JSON. 
text-json could easily be a domain (text + check that it is convertible
to json)

maybe it is even possible to teach pg_upgrade to do this automatically
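
A minimal sketch of that domain idea (the name json_source is only an example):

CREATE DOMAIN json_source AS text
    CHECK (VALUE IS NULL OR VALUE::json IS NOT NULL);
-- valid JSON text passes through byte-for-byte (order and duplicates kept);
-- invalid text is rejected by the cast inside the CHECK.
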
  There's a strong simplicity argument against that, though ...
I think it confuses most people, similar to storing 1+1 as a
processing instruction instead of just evaluating it and storing 2 :)

OTOH we are in this mess now and have to solve the backwards
compatibility somehow.

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-14 Thread Hannu Krosing
On 11/14/2013 01:42 AM, Andrew Dunstan wrote:

 On 11/13/2013 07:01 PM, Hannu Krosing wrote:

 I guess we should not replace current JSON type with hstore based
 one, but add something json-like based on nested hstore instead.


 Well, that's two voices for that course of action.
I am not really for it (I would have liked to have a
json_object/json_structure instead of json_string as the meaning of
json), but I think there is quite a strong argument for not breaking
backwards compatibility.

 Interesting that I don't think I heard a single voice for this either
 at pgCon or pgOpen,
I attended neither, but I did voice my preferences for _not_ having the
json-as-source-code
type on the mailing lists during previous json discussions.

 although I spent large amounts of time at both talking to people about
 Json, so I'd be interested to hear more voices.

 It would actually simplify things in a way if we do that - we've been
 working on
 a way of doing this that wouldn't upset pg_upgrade. This would render
 that effort unnecessary.
I wonder how hard it would be to rename the current json to json_source
and have a new nested-hstore-based json?


 However it will complicate things for users who will have to choose
 between the data types,
 and function authors who will possibly have to write versions of
 functions to work with both types.
You mostly want the functions for the json-object type.

This is supported by the fact that the current functions on json-source
treat it as json-object (for example key lookup gives you the value
of the latest key and not a list of all matching key values).

You may want some new functions on json-source
(maybe json_source_enumerate_key_values(json, key))
but the current ones are really for json-object.
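
A small illustration of the lookup behaviour described above (shown as
described here, not re-verified):

SELECT '{"a": 1, "a": 2}'::json -> 'a';
-- per the description above, the lookup acts as if only the latest "a"
-- existed and returns 2, rather than a list of all matching values.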


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-14 Thread Merlin Moncure
On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 This is supported by the fact that current functions on json-source
 treat it as json-object (for example key lookup gives you the value
 of latest key and not a list of all matching key values).

yeah. hm. that's a good point.

Maybe there's a middle ground here: I bet the compatibility issues
would be minimized to an acceptable level if the 'xxx_to_json'
functions maintained their current behaviors; they would construct the
json type in a special internal mode that would behave like the
current type does.   In other words, the marshalling into binary
structure could happen when:

*) stored to a column in a table
*) when any modifying routine is called, updating a key, value, etc
*) manually via a function

but not at cast time.  This preserves compatibility for the important
points and allows serialization of structures that are difficult with
the binary mode variant.
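
A rough sketch of that distinction (the first statement is today's 9.3
behaviour; the binary-form column type referred to is hypothetical):

-- casting text to the existing json type keeps duplicate keys and key order:
SELECT '{"Q2": "96.5%", "Q2": "94.1%"}'::json;
-- under the proposal above, canonicalization (de-duplication, reordering)
-- would happen only later: when the value is stored into a column of the
-- hypothetical binary-form type, or when a modifying function is called.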

merlin




[HACKERS] Ideas of printing out the alternative paths

2013-11-14 Thread Zhan Li
When searching all the possible paths of executing a query, the optimizer
finds and saves the cheapest paths for the top level rel. I'd like to check
out all the paths the optimizer has ever considered, which I believe, are
stored in the pathlist of the top level rel. But I do not have an idea of
how to print out these paths to see them visually. Does anyone have an
idea how I can achieve this?

Thanks,
Zhan


Re: [HACKERS] additional json functionality

2013-11-14 Thread Hannu Krosing
On 11/14/2013 04:07 PM, Merlin Moncure wrote:
 On Wed, Nov 13, 2013 at 6:01 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 I guess we should not replace current JSON type with hstore based
 one, but add something json-like based on nested hstore instead.

 Maybe call it jsdoc or jdoc or jsobj or somesuch.
 This is exactly what needs to be done, full stop (how about: hstore).
hstore has completely different i/o formats and thus has similar
backwards compatibility problems.
 It really comes down to this: changing the serialization behaviors
It is really not about serialisation behaviours, as there is nothing you
can sensibly serialise that ends up with repeated keys.

I agree that you can generate such JSON, which would be valid
input to any json parser, but there is no JavaScript Object which really
serializes to such JSON.
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  
we should have disallowed this from the beginning and should
have encouraged using text as storage for JavaScript source code.
 It's going to break applications I've currently
 put into production with no clear workaround.  
we could rename the old json type during pg_upgrade, but this
would likely break at least implicit casts in functions.
 This is quite frankly
 not ok and I'm calling foul.  The RFC may claim that these
 constructions are dubious but that's irrelevant.  It's up to the
 parser to decide that and when serializing you are not in control of
 the parser.
You could choose a sane serializer ;)

The main argument here is still whether json is source
code or the serialization result of a JavaScript Object (Notation).

 Had the json type been stuffed into an extension, there would be a
 clearer path to get to where you want to go since we could have walled
 off the old functionality and introduced side by side API calls.  As
 things stand now, I don't see a clean path to do that.

 I use pg/JSON all over the place.  In several cases I have to create
 documents with ordered keys because the parser on the other side wants
 them that way -- this is not a hypothetical argument.  
But one could argue that this is not json either but rather some
json-like input format for special parsers.

Current recommendation is to use text for these kinds of things.

 The current
 json serialization API handles that just fine and the hstore stuff
 coming down the pike will not.  I guess that's a done deal based on
 'performance'.  I'm clearly not the only one to have complained about
 this though.
 It's not just a matter of performance.  It's the basic conflict of
 JSON as document format vs. JSON as data storage.  For the latter,
 unique, unordered keys are required, or certain functionality isn't
 remotely possible: indexing, in-place key update, transformations, etc.
All these would be possible if we redefined json as another notation
for XML instead of string representation of JavaScript Object :)

And things could really be in-place only inside pl/language functions,
as PostgreSQL is still MVCC.

What should be faster is access to nested values, though I suspect
that it is not significantly faster unless you have very large json
documents.
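
For example, nested-value access with the operators added in 9.3:

SELECT '{"a": {"b": [10, 20, 30]}}'::json #> '{a,b,2}';   -- 30
-- with the text-based json type every such access re-parses the stored text;
-- a binary representation mainly pays off once documents get large.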

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] nested hstore patch

2013-11-14 Thread Hannu Krosing
On 11/14/2013 01:47 PM, Andrew Dunstan wrote:

 On 11/14/2013 03:21 AM, Hannu Krosing wrote:
 On 11/14/2013 01:32 AM, David E. Wheeler wrote:
 On Nov 13, 2013, at 3:59 PM, Hannu Krosing ha...@2ndquadrant.com
 wrote:

 I remember strong voices in support of *not* normalising json, so that
 things like

 {"a":1, "a":true, "a":"b", "a":"none"}

 would go through the system unaltered, for claimed standard usage of
 json as
 processing instructions. That is, as source code which can possibly be
 converted
 to JavaScript Object and not something that would come out of
 serialising of
 any existing JavaScript Object.
 My recollection from PGCon was that there was consensus to normalize on
 the way in --
 Great news! I remember advocating this approach in the mailing lists
 but having been out-voted based on current real-world usage out
 there :)
   or at least, if we switched to a binary representation as proposed by
 Oleg  Teodor, it was not worth the hassle to try to keep it.
 Very much agree. For the source code approach I'd recommend
 text type with maybe a check that it is possible to convert it to json.



 I don't think you and David are saying the same thing. AIUI he wants
 one JSON
 type and is prepared to discard text preservation (duplicate keys and
 key order).
 You want two json types, one of which would feature text preservation.
I actually *want* the same thing that David wants, but I think that
Merlin has
valid concerns about backwards compatibility.

Having let this behaviour in, it is not nice to break several uses
of it now.

If we could somehow turn the old json into a text domain with a json
syntax check (which is what it really is, up to 9.3) via pg_upgrade,
that would be great.

pg_dump would then be required to have some switch to output a different
typename in CREATE TABLE and similar.

 Correct me if I'm wrong.

 cheers

 andrew






-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-14 Thread Hannu Krosing
On 11/14/2013 05:06 PM, Merlin Moncure wrote:
 On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 This is supported by the fact that current functions on json-source
 treat it as json-object (for example key lookup gives you the value
 of latest key and not a list of all matching key values).
 yeah. hm. that's a good point.

 Maybe there's a middle ground here: I bet the compatibility issues
 would be minimized to an acceptable level if the 'xxx_to_json'
 functions maintained their current behaviors; they would construct the
 json type in a special internal mode that would behave like the
 current type does.   
Do you have any xxx_to_json usage which can generate a field with
multiple equal keys?

Or is it just about preserving order?
 In other words, the marshalling into binary
 structure could happen when:

 *) stored to a column in a table
 *) when any modifying routine is called, updating a key, value, etc
 *) manually via a function

 but not at cast time.  This preserves compatibility for the important
 points and allows serialization of structures that are difficult with
 the binary mode variant.
Seems like this would not play nicely with how the PostgreSQL type
system works in general, but it could be a way forward if you say that
you really do not need to store the order-preserving, multi-valued json.

But in that case it could also be possible for these functions to just
generate json-format text, and with proper casts this would act exactly
as you describe above, no?

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Ideas of printing out the alternative paths

2013-11-14 Thread Tom Lane
Zhan Li zhanl...@gmail.com writes:
 When searching all the possible paths of executing a query, the optimizer
 finds and saves the cheapest paths for the top level rel. I'd like to check
 out all the paths the optimizer has ever considered, which I believe, are
 stored in the pathlist of the top level rel.

No, most of them have been thrown away long before that.  See add_path.
Also realize that in a large join problem, a lot of potential plans never
get explicitly considered, because the input paths get pruned before we
get to considering the join rel at all.  (If this were not so, planning
would take too long.)

People have experimented with having add_path print something about each
path that's fed to it, but the output tends to be voluminous and not all
that useful.

regards, tom lane




Re: [HACKERS] Optimize kernel readahead using buffer access strategy

2013-11-14 Thread Fujii Masao
On Thu, Nov 14, 2013 at 9:09 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
 Hi,

 I create a patch that is improvement of disk-read and OS file caches. It can
 optimize kernel readahead parameter using buffer access strategy and
 posix_fadvice() in various disk-read situations.

When I compiled the HEAD code with this patch on MacOS, I got the following
error and warnings.

gcc -O0 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -I../../../../src/include   -c -o fd.o fd.c
fd.c: In function 'BufferHintIOAdvise':
fd.c:1182: error: 'POSIX_FADV_SEQUENTIAL' undeclared (first use in
this function)
fd.c:1182: error: (Each undeclared identifier is reported only once
fd.c:1182: error: for each function it appears in.)
fd.c:1185: error: 'POSIX_FADV_RANDOM' undeclared (first use in this function)
make[4]: *** [fd.o] Error 1
make[3]: *** [file-recursive] Error 2
make[2]: *** [storage-recursive] Error 2
make[1]: *** [install-backend-recurse] Error 2
make: *** [install-src-recurse] Error 2

tablecmds.c:9120: warning: passing argument 5 of 'smgrread' makes
pointer from integer without a cast
bufmgr.c:455: warning: passing argument 5 of 'smgrread' from
incompatible pointer type

Regards,

-- 
Fujii Masao




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-11-14 Thread Fujii Masao
On Thu, Nov 14, 2013 at 7:11 AM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Oct 23, 2013 at 8:52 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
 Hmm, now if we had portable atomic addition, so that we could spare the
 spinlock ...

 That certainly seems like an interesting possibility.

 I think that pg_stat_statements should be made to do this kind of
 thing by a third party tool that aggregates snapshots of deltas.
 Time-series data, including (approximate) *local* minima and maxima
 should be built from that. I think tools like KONDO-san's pg_statsinfo
 tool have an important role to play here. I would like to see it or a
 similar tool become a kind of defacto standard for consuming
 pg_stat_statements' output.

 At this point we are in general very much chasing diminishing returns
 by adding new things to the counters struct, particularly given that
 it's currently protected by a spinlock. And adding a histogram or
 min/max for something like execution time isn't an approach that can
 be made to work for every existing cost tracked by pg_stat_statements.
 So, taking all that into consideration, I'm afraid this patch gets a
 -1 from me.

Agreed.
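
For reference, the snapshot-of-deltas approach described above can be
sketched roughly like this (history table, columns and cadence are
illustrative only):

CREATE TABLE pgss_history AS
    SELECT now() AS taken_at, userid, dbid, query, calls, total_time
    FROM pg_stat_statements WHERE false;

-- sample periodically (e.g. from cron) ...
INSERT INTO pgss_history
    SELECT now(), userid, dbid, query, calls, total_time
    FROM pg_stat_statements;

-- ... and derive per-interval deltas, from which (approximate) local
-- minima and maxima over time can be computed:
SELECT query,
       calls - lag(calls) OVER w            AS calls_delta,
       total_time - lag(total_time) OVER w  AS time_delta
FROM pgss_history
WINDOW w AS (PARTITION BY userid, dbid, query ORDER BY taken_at);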

Regards,

-- 
Fujii Masao




Re: [HACKERS] Ideas of printing out the alternative paths

2013-11-14 Thread Zhan Li
Thank you for your reply, Tom. Then a) what exactly is stored in the
pathlist of the top level rel? Paths worth considering? b) I have been
struggling to come up with a way to print the Path struct. If I could
print a path in a form like "A hash join (B nested loop join C)", that
would be great. You mentioned people have printed something about each
path; can you please give me a hint of what that is and how to achieve it?


On Thu, Nov 14, 2013 at 12:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Zhan Li zhanl...@gmail.com writes:
  When searching all the possible paths of executing a query, the optimizer
  finds and saves the cheapest paths for the top level rel. I'd like to
 check
  out all the paths the optimizer has ever considered, which I believe, are
  stored in the pathlist of the top level rel.

 No, most of them have been thrown away long before that.  See add_path.
 Also realize that in a large join problem, a lot of potential plans never
 get explicitly considered, because the input paths get pruned before we
 get to considering the join rel at all.  (If this were not so, planning
 would take too long.)

 People have experimented with having add_path print something about each
 path that's fed to it, but the output tends to be voluminous and not all
 that useful.

 regards, tom lane



Re: [HACKERS] GIN improvements part2: fast scan

2013-11-14 Thread Alexander Korotkov
On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 28.06.2013 22:31, Alexander Korotkov wrote:

 Now, I got the point of three state consistent: we can keep only one
 consistent in opclasses that support new interface. exact true and exact
 false values will be passed in the case of current patch consistent; exact
 false and unknown will be passed in the case of current patch
 preConsistent. That's reasonable.


 I'm going to mark this as returned with feedback. For the next version,
 I'd like to see the API changed per above. Also, I'd like us to do
 something about the tidbitmap overhead, as a separate patch before this, so
 that we can assess the actual benefit of this patch. And a new test case
 that demonstrates the I/O benefits.


Revised version of patch is attached.
Changes are so:
1) Patch rebased against packed posting lists, not depends on additional
information now.
2) New API with tri-state logic is introduced.

--
With best regards,
Alexander Korotkov.


gin-fast-scan.6.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] additional json functionality

2013-11-14 Thread Merlin Moncure
On Thu, Nov 14, 2013 at 10:54 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 On 11/14/2013 05:06 PM, Merlin Moncure wrote:
 On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 This is supported by the fact that current functions on json-source
 treat it as json-object (for example key lookup gives you the value
 of latest key and not a list of all matching key values).
 yeah. hm. that's a good point.

 Maybe there's a middle ground here: I bet the compatibility issues
 would be minimized to an acceptable level if the 'xxx_to_json'
 functions maintained their current behaviors; they would construct the
 json type in a special internal mode that would behave like the
 current type does.

 Do you have any xxx_to_json usage which can generate a field with
 multiple equal keys ?

Absolutely -- that's what I've been saying all along.  For example:

IIRC the end consumer is jqgrid, although the structure may be shaped
this way to satisfy some intermediate transformations, perhaps in GWT
or in the browser itself.  The point is I didn't define the structure
(I think it sucks too), it was given to me to create and I did.  The
object's dynamic keys and values are moved into the json structure by
passing two parallel arrays into a userland function similar to what
Andrew is proposing with the json_build functionality.

{
"classDisplayName": null,
"rows": [
{
"PropertyName": "xxx",
"Row": 1,
"Group": "Executive Dashboard",
"MetricName": "Occupancy",
"2012": "95.4%",
"Q2": "96.5%",
"Q3": "96.3%",
"Q4": "94.8%",
"2013": "95.1%",
"Q2": "94.1%",
"Q3": "96.0%",
"Q4": "96.1%"
},
{
"PropertyName": "xxx",
"Row": 2,
"Group": "Executive Dashboard",
"MetricName": "Occupancy",
"2012": "95.9%",
"Q2": "97.3%",
"Q3": "95.7%",
"Q4": "95.2%",
"2013": "93.9%",
"Q2": "93.4%",
"Q3": "95.3%",
"Q4": "95.1%"
}
]
}
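
A hypothetical sketch of that kind of parallel-array helper (the function
name and body are invented; it leans on the fact that a text-to-json cast
preserves duplicates and key order):

CREATE OR REPLACE FUNCTION json_object_from_arrays(keys text[], vals text[])
RETURNS json LANGUAGE sql AS $$
    SELECT ('{' ||
            string_agg(to_json(keys[i])::text || ':' || to_json(vals[i])::text,
                       ',' ORDER BY i) ||
            '}')::json
    FROM generate_subscripts(keys, 1) AS i;
$$;

-- SELECT json_object_from_arrays(ARRAY['Q2','Q2'], ARRAY['96.5%','94.1%']);
-- yields {"Q2":"96.5%","Q2":"94.1%"}, duplicate keys and order intact.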

 but not at cast time.  This preserves compatibility for the important
 points and allows serialization of structures that are difficult with
 the binary mode variant.

 Seems like this would not play nice with how PostgreSQL type system work
 in general, but could be a way forward if you say that you really do not
 need
 to store the order-preserving, multi-valued json.

Yes, exactly. I'm OK with simplifying the structure for storage
purposes because in that context postgres is the parser and gets to
decide what the precise behaviors are.  Simplifying the stored
structures during upgrade is an OK concession to make, I think.  It is
not safe to assume the structure should be simplified when
serializing.

 But in this case it could also be possible for these function to just
 generate
 json-format text, and with proper casts this would act exactly as you
 describe
 above, no ?

I think so, if I'm following you correctly.  Maybe you get the best of
both worlds and (mostly) maintain compatibility by deferring the
decomposition into binary structure in certain contexts.  I'd even
throw in the equality operator (which, thankfully, we haven't defined
yet) as a place where decomposition could happen.  Pretty much any
scenario that isn't involved in raw assembly and output.

merlin




[HACKERS] AWS RDS now supports PostgreSQL!

2013-11-14 Thread Rayson Ho
http://aws.typepad.com/aws/2013/11/amazon-rds-for-postgresql-now-available.html

(The Free Tier page has not been updated yet, but I believe
PostgreSQL should also be free for new AWS users:
http://aws.amazon.com/free/ )

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html




Re: [HACKERS] additional json functionality

2013-11-14 Thread David E. Wheeler
On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote:

 This is exactly what needs to be done, full stop (how about: hstore).
 It really comes down to this: changing the serialization behaviors
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  It's going to break applications I've currently
 put into production with no clear workaround.  This is quite frankly
 not ok and I'm calling foul.  The RFC may claim that these
 constructions are dubious but that's irrelevant.  It's up to the
 parser to decide that and when serializing you are not in control of
 the parser.

The current JSON type preserves key order and duplicates. But is it documented 
that this is a feature, or something to be guaranteed? Just because people have 
come to depend on something doesn’t mean we can’t change it. It’s one thing if 
we said this was a feature you could depend on, but AFAIK we haven’t. And 
frankly, the dupes have caused problems for some of my colleagues at work. To 
me, it’s a bug (or, at best, a mis-feature) that causes more issues than it 
prevents.

In my experience, no JSON parser guarantees key order or duplication. You can’t 
have dupes and there is no ordering in a Perl hash, Objective-C NSDictionary, 
or JavaScript object. There is of course order and there can be dupes in a JSON 
string, but not in the objects built from it. If you go in and out of a parser, 
dupes are eliminated and key order is not preserved. I expect the same from 
JSON storage.

With no guarantees of preserved ordering or duplication, and with no formal 
expectation of such by JSON parsers written for various programming languages, 
I think there is little to be lost by removing those aspects of the JSON type. 
For those (hopefully rare) situations where such expectations exist, the JSON 
should be stored as text, as Hannu suggests.

My $0.02.

Best,

David





Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog

2013-11-14 Thread Fujii Masao
On Thu, Nov 14, 2013 at 9:08 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 Please find attached the patch, for adding a new option for pg_basebackup,
 to specify a different directory for pg_xlog.

Sounds good! Here are the review comments:

+printf(_("--xlogdir=XLOGDIR   location for the
transaction log directory\n"));

This message is not aligned well.

-if (!streamwal || strcmp(filename +
strlen(filename) - 8, "/pg_xlog") != 0)
+if ((!streamwal && (strcmp(xlog_dir, "") == 0))
+|| strcmp(filename + strlen(filename) -
8, "/pg_xlog") != 0)

You need to update the source code comment.

+#ifdef HAVE_SYMLINK
+if (symlink(xlog_dir, linkloc) != 0)
+{
+fprintf(stderr, _("%s: could not create symbolic link
\"%s\": %s\n"),
+progname, linkloc, strerror(errno));
+exit(1);
+}
+#else
+fprintf(stderr, _("%s: symlinks are not supported on this platform "
+  "cannot use xlogdir"));
+exit(1);
+#endif
+}

Is it better to call pg_free() at the end? Even if we don't do that, it would be
almost harmless, though.

Don't we need to prevent users from specifying the same directory in both
--pgdata and --xlogdir?

Regards,

-- 
Fujii Masao




Re: [HACKERS] LISTEN / NOTIFY enhancement request for Postgresql

2013-11-14 Thread Bruce Momjian
On Thu, Oct 24, 2013 at 11:41:57AM -0400, Sev Zaslavsky wrote:
 Here is an example implementation: http://activemq.apache.org/nms/
 activemq-wildcards.html
 
 
   • "." is used to separate names in a path
   • "*" is used to match any name in a path
   • ">" is used to recursively match any destination starting from this name
 
 For example using the example above, these subscriptions are possible
 
 Subscription         Meaning
 PRICE.>              Any price for any product on any exchange
 PRICE.STOCK.>        Any price for a stock on any exchange
 PRICE.STOCK.NASDAQ.* Any stock price on NASDAQ
 PRICE.STOCK.*.IBM    Any IBM stock price on any exchange
 
 
 My request is to implement the same or similar feature in Postgresql.

This does seem useful and pretty easy to implement.  Should we add a
TODO?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] AWS RDS now supports PostgreSQL!

2013-11-14 Thread Bruce Momjian
On Thu, Nov 14, 2013 at 12:29:58PM -0500, Rayson Ho wrote:
 http://aws.typepad.com/aws/2013/11/amazon-rds-for-postgresql-now-available.html
 
 (The Free Tier page has not been updated yet, but I believe
 PostgreSQL should also be free for new AWS users:
 http://aws.amazon.com/free/ )

Here is the Postgres AWS page:

http://aws.amazon.com/rds/postgresql/

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Additional information on log_line_prefix

2013-11-14 Thread Bruce Momjian
On Fri, Oct 25, 2013 at 02:58:12PM -0400, Andrew Dunstan wrote:
 
 On 10/25/2013 01:50 PM, Emanuel Calvo wrote:
 
 Hi guys,
 
 I'm working on a quick convertion script for query reviews and I
 wonder if a feature request to add the following values will be
 possible:
 
  %D = duration
  %L = lock_time   (lock only for this query)
  %E = estimated rows
  %R = total rows returned
  %B = total bytes sent
  %T = temporal tables used
 
 Those prefixes/values are just examples/proposals.
 
 Thanks for the hard work!
 
 
 
 Pretty much all of this can be got with the auto_explain module, and
 I think that's where it belongs.

The other problem is that these mostly have meaning for a line generated
by a completed query, but not for the many other lines that can appear
in that log file.
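
For reference, the auto_explain route suggested above typically looks like
this (values are examples only; in practice the module is usually loaded via
shared_preload_libraries rather than LOAD):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';  -- log plans of statements slower than this
SET auto_explain.log_analyze = on;            -- include actual times and row counts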

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] additional json functionality

2013-11-14 Thread Merlin Moncure
On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler
da...@justatheory.com wrote:
 On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote:

 This is exactly what needs to be done, full stop (how about: hstore).
 It really comes down to this: changing the serialization behaviors
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  It's going to break applications I've currently
 put into production with no clear workaround.  This is quite frankly
 not ok and I'm calling foul.  The RFC may claim that these
 constructions are dubious but that's irrelevant.  It's up to the
 parser to decide that and when serializing you are not in control of
 the parser.

 The current JSON type preserves key order and duplicates. But is it 
 documented that this is a feature, or something to be guaranteed?

It doesn't, but the row_to_json function has a very clear mechanism of
action.  And, 'not being documented' is not the standard for latitude
to make arbitrary changes to existing function behaviors.

 In my experience, no JSON parser guarantees key order or duplication.

I found one in about two seconds.  http://docs.python.org/2/library/json.html

object_pairs_hook, if specified will be called with the result of
every JSON object decoded with an ordered list of pairs. The return
value of object_pairs_hook will be used instead of the dict. This
feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict() will remember the order of insertion). If
object_hook is also defined, the object_pairs_hook takes priority.

That makes the rest of your argument moot.  Plus, I quite clearly am
dealing with parsers that do.

merlin




Re: [HACKERS] Ideas of printing out the alternative paths

2013-11-14 Thread Tom Lane
Zhan Li zhanl...@gmail.com writes:
 Thank you for your reply Tom. Then a) what are exactly stored in the
 pathlist of top level rel? Paths worth considering? b) I have been
 struggling to come up with a way to print the Path struct. If I can print a
 path the way like A hash join (B nested loop join C), that would be
 great. You mentioned people have printed something about each path, can
 you please give me a hint of what's that and how to achieve that?

I don't think anyone's tried anything much smarter than
src/backend/nodes/outfuncs.c, or there's some more limited stuff at the
bottom of src/backend/optimizer/path/allpaths.c.  Reassembling into
something more human-readable than that would probably take some work.

regards, tom lane




Re: [HACKERS] additional json functionality

2013-11-14 Thread Hannu Krosing
On 11/14/2013 08:17 PM, Merlin Moncure wrote:
 On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler
 da...@justatheory.com wrote:
 On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote:

 This is exactly what needs to be done, full stop (how about: hstore).
 It really comes down to this: changing the serialization behaviors
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  It's going to break applications I've currently
 put into production with no clear workaround.  This is quite frankly
 not ok and I'm calling foul.  The RFC may claim that these
 constructions are dubious but that's irrelevant.  It's up to the
 parser to decide that and when serializing you are not in control of
 the parser.
 The current JSON type preserves key order and duplicates. But is it 
 documented that this is a feature, or something to be guaranteed?
 It doesn't, but the row_to_json function has a very clear mechanism of
 action.  And, 'not being documented' is not the standard for latitude
 to make arbitrary changes to existing function behaviors.
the whole hash*() function family was changed based on a not-documented
premise, so we do have a precedent.

 In my experience, no JSON parser guarantees key order or duplication.
 I found one in about two seconds.  http://docs.python.org/2/library/json.html

 object_pairs_hook, if specified will be called with the result of
 every JSON object decoded with an ordered list of pairs. The return
 value ofobject_pairs_hook will be used instead of the dict. This
 feature can be used to implement custom decoders that rely on the
 order that the key and value pairs are decoded (for example,
 collections.OrderedDict() will remember the order of insertion). If
 object_hook is also defined, the object_pairs_hooktakes priority.

 That makes the rest of your argument moot.  Plus, I quite clearly am
 dealing with parsers that do.
I am sure you could also devise a json encoding scheme
where white space is significant ;)

The question is, how much of it should the json *type* support.

As discussed in the other thread, most of your requirements
would be met by having json/row/row set-to-text serializer
functions which output json-formatted text.

Then, if you actually want to save this as an easy-to-manipulate
json document, you can save this text to a field of type
json, which does de-duplication and loses order.

So my suggestion is to turn the existing json data type into
text - or maybe a json_text domain with a format check - when upgrading
to 9.4, to change the current functions which output json to
output text, and to have a new json type which stores proper
JavaScript-Object-like structured data.

I would like to go a step further and have it automatically support
not only the json data types as data but all postgresql data types,
by including the type oid in the binary encoding, but this is probably not
something for json but rather for a new pgdoc data type in 9.5

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] additional json functionality

2013-11-14 Thread David Johnston
Hannu Krosing-5 wrote
 On 11/14/2013 08:17 PM, Merlin Moncure wrote:
 On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler da...@justatheory.com wrote:
 On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote:

 This is exactly what needs to be done, full stop (how about: hstore).
 It really comes down to this: changing the serialization behaviors
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  

The current json format is a minimally conforming (i.e., does not enforce
the "should not contain duplicates" suggestion) structured json validating
type that stores its input as-is once validated.  Its presence is probably
going to cause difficulties with the function API for reasons already
mentioned, but its place in the core type library is already firmly
established.  Andrew's API additions seem like good things to have for this
type.  I haven't seen any comments on this, but do these functions
facilitate creating json that can have duplicates and that maintains order?
Even if we accept input to json with these limitations we are not obligated
to make our own json output minimally conforming - though we should at least
maintain such if it is already in place.


 So my suggestion is to upgrade existing json data type to
 text - or maybe json_text with format check - when upgrading
 to 9.4, to change current function which output json  to
 output text and have new json type which stores proper
 JavaScript Object - like structured data.

Technically a down-grade but anyway...

How does this work with a pg_dump/pg_restore upgrade?


If we want to have maximally conforming json type(s) we can still create
them.  I'd say we'd still want two versions, similar in a way to how we have
bytea and text even though any text can technically be stored like
bytea.  The constructor API for both would want to be identical, with the
only real difference being that text->json_source would be layout preserving
(i.e., validation only) while text->json_binary would be a true parsing
conversion.  Likewise json_source->text would output the same input while
json_binary->text would output the canonical form (pretty-printing and such
would need to be initiated via functions).

If things are going to be a little more complex anyway why not just go and
toss in the kitchen sink too?  This way we provide maximal flexibility. 
From a development perspective some features (indexes, equality, in-place
updates and related modification API) may only make sense on a subset of the
available types but trade-offs are a fact of life.

David J.








Re: [HACKERS] GIN improvements part2: fast scan

2013-11-14 Thread Heikki Linnakangas

On 14.11.2013 19:26, Alexander Korotkov wrote:

On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas hlinnakan...@vmware.com

wrote:



On 28.06.2013 22:31, Alexander Korotkov wrote:


Now, I got the point of three state consistent: we can keep only one
consistent in opclasses that support new interface. exact true and exact
false values will be passed in the case of current patch consistent; exact
false and unknown will be passed in the case of current patch
preConsistent. That's reasonable.



I'm going to mark this as returned with feedback. For the next version,
I'd like to see the API changed per above. Also, I'd like us to do
something about the tidbitmap overhead, as a separate patch before this, so
that we can assess the actual benefit of this patch. And a new test case
that demonstrates the I/O benefits.



Revised version of patch is attached.
Changes are so:
1) Patch rebased against packed posting lists, not depends on additional
information now.
2) New API with tri-state logic is introduced.


Thanks! A couple of thoughts after a 5-minute glance:

* documentation

* How about defining the tri-state consistent function to also return a 
tri-state? True would mean that the tuple definitely matches, false 
means the tuple definitely does not match, and Unknown means it might 
match. Or does return value true with recheck==true have the same 
effect? If I understood the patch right, returning Unknown or True 
wouldn't actually make any difference, but it's conceivable that we 
might come up with more optimizations in the future that could take 
advantage of that. For example, for a query like foo OR (bar AND baz), 
you could immediately return any tuples that match foo, and not bother 
scanning for bar and baz at all.
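
To make that concrete, such a query shape could look like this with a GIN
index over a tsvector (table and column names are invented):

CREATE TABLE docs (body text);
CREATE INDEX docs_fts ON docs USING gin (to_tsvector('english', body));

SELECT *
FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'foo | (bar & baz)');
-- any row matching "foo" could be returned without evaluating "bar & baz" at all.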


- Heikki




Re: [HACKERS] LISTEN / NOTIFY enhancement request for Postgresql

2013-11-14 Thread Dimitri Fontaine
Bruce Momjian br...@momjian.us writes:
   • "." is used to separate names in a path
   • "*" is used to match any name in a path
   • ">" is used to recursively match any destination starting from this name
 
 For example using the example above, these subscriptions are possible
 
 Subscription         Meaning
 PRICE.>              Any price for any product on any exchange
 PRICE.STOCK.>        Any price for a stock on any exchange
 PRICE.STOCK.NASDAQ.* Any stock price on NASDAQ
 PRICE.STOCK.*.IBM    Any IBM stock price on any exchange
 
 
 My request is to implement the same or similar feature in Postgresql.

 This does seem useful and pretty easy to implement.  Should we add a
 TODO?

I think we should consider the ltree syntax in that case, as documented
in the following link:

  http://www.postgresql.org/docs/9.3/interactive/ltree.html
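
A quick sketch of how those patterns map onto ltree's lquery matching
(requires the ltree extension; in lquery, "*{1}" matches exactly one label
and "*" any number of labels):

CREATE EXTENSION IF NOT EXISTS ltree;

SELECT 'PRICE.STOCK.NASDAQ.IBM'::ltree ~ 'PRICE.*'::lquery;                 -- "PRICE.>"
SELECT 'PRICE.STOCK.NASDAQ.IBM'::ltree ~ 'PRICE.STOCK.NASDAQ.*{1}'::lquery; -- "PRICE.STOCK.NASDAQ.*"
SELECT 'PRICE.STOCK.NASDAQ.IBM'::ltree ~ 'PRICE.STOCK.*{1}.IBM'::lquery;    -- "PRICE.STOCK.*.IBM"
-- all three return true for this label path.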

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Transaction-lifespan memory leak with plpgsql DO blocks

2013-11-14 Thread Tom Lane
I wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm not volunteering to spend time fixing this, but I disagree with
 the premise that it isn't worth fixing, because I think it's a POLA
 violation.

 Yeah, I'm not terribly comfortable with letting it go either.  Attached
 is a proposed patch.  I couldn't see any nice way to do it without adding
 a field to PLpgSQL_execstate, so this isn't a feasible solution for
 back-patching (it'd break the plpgsql debugger).  However, given the
 infrequency of complaints, I think fixing it in 9.4 and up is good enough.

This patch crashed and burned when I tried it on a case where a DO block
traps an exception :-(.  I had thought that a private econtext stack was
the right thing to do, but it isn't because we need plpgsql_subxact_cb
to be able to clean up dead econtexts in either functions or DO blocks.
So after some experimentation I came up with version 2, attached.
The additional test case I was using looks like

do $outer$
begin
  for i in 1..10 loop
   begin
execute $e$
  do $$
  declare x int = 0;
  begin
x := 1 / x;
  end;
  $$;
$e$;
  exception when division_by_zero then null;
  end;
  end loop;
end;
$outer$;

If you try this with the patch, you'll notice that there's still a slow
leak, but that's not the fault of DO blocks: the same leak exists if you
transpose this code into a regular function.  That leak is the fault of
exec_stmt_dynexecute, which copies the querystring into the function's
main execution context, from which it won't get freed if an error is
thrown out of SPI_execute.  (If we were using EXECUTE USING, the ppd
structure would get leaked too.)  There are some other similar error-
case leaks in other plpgsql statements, I think.  I'm not excited
about trying to clean those up as part of this patch.

regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index bc31fe9..f5f1892 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*** typedef struct
*** 66,71 
--- 66,80 
   * so that we don't have to re-prepare simple expressions on each trip through
   * a function.	(We assume the case to optimize is many repetitions of a
   * function within a transaction.)
+  *
+  * However, there's no value in trying to amortize simple expression setup
+  * across multiple executions of a DO block (inline code block), since there
+  * can never be any.  If we use the shared EState for a DO block, the expr
+  * state trees are effectively leaked till end of transaction, and that can
+  * add up if the user keeps on submitting DO blocks.  Therefore, each DO block
+  * has its own simple-expression EState, which is cleaned up at exit from
+  * plpgsql_inline_handler().  DO blocks still use the simple_econtext_stack,
+  * though, so that subxact abort cleanup does the right thing.
   */
  typedef struct SimpleEcontextStackEntry
  {
*** typedef struct SimpleEcontextStackEntry
*** 74,80 
  	struct SimpleEcontextStackEntry *next;		/* next stack entry up */
  } SimpleEcontextStackEntry;
  
! static EState *simple_eval_estate = NULL;
  static SimpleEcontextStackEntry *simple_econtext_stack = NULL;
  
  /
--- 83,89 
  	struct SimpleEcontextStackEntry *next;		/* next stack entry up */
  } SimpleEcontextStackEntry;
  
! static EState *shared_simple_eval_estate = NULL;
  static SimpleEcontextStackEntry *simple_econtext_stack = NULL;
  
  /
*** static int exec_stmt_dynfors(PLpgSQL_exe
*** 136,142 
  
  static void plpgsql_estate_setup(PLpgSQL_execstate *estate,
  	 PLpgSQL_function *func,
! 	 ReturnSetInfo *rsi);
  static void exec_eval_cleanup(PLpgSQL_execstate *estate);
  
  static void exec_prepare_plan(PLpgSQL_execstate *estate,
--- 145,152 
  
  static void plpgsql_estate_setup(PLpgSQL_execstate *estate,
  	 PLpgSQL_function *func,
! 	 ReturnSetInfo *rsi,
! 	 EState *simple_eval_estate);
  static void exec_eval_cleanup(PLpgSQL_execstate *estate);
  
  static void exec_prepare_plan(PLpgSQL_execstate *estate,
*** static char *format_preparedparamsdata(P
*** 230,239 
  /* --
   * plpgsql_exec_function	Called by the call handler for
   *function execution.
   * --
   */
  Datum
! plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo)
  {
  	PLpgSQL_execstate estate;
  	ErrorContextCallback plerrcontext;
--- 240,256 
  /* --
   * plpgsql_exec_function	Called by the call handler for
   *function execution.
+  *
+  * This is also used to execute inline code blocks (DO blocks).  The only
+  * difference that this code is aware of is that for a DO block, we want
+  * to use a private simple_eval_estate, which is created and passed in by
+  * the caller.  For regular functions, pass NULL, 

Re: [HACKERS] Assertions in PL/PgSQL

2013-11-14 Thread Pavel Stehule
rebased patch

Regards

Pavel


2013/11/14 Peter Eisentraut pete...@gmx.net

 On Wed, 2013-10-09 at 18:57 +0200, Pavel Stehule wrote:
  here is a patch for RAISE WHEN clause

 Your patch needs to be rebased.


diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index ca2c2b5..d6845d7 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -1753,9 +1753,7 @@ BEGIN
 
 -- Since execution is not finished, we can check whether rows were returned
 -- and raise exception if not.
-IF NOT FOUND THEN
-RAISE EXCEPTION 'No flight at %.', $1;
-END IF;
+RAISE EXCEPTION 'No flight at %.', $1 WHEN NOT FOUND;
 
 RETURN;
  END
@@ -3376,11 +3374,11 @@ END LOOP optional replaceablelabel/replaceable /optional;
 raise errors.
 
 synopsis
-RAISE optional replaceable class=parameterlevel/replaceable /optional 'replaceable class=parameterformat/replaceable' optional, replaceable class=parameterexpression/replaceable optional, ... /optional/optional optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional;
-RAISE optional replaceable class=parameterlevel/replaceable /optional replaceable class=parametercondition_name/ optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional;
-RAISE optional replaceable class=parameterlevel/replaceable /optional SQLSTATE 'replaceable class=parametersqlstate/' optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional;
-RAISE optional replaceable class=parameterlevel/replaceable /optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional;
-RAISE ;
+RAISE optional replaceable class=parameterlevel/replaceable /optional 'replaceable class=parameterformat/replaceable' optional, replaceable class=parameterexpression/replaceable optional, ... /optional/optional optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional  optional WHEN replaceableboolean-expression/replaceable /optional ;
+RAISE optional replaceable class=parameterlevel/replaceable /optional replaceable class=parametercondition_name/ optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional optional WHEN replaceableboolean-expression/replaceable /optional;
+RAISE optional replaceable class=parameterlevel/replaceable /optional SQLSTATE 'replaceable class=parametersqlstate/' optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional optional WHEN replaceableboolean-expression/replaceable /optional;
+RAISE optional replaceable class=parameterlevel/replaceable /optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional optional WHEN replaceableboolean-expression/replaceable /optional;
+RAISE optional WHEN replaceableboolean-expression/replaceable /optional ;
 /synopsis
 
 The replaceable class=parameterlevel/replaceable option specifies
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index bc31fe9..edb6105 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2874,6 +2874,20 @@ exec_stmt_raise(PLpgSQL_execstate *estate, PLpgSQL_stmt_raise *stmt)
 	char	   *err_schema = NULL;
 	ListCell   *lc;
 
+	/* check condition when is entered */
+	if (stmt->cond != NULL)
+	{
+		bool		value;
+		bool		isnull;
+
+		value = exec_eval_boolean(estate, stmt->cond, &isnull);
+		exec_eval_cleanup(estate);
+
+		/* ignore statement, when result of condition is false or NULL */
+		if (isnull || value == false)
+			return PLPGSQL_RC_OK;
+	}
+
 	/* RAISE with no parameters: re-throw current exception */
 	if (stmt->condname == NULL && stmt->message == NULL &&
 		stmt->options == NIL)
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index f112282..a4d7035 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -63,6 +63,7 @@ static	void			current_token_is_not_variable(int tok);
 static	PLpgSQL_expr	*read_sql_construct(int until,
 			int until2,
 			int until3,
+			int until4,
 			const char *expected,
 			const char *sqlstart,
 			bool isexpression,
@@ -105,7 +106,7 @@ static	void			 check_labels(const char *start_label,
 	  int end_location);
 static	PLpgSQL_expr	*read_cursor_args(PLpgSQL_var *cursor,
 		  int until, const char *expected);
-static	List			*read_raise_options(void);
+static	List			*read_raise_options(int *tok);
 
 %}
 
@@ -1386,6 +1387,7 @@ for_control		: 

Re: [HACKERS] Extra functionality to createuser

2013-11-14 Thread Christopher Browne
On Thu, Nov 14, 2013 at 5:41 AM, Sameer Thakur samthaku...@gmail.com wrote:
 So i think -g option is failing

Right you are.

I was missing a "g:" in the getopt_long() call.

Attached is a revised patch that handles that.

And it behaves better:
postgres@cbbrowne ~/p/s/b/scripts ./createuser -g purge_role -U
postgres newuser4
postgres@cbbrowne ~/p/s/b/scripts pg_dumpall -g | grep newuser4
CREATE ROLE newuser4;
ALTER ROLE newuser4 WITH NOSUPERUSER INHERIT NOCREATEROLE NOCREATEDB
LOGIN NOREPLICATION;
GRANT purge_role TO newuser4 GRANTED BY postgres;

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 2f1ea2f..5fedc80 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -131,6 +131,16 @@ PostgreSQL documentation
 </varlistentry>

 <varlistentry>
+  <term><option>-g</></term>
+  <term><option>--roles</></term>
+  <listitem>
+   <para>
+    Indicates roles to associate with this role.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>-i</></term>
   <term><option>--inherit</></term>
   <listitem>
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index d1542d9..c469b52 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -47,6 +47,7 @@ main(int argc, char *argv[])
{"pwprompt", no_argument, NULL, 'P'},
{"encrypted", no_argument, NULL, 'E'},
{"unencrypted", no_argument, NULL, 'N'},
+   {"roles", required_argument, NULL, 'g'},
{NULL, 0, NULL, 0}
};
 
@@ -57,6 +58,7 @@ main(int argc, char *argv[])
char   *host = NULL;
char   *port = NULL;
char   *username = NULL;
+   char   *roles = NULL;
enum trivalue prompt_password = TRI_DEFAULT;
boolecho = false;
boolinteractive = false;
@@ -83,7 +85,7 @@ main(int argc, char *argv[])
 
handle_help_version_opts(argc, argv, "createuser", help);

-   while ((c = getopt_long(argc, argv, "h:p:U:wWedDsSaArRiIlLc:PEN",
+   while ((c = getopt_long(argc, argv, "h:p:U:g:wWedDsSaArRiIlLc:PEN",
long_options, &optindex)) != -1)
{
{
switch (c)
@@ -112,6 +114,9 @@ main(int argc, char *argv[])
case 'D':
createdb = TRI_NO;
break;
+   case 'g':
+   roles = pg_strdup(optarg);
+   break;
case 's':
case 'a':
superuser = TRI_YES;
@@ -302,6 +307,8 @@ main(int argc, char *argv[])
appendPQExpBuffer(sql, " NOREPLICATION");
if (conn_limit != NULL)
appendPQExpBuffer(sql, " CONNECTION LIMIT %s", conn_limit);
+   if (roles != NULL)
+   appendPQExpBuffer(sql, " IN ROLE %s", roles);
appendPQExpBuffer(sql, ";\n");
 
if (echo)
@@ -334,6 +341,7 @@ help(const char *progname)
printf(_("  -D, --no-createdb         role cannot create databases (default)\n"));
printf(_("  -e, --echo                show the commands being sent to the server\n"));
printf(_("  -E, --encrypted           encrypt stored password\n"));
+   printf(_("  -g, --roles               roles to associate with this new role\n"));
printf(_("  -i, --inherit             role inherits privileges of roles it is a\n"
         "                            member of (default)\n"));
printf(_("  -I, --no-inherit          role does not inherit privileges\n"));

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] union all query consumes all memory

2013-11-14 Thread Pavel Stehule
Hello,

One of my customers reported an out-of-memory issue. After investigation he
found that the main problem is a large query that uses a lot of UNION ALL
branches. He wrote a self-contained test:

do $$

declare i integer;  str text='';

begin

for i in 1..1000 loop

str := str || 'union all select i,i,i from generate_series(1,5) g(i) ';

end loop;

execute 'select 1,2,3 ' || str;

end;
$$

Is this the expected behaviour?

Tested on PostgreSQL 9.1, 9.2, 9.3

It looks like all the generated data is kept in memory.

Regards

Pavel Stehule


Re: [HACKERS] union all query consumes all memory

2013-11-14 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 One of my customers reported an out-of-memory issue. After investigation he
 found that the main problem is a large query that uses a lot of UNION ALL
 branches. He wrote a self-contained test:

 do $$

 declare i integer;  str text='';

 begin

 for i in 1..1000 loop

 str := str || 'union all select i,i,i from generate_series(1,5) g(i) ';

 end loop;

 execute 'select 1,2,3 ' || str;

 end;
 $$

 Is this the expected behaviour?

Well, that query is entitled to use 1000 * work_mem for the
generate_series output tuplestores.  What's your work_mem setting?
Perhaps more to the point, what was your customer doing?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] strncpy is not a safe version of strcpy

2013-11-14 Thread David Rowley
Hi All,

As a bit of a background task, over the past few days I've been analysing
the uses of strncpy in the code just to try and validate if it is the right
function to be using. I've already seen quite a few places where their
usage is wrongly assumed.

As many of you will know (and maybe some of you have forgotten), strncpy
is not a safe version of strcpy. It is also quite an inefficient way to
copy a string to another buffer, as strncpy will zero out any space that
happens to remain in the buffer. If there is no space left after the copy
then the buffer won't end with a 0.

It is likely far better explained here --
http://www.courtesan.com/todd/papers/strlcpy.html

For example , the following 2 lines in jsonfuncs.c

memset(name, 0, NAMEDATALEN);
strncpy(name, fname, NAMEDATALEN);

The memset here is redundant, as strncpy will zero the remaining buffer itself.
This example is not dangerous, but it does highlight that code which makes
this wrong assumption about strncpy has made it into the tree.
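
To make the failure mode concrete, here is a minimal standalone sketch
(illustration only, not code from the tree) of the truncation behaviour and
the usual ways to cope with it:

#include <stdio.h>
#include <string.h>

int
main(void)
{
	char		buf[8];
	const char *src = "0123456789";	/* longer than buf */

	/* strncpy truncates but does NOT add a terminating NUL */
	strncpy(buf, src, sizeof(buf));
	/* buf now holds "01234567" with no '\0'; printing it would overrun */

	/* the step that call sites tend to forget: */
	buf[sizeof(buf) - 1] = '\0';
	printf("%s\n", buf);		/* prints "0123456" */

	/*
	 * An strlcpy-style call always terminates and reports truncation;
	 * snprintf is used here only so the sketch builds everywhere.
	 */
	if (snprintf(buf, sizeof(buf), "%s", src) >= (int) sizeof(buf))
		printf("truncated\n");
	printf("%s\n", buf);		/* also prints "0123456" */

	return 0;
}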

I was not going to bring this to light until I had done some more analysis,
but there was just a commit which added a usage of strncpy that really
looks like it should be a strlcpy.

I'll continue with my analysis, but perhaps posting this early will bring
something to light which I've not yet realised.

Regards

David Rowley


[HACKERS] Anybody using get_eclass_for_sort_expr in an extension?

2013-11-14 Thread Tom Lane
I looked into bug #8591:
http://www.postgresql.org/message-id/e1vgk41-00050x...@wrigleys.postgresql.org
and was able to reproduce the problem.  The proximate cause is that
get_eclass_for_sort_expr is wrong to suppose that it can always create new
equivalence class entries with empty em_nullable_relids; in the case
where the call is coming from initialize_mergeclause_eclasses, it's
essential to use the info supplied in restrictinfo->nullable_relids.
Otherwise, quals deduced from the equivalence class may have wrong
nullable_relids settings, allowing them to be pushed to unsafe places
in 9.2 and later.

This is relatively easy to fix as in the attached patch, but I'm wondering
how risky this is to back-patch.  We can deal with the field added to
PlannerInfo by sticking it at the end of the struct in the back branches;
we've done that before and not heard squawks.  However, the signature
change for get_eclass_for_sort_expr() would be problematic if any
third-party code is calling that function directly.  I'm inclined to think
there probably isn't any such code, since none of the existing calls are
in index-specific code or other places that people would have been likely
to copy and modify.  Still, I can't claim that the risk is zero.

We could hack around that in the back branches in more or less ugly ways,
such as by making get_eclass_for_sort_expr() be a wrapper around a new
function.  However, it's far from clear what such a wrapper ought to do
--- it could pass NULL for nullable_relids, which would match the existing
behavior, but the odds seem good that the existing behavior might be wrong
for whatever an extension is trying to do.  And having the code text
diverge between HEAD and back branches isn't so appetizing anyway.
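
For concreteness, the back-branch shim being discussed would look roughly
like this (the "_extended" function name is hypothetical, just to make the
trade-off visible):

EquivalenceClass *
get_eclass_for_sort_expr(PlannerInfo *root,
						 Expr *expr,
						 List *opfamilies,
						 Oid opcintype,
						 Oid collation,
						 Index sortref,
						 Relids rel,
						 bool create_it)
{
	/*
	 * Keep the old symbol and signature for third-party callers, forwarding
	 * to the new function.  Passing NULL for nullable_relids preserves the
	 * existing behaviour, which is exactly the doubtful part noted above.
	 */
	return get_eclass_for_sort_expr_extended(root, expr,
											 NULL,	/* nullable_relids */
											 opfamilies, opcintype, collation,
											 sortref, rel, create_it);
}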

I'm a bit inclined to take the risk of breaking anything that's calling
get_eclass_for_sort_expr() directly.  Thoughts?

regards, tom lane

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 817b149..b39927e 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*** _outPlannerInfo(StringInfo str, const Pl
*** 1698,1703 
--- 1698,1704 
  	WRITE_UINT_FIELD(query_level);
  	WRITE_NODE_FIELD(plan_params);
  	WRITE_BITMAPSET_FIELD(all_baserels);
+ 	WRITE_BITMAPSET_FIELD(nullable_baserels);
  	WRITE_NODE_FIELD(join_rel_list);
  	WRITE_INT_FIELD(join_cur_level);
  	WRITE_NODE_FIELD(init_plans);
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 711b161..baddd34 100644
*** a/src/backend/optimizer/path/equivclass.c
--- b/src/backend/optimizer/path/equivclass.c
*** add_eq_member(EquivalenceClass *ec, Expr
*** 510,515 
--- 510,522 
   *	  equivalence class it is a member of; if none, optionally build a new
   *	  single-member EquivalenceClass for it.
   *
+  * expr is the expression, and nullable_relids is the set of base relids
+  * that are potentially nullable below it.	We actually only care about
+  * the set of such relids that are used in the expression; but for caller
+  * convenience, we perform that intersection step here.  The caller need
+  * only be sure that nullable_relids doesn't omit any nullable rels that
+  * might appear in the expr.
+  *
   * sortref is the SortGroupRef of the originating SortGroupClause, if any,
   * or zero if not.	(It should never be zero if the expression is volatile!)
   *
*** add_eq_member(EquivalenceClass *ec, Expr
*** 538,543 
--- 545,551 
  EquivalenceClass *
  get_eclass_for_sort_expr(PlannerInfo *root,
  		 Expr *expr,
+ 		 Relids nullable_relids,
  		 List *opfamilies,
  		 Oid opcintype,
  		 Oid collation,
*** get_eclass_for_sort_expr(PlannerInfo *ro
*** 545,550 
--- 553,559 
  		 Relids rel,
  		 bool create_it)
  {
+ 	Relids		expr_relids;
  	EquivalenceClass *newec;
  	EquivalenceMember *newem;
  	ListCell   *lc1;
*** get_eclass_for_sort_expr(PlannerInfo *ro
*** 556,561 
--- 565,576 
  	expr = canonicalize_ec_expression(expr, opcintype, collation);
  
  	/*
+ 	 * Get the precise set of nullable relids appearing in the expression.
+ 	 */
+ 	expr_relids = pull_varnos((Node *) expr);
+ 	nullable_relids = bms_intersect(nullable_relids, expr_relids);
+ 
+ 	/*
  	 * Scan through the existing EquivalenceClasses for a match
  	 */
  	foreach(lc1, root->eq_classes)
*** get_eclass_for_sort_expr(PlannerInfo *ro
*** 629,636 
  	if (newec->ec_has_volatile && sortref == 0) /* should not happen */
  		elog(ERROR, "volatile EquivalenceClass has no sortref");
  
! 	newem = add_eq_member(newec, copyObject(expr), pull_varnos((Node *) expr),
! 		  NULL, false, opcintype);
  
  	/*
  	 * add_eq_member doesn't check for volatile functions, set-returning
--- 644,651 
  	if (newec->ec_has_volatile && sortref == 0) /* should not happen */
  		elog(ERROR, "volatile EquivalenceClass has no sortref");
  
! 	

[HACKERS] SSL: better default ciphersuite

2013-11-14 Thread Marko Kreen
Attached patch changes the default ciphersuite to

HIGH:!aNULL

instead of old

DEFAULT:!LOW:!EXP:!MD5:@STRENGTH

where DEFAULT is a shortcut for ALL:!aNULL:!eNULL.


Main goal is to leave low-level ciphersuite details to OpenSSL guys
and give clear impression to Postgres admins what it is about.

Compared to old value, new value will remove all suites with RC4 and SEED
from ciphersuite list.  If OpenSSL is compiled with support for SSL2,
it will include following suite: DES-CBC3-MD5, usable only for SSL2
connections.

Tested with OpenSSL 0.9.7 - 1.0.1, using openssl ciphers -v ... command.


Values used
---

HIGH:
  Contains only secure and well-researched algorithms.

!aNULL
  Needed to disable suites that do not authenticate server.
  DEFAULT includes !aNULL by default.


Values not used
---

!MD5
  This affects only one suite: DES-CBC3-MD5, which is available only
  for SSL2 connections.  So it would only pollute the default value.

@STRENGTH
  The OpenSSL cipher list is already sorted by humans,
  it's unlikely that mechanical sort would improve things.
  Also the existence of this value in old list is rather
  dubious, as server cipher order was never respected anyway.

-- 
marko

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ffc69c7..d4e6c52 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3144,7 +3144,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&SSLCipherSuites,
 #ifdef USE_SSL
-		"DEFAULT:!LOW:!EXP:!MD5:@STRENGTH",
+		"HIGH:!aNULL",
 #else
 		"none",
 #endif
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 34a2d05..e6b7f9a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -79,7 +79,7 @@
 
 #authentication_timeout = 1min		# 1s-600s
 #ssl = off# (change requires restart)
-#ssl_ciphers = 'DEFAULT:!LOW:!EXP:!MD5:@STRENGTH'	# allowed SSL ciphers
+#ssl_ciphers = 'HIGH:!aNULL'		# allowed SSL ciphers
 	# (change requires restart)
 #ssl_renegotiation_limit = 512MB	# amount of data between renegotiations
 #ssl_cert_file = 'server.crt'		# (change requires restart)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Anybody using get_eclass_for_sort_expr in an extension?

2013-11-14 Thread Peter Geoghegan
On Thu, Nov 14, 2013 at 3:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm a bit inclined to take the risk of breaking anything that's calling
 get_eclass_for_sort_expr() directly.  Thoughts?

It's worth being aware of the fact that Peter E's Jenkins instance
seems to track regressions for some popular third party extensions:

http://pgci.eisentraut.org/jenkins/view/Extensions/

Looking at what he is currently testing, these extensions seem
unlikely to fall afoul of your changes. Just something to be aware of
going forward.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] init_sequence spill to hash table

2013-11-14 Thread David Rowley
On Fri, Nov 15, 2013 at 3:03 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 14.11.2013 14:38, David Rowley wrote:

 I've just completed some more benchmarking of this. I didn't try dropping
 the threshold down to 2 or 0 but I did tests at the cut over point and
 really don't see much difference in performance between the list at 32 and
 the hashtable at 33 sequences. The hash table version excels in the 16000
 sequence test in comparison to the unpatched version.

 Times are in milliseconds of the time it took to call currval() 10
 times for 1 sequence.
                    Patched    Unpatched   increased by
    1 in cache      1856.452   1844.11     -1%
    32 in cache     1841.84    1802.433    -2%
    33 in cache     1861.558   not tested  N/A
    16000 in cache  1963.711   10329.22    426%


 If I understand those results correctly, the best case scenario with the
 current code takes about 1800 ms. There's practically no difference with N
 = 32, where N is the number of sequences touched. The hash table method
 also takes about 1800 ms when N=33. The performance of the hash table is
 O(1), so presumably we can extrapolate from that that it's the same for any
 N.

 I think that means that we should just completely replace the list with
 the hash table. The difference with a small N is lost in noise, so there's
 no point in keeping the list as a fast path for small N. That'll make the
 patch somewhat simpler.
 - Heikki


I had thought that the most common type of workload might only touch 1
or 2 sequences, and that, even if it is small, there would be some overhead
in both cycles and memory usage in creating a hash table for backends that
use sequences so lightly. Removing the list would certainly make the patch
simpler, and it would also mean that I could remove the sometimes unused
->next member from the SeqTableData struct, which is currently just set to
NULL when in hash table mode. If you think it's the way to go then I can
make the change, though maybe I'll hold off on the refactor for now,
as it looks like other ideas have come up around the relcache.
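
For reference, the "just use a hash table" variant would look roughly like
the sketch below, using dynahash keyed by the sequence's OID (struct and
variable names are hypothetical, not from the patch):

#include "postgres.h"
#include "utils/hsearch.h"

typedef struct SeqTableData
{
	Oid			relid;			/* hash key: OID of the sequence */
	int64		cached;			/* cached last value, etc. */
	/* ... other per-sequence state ... */
} SeqTableData;

static HTAB *seqhashtab = NULL;

static SeqTableData *
get_seq_entry(Oid relid)
{
	bool		found;

	if (seqhashtab == NULL)
	{
		HASHCTL		ctl;

		MemSet(&ctl, 0, sizeof(ctl));
		ctl.keysize = sizeof(Oid);
		ctl.entrysize = sizeof(SeqTableData);
		ctl.hash = oid_hash;
		seqhashtab = hash_create("Sequence values", 16, &ctl,
								 HASH_ELEM | HASH_FUNCTION);
	}

	/* creates the entry on first use; "found" says whether it already existed */
	return (SeqTableData *) hash_search(seqhashtab, &relid,
										HASH_ENTER, &found);
}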

Regards

David Rowley


Re: [HACKERS] strncpy is not a safe version of strcpy

2013-11-14 Thread Tomas Vondra
On 15 Listopad 2013, 0:07, David Rowley wrote:
 Hi All,

 As a bit of a background task, over the past few days I've been analysing
 the uses of strncpy in the code just to try and validate if it is the
 right
 function to be using. I've already seen quite a few places where their
 usage is wrongly assumed.

 As many of you will know and maybe some of you have forgotten that strncpy
 is not a safe version of strcpy. It is also quite an inefficient way to
 copy a string to another buffer as strncpy will 0 out any space that
 happens to remain in the buffer. If there is no space left after the copy
 then the buffer won't end with a 0.

 It is likely far better explained here --
 http://www.courtesan.com/todd/papers/strlcpy.html

 For example , the following 2 lines in jsonfuncs.c

 memset(name, 0, NAMEDATALEN);
 strncpy(name, fname, NAMEDATALEN);

Be careful with 'Name' data type - it's not just a simple string buffer.
AFAIK it needs to work with hashing etc. so the zeroing is actually needed
here to make sure two values produce the same result. At least that's how
I understand the code after a quick check - for example this is from the
same jsonfuncs.c you mentioned:

memset(fname, 0, NAMEDATALEN);
strncpy(fname, NameStr(tupdesc->attrs[i]->attname), NAMEDATALEN);
hashentry = hash_search(json_hash, fname, HASH_FIND, NULL);

So the zeroing is on purpose, although if strncpy does that then the
memset is probably superflous. Either people do that because of habit /
copy'n'paste, or maybe there are supported platforms when strncpy does not
behave like this for some reason.
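
To illustrate why the full-width zero padding matters (a tiny standalone
sketch, not PostgreSQL code): the hash key here is a fixed-length
NAMEDATALEN buffer, and two buffers holding the same string only match over
the whole key width if both are padded the same way.

#include <stdio.h>
#include <string.h>

#define NAMEDATALEN 64			/* same constant PostgreSQL uses */

int
main(void)
{
	char		a[NAMEDATALEN];
	char		b[NAMEDATALEN];

	/* a is fully zero-padded, the way the jsonfuncs.c code builds its key */
	memset(a, 0, sizeof(a));
	strncpy(a, "field", sizeof(a));

	/* b holds the same string but with garbage after the terminator */
	memset(b, 0x7f, sizeof(b));
	strcpy(b, "field");

	/* equal as C strings, unequal when compared as fixed-length keys */
	printf("strcmp: %d, full-key mismatch: %d\n",
		   strcmp(a, b), memcmp(a, b, sizeof(a)) != 0);
	return 0;
}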

I seriously doubt this inefficiency is going to be measurable in real
world. If the result was a buffer-overflow bug, that'd be a different
story, but maybe we could check the ~120 calls to strncpy in the whole
code base and replace it with strlcpy where appropriate.

That being said, thanks for looking into things like this.

Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strncpy is not a safe version of strcpy

2013-11-14 Thread David Rowley
On Fri, Nov 15, 2013 at 12:33 PM, Tomas Vondra t...@fuzzy.cz wrote:

  It is likely far better explained here --
  http://www.courtesan.com/todd/papers/strlcpy.html
 
  For example , the following 2 lines in jsonfuncs.c
 
  memset(name, 0, NAMEDATALEN);
  strncpy(name, fname, NAMEDATALEN);

 Be careful with 'Name' data type - it's not just a simple string buffer.
 AFAIK it needs to work with hashing etc. so the zeroing is actually needed
 here to make sure two values produce the same result. At least that's how
 I understand the code after a quick check - for example this is from the
 same jsonfuncs.c you mentioned:

 memset(fname, 0, NAMEDATALEN);
 strncpy(fname, NameStr(tupdesc->attrs[i]->attname), NAMEDATALEN);
 hashentry = hash_search(json_hash, fname, HASH_FIND, NULL);

 So the zeroing is on purpose, although if strncpy does that then the
 memset is probably superflous. Either people do that because of habit /
 copy'n'paste, or maybe there are supported platforms when strncpy does not
 behave like this for some reason.


I had not thought of the fact that some platforms might not properly implement
strncpy(). On a quick check, http://man.he.net/man3/strncpy seems to indicate
that this behaviour is part of the C89 standard. So does this mean we can
always assume that all supported platforms always 0 out the remaining
buffer?



 I seriously doubt this inefficiency is going to be measurable in real
 world. If the result was a buffer-overflow bug, that'd be a different
 story, but maybe we could check the ~120 calls to strncpy in the whole
 code base and replace it with strlcpy where appropriate.


The example was more a demonstration of a wrong assumption than of
wasted cycles, though the wasted cycles were on my mind a bit too. I was
more focused on trying to draw a bit of attention to commit
061b88c732952c59741374806e1e41c1ec845d50, which uses strncpy and does not
properly set the last byte to 0 afterwards. I think this case could just be
replaced with strlcpy, which does all this hard work for us.

Regards

David Rowley



 That being said, thanks for looking into things like this.

 Tomas




[HACKERS] Review: Patch insert throw error when year field len 4 for timestamptz datatype

2013-11-14 Thread Adrian Klaver
Initial review of the patch submitted in this message and listed in the 
current CommitFest:


http://www.postgresql.org/message-id/cagpqqf3xwwc_4fhinz_g6ecvps_ov3k2pe4-aj1dg4iyy+f...@mail.gmail.com

This patch would seem to be already committed here

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7778ddc7a2d5b006edbfa69cdb44b8d8c24ec1ff

Is a review necessary at this point?

--
Adrian Klaver
adrian.kla...@gmail.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch insert throw error when year field len 4 for timestamptz datatype

2013-11-14 Thread Peter Geoghegan
On Thu, Nov 14, 2013 at 4:53 PM, Adrian Klaver adrian.kla...@gmail.com wrote:
 Is a review necessary at this point?

No. Just mark it committed.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] init_sequence spill to hash table

2013-11-14 Thread David Rowley
On Fri, Nov 15, 2013 at 3:12 AM, Andres Freund and...@2ndquadrant.comwrote:

 Hi,

 On 2013-11-13 22:55:43 +1300, David Rowley wrote:
  Here 
 http://www.postgresql.org/message-id/24278.1352922...@sss.pgh.pa.us there
  was some talk about init_sequence being a bottleneck when many sequences
  are used in a single backend.
 
  The attached I think implements what was talked about in the above link
  which for me seems to double the speed of a currval() loop over 3
  sequences. It goes from about 7 seconds to 3.5 on my laptop.

 I think it'd be a better idea to integrate the sequence caching logic
 into the relcache. There's a comment about it:
  * (We can't
  * rely on the relcache, since it's only, well, a cache, and may decide to
  * discard entries.)
 but that's not really accurate anymore. We have the infrastructure for
 keeping values across resets and we don't discard entries.


I just want to check this idea against an existing todo item to move
sequences into a single table, as by the sounds of it this binds
sequences even more closely to being relations.

The todo item reads:

Consider placing all sequences in a single table, or create a system view

This had been in the back of my mind while implementing the hash table
stuff for init_sequence, and again when doing my benchmarks, where I created
3 sequences and went through the pain of having a path on my file
system with 3 8k files.
It sounds like your idea overlaps with this todo item a little, so maybe this
is a good time to decide which would be best, though the more I think about
it, the more I think that moving sequences into a single table is a no-go.

So for implementing moving sequences into a single system table:

1. The search_path stuff makes this a bit more complex. It sounds like this
would require some duplication of the search_path logic.

2. There is also the problem with tracking object dependency.

Currently:
create sequence t_a_seq;
create table t (a int not null default nextval('t_a_seq'));
alter sequence t_a_seq owned by t.a;
drop table t;
drop sequence t_a_seq; -- already deleted by drop table t
ERROR:  sequence t_a_seq does not exist

Moving sequences to a single table sounds like a special case for this
logic.


3. Would moving sequences to a table still have to check that no duplicate
object existed in the pg_class?
Currently you can't have a sequence with the same name as a table

create sequence a;
create table a (a int);
ERROR:  relation a already exists


It's not that I'm trying to shoot holes in moving sequences to a single
table; really I'd like to find a way to reduce the wastefulness of these
one-file-per-sequence relations lying around my file system. But if changing
this is a no-go then it would be better to take it off the todo list, and
then we'd have fewer reservations about pouring more concrete into the
sequences-being-relations mould.

Regards

David Rowley


Re: [HACKERS] Review: Patch insert throw error when year field len 4 for timestamptz datatype

2013-11-14 Thread Bruce Momjian
On Thu, Nov 14, 2013 at 04:58:55PM -0800, Peter Geoghegan wrote:
 On Thu, Nov 14, 2013 at 4:53 PM, Adrian Klaver adrian.kla...@gmail.com 
 wrote:
  Is a review necessary at this point?
 
 No. Just mark it committed.

Oh, sorry, I didn't realize it was in the commit-fest app.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimize kernel readahead using buffer access strategy

2013-11-14 Thread KONDO Mitsumasa

Hi Claudio,

(2013/11/14 22:53), Claudio Freire wrote:

On Thu, Nov 14, 2013 at 9:09 AM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:

I create a patch that is improvement of disk-read and OS file caches. It can
optimize kernel readahead parameter using buffer access strategy and
posix_fadvice() in various disk-read situations.

In general OS, readahead parameter was dynamically decided by disk-read
situations. If long time disk-read was happened, readahead parameter becomes 
big.
However it is based on experienced or heuristic algorithm, it causes waste
disk-read and throws out useful OS file caches in some case. It is bad for
disk-read performance a lot.


It would be relevant to know which kernel did you use for those tests.

I used CentOS 6.4, whose kernel version is 2.6.32-358.23.2.el6.x86_64, in this
test.



A while back, I tried to use posix_fadvise to prefetch index pages.
I searched your past work. Are you talking about this ML thread, or is there a more
recent discussion? I find your patch interesting, but it wasn't submitted to a CF
and the discussion stopped.

http://www.postgresql.org/message-id/CAGTBQpZzf70n0PYJ=VQLd+jb3wJGo=2txmy+skjd6g_vjc5...@mail.gmail.com


I ended up finding out that interleaving posix_fadvise with I/O like
that severly hinders (ie: completely disables) the kernel's read-ahead
algorithm.
Your patch switches to maximum readahead when a SQL statement is executed as an index
range scan, is that right? I think your patch assumes that heap pages are ordered the
same way as the index data. That assumption is only partially true; if it always held,
we would not need the CLUSTER command. In practice, CLUSTER does give better
performance than doing nothing.



How exactly did you set up those benchmarks? pg_bench defaults?

My detailed test settings are as follows:
* Server info
  CPU: Intel(R) Xeon(R) CPU E5645  @ 2.40GHz (2U/12C)
  RAM: 6GB
- I reduced it intentionally via an OS parameter, because tests with large
   memory take a long time.
  HDD: SEAGATE  Model: ST2000NM0001 @ 7200rpm * 1
  RAID: none.

* postgresql.conf(summarized)
  shared_buffers = 600MB (10% of RAM = 6GB)
  work_mem = 1MB
  maintenance_work_mem = 64MB
  wal_level = archive
  fsync = on
  archive_mode = on
  checkpoint_segments = 300
  checkpoint_timeout = 15min
  checkpoint_completion_target = 0.7

* pgbench settings
pgbench -j 4 -c 32 -T 600 pgbench



pg_bench does not exercise heavy sequential access patterns, or long
index scans. It performs many single-page index lookups per
transaction and that's it.
Yes, your argument is right. And it is also a fact that performance becomes 
better in these situations.



You may want to try your patch with more
real workloads, and maybe you'll confirm what I found out last time I
messed with posix_fadvise. If my experience is still relevant, those
patterns will have suffered a severe performance penalty with this
patch, because it will disable kernel read-ahead on sequential index
access. It may still work for sequential heap scans, because the
access strategy will tell the kernel to do read-ahead, but many other
access methods will suffer.
The decisive difference from your patch is that mine uses the buffer hint control
architecture, so it can control readahead more intelligently in some cases.
However, my patch is still a work in progress and needs more improvement. I am going
to add a method of controlling readahead by GUC, so that users can freely select the
readahead parameter in their transactions.



Try OLAP-style queries.
I have the DBT-3 (TPC-H) benchmark tools. If you don't like TPC-H, could you suggest
some good OLAP benchmark tools?


Regards,
--
Mitsumasa KONDO
NTT Open Source Software



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimize kernel readahead using buffer access strategy

2013-11-14 Thread KONDO Mitsumasa

(2013/11/15 2:03), Fujii Masao wrote:

On Thu, Nov 14, 2013 at 9:09 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:

Hi,

I create a patch that is improvement of disk-read and OS file caches. It can
optimize kernel readahead parameter using buffer access strategy and
posix_fadvice() in various disk-read situations.


When I compiled the HEAD code with this patch on MacOS, I got the following
error and warnings.

gcc -O0 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -I../../../../src/include   -c -o fd.o fd.c
fd.c: In function 'BufferHintIOAdvise':
fd.c:1182: error: 'POSIX_FADV_SEQUENTIAL' undeclared (first use in
this function)
fd.c:1182: error: (Each undeclared identifier is reported only once
fd.c:1182: error: for each function it appears in.)
fd.c:1185: error: 'POSIX_FADV_RANDOM' undeclared (first use in this function)
make[4]: *** [fd.o] Error 1
make[3]: *** [file-recursive] Error 2
make[2]: *** [storage-recursive] Error 2
make[1]: *** [install-backend-recurse] Error 2
make: *** [install-src-recurse] Error 2

tablecmds.c:9120: warning: passing argument 5 of 'smgrread' makes
pointer from integer without a cast
bufmgr.c:455: warning: passing argument 5 of 'smgrread' from
incompatible pointer type

Thank you for your report!
I will fix it. Could you tell me your Mac OS version and gcc version? I only have a
MacBook Air with Mavericks (10.9).


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimize kernel readahead using buffer access strategy

2013-11-14 Thread Peter Geoghegan
On Thu, Nov 14, 2013 at 6:18 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
 I will fix it. Could you tell me your Mac OS version and gcc version? I have
 only mac book air with Maverick OS(10.9).

I have an idea that Mac OSX doesn't have posix_fadvise at all. Didn't
you use the relevant macros so that the code at least builds on those
platforms?
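
The guard being suggested would look something like this (the wrapper name
is mine, purely illustrative): only issue the advice when the macros exist,
and compile to a no-op elsewhere, e.g. on OS X.

#include <sys/types.h>
#include <fcntl.h>
#include <stdbool.h>

static int
fadvise_access_pattern(int fd, off_t offset, off_t len, bool sequential)
{
#if defined(POSIX_FADV_SEQUENTIAL) && defined(POSIX_FADV_RANDOM)
	return posix_fadvise(fd, offset, len,
						 sequential ? POSIX_FADV_SEQUENTIAL
									: POSIX_FADV_RANDOM);
#else
	(void) fd;
	(void) offset;
	(void) len;
	(void) sequential;
	return 0;					/* the advice is only a hint anyway */
#endif
}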


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-11-14 Thread KONDO Mitsumasa

(2013/11/14 7:11), Peter Geoghegan wrote:

On Wed, Oct 23, 2013 at 8:52 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

Hmm, now if we had portable atomic addition, so that we could spare the
spinlock ...

And adding a histogram or
min/max for something like execution time isn't an approach that can
be made to work for every existing cost tracked by pg_stat_statements.
So, taking all that into consideration, I'm afraid this patch gets a
-1 from me.
Just to confirm: by this patch, do you mean my patch? I agree
with you about not adding another lock implementation; it would just add overhead.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-11-14 Thread Peter Geoghegan
On Thu, Nov 14, 2013 at 9:09 AM, Fujii Masao masao.fu...@gmail.com wrote:
 I think that pg_stat_statements should be made to do this kind of
 thing by a third party tool that aggregates snapshots of deltas.
 Time-series data, including (approximate) *local* minima and maxima
 should be built from that. I think tools like KONDO-san's pg_statsinfo
 tool have an important role to play here. I would like to see it or a
 similar tool become a kind of defacto standard for consuming
 pg_stat_statements' output.

 Agreed.

I would like to hear others' thoughts on how to proceed here. Has
anyone actually tried and failed with an approach that uses aggressive
aggregation of snapshots/deltas? If that's something that is
problematic, let's hear why. Maybe pg_stat_statements could do better
when snapshots are aggressively taken by tools at fixed intervals.
Most obviously, we could probably come up with a way better interface
for tools that need deltas only, where a consumer registers interest
in receiving snapshots. We could store a little bitmap structure in
shared memory, and set bits as corresponding pg_stat_statements
entries are dirtied, and only send query texts the first time.
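
Purely as an illustration of that idea (none of these names exist in
pg_stat_statements today), the bitmap itself is trivial: one bit per entry
slot, set when the entry's counters change, cleared once a registered
consumer has pulled the delta.

#include <stdint.h>

#define PGSS_MAX_ENTRIES 5000

static uint8_t pgss_dirty[(PGSS_MAX_ENTRIES + 7) / 8];

static inline void
pgss_mark_dirty(int slot)
{
	pgss_dirty[slot / 8] |= (uint8_t) (1 << (slot % 8));
}

static inline int
pgss_test_and_clear_dirty(int slot)
{
	uint8_t		mask = (uint8_t) (1 << (slot % 8));
	int			was_dirty = (pgss_dirty[slot / 8] & mask) != 0;

	pgss_dirty[slot / 8] &= (uint8_t) ~mask;
	return was_dirty;
}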

I am still very much of the opinion that we need to expose queryid and
so on. I lost that argument several times, but it seems like there is
strong demand from it from many places - I've had people that I don't
know talk to me about it at conferences.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimize kernel readahead using buffer access strategy

2013-11-14 Thread KONDO Mitsumasa

(2013/11/15 11:17), Peter Geoghegan wrote:

On Thu, Nov 14, 2013 at 6:18 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:

I will fix it. Could you tell me your Mac OS version and gcc version? I have
only mac book air with Maverick OS(10.9).


I have an idea that Mac OSX doesn't have posix_fadvise at all. Didn't
you use the relevant macros so that the code at least builds on those
platforms?

Thank you for your helpful advice, too.
I will try to fix it with the appropriate macros.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-11-14 Thread Peter Geoghegan
On Thu, Nov 14, 2013 at 6:28 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
 It is confirmation just to make sure, does this patch mean my patch? I
 agree with you about not adding another lock implementation. It will becomes
 overhead.

Yes, I referred to your patch. I don't want to go down this road,
because aggregation and constructing a timeline feels like the job of
another tool. I am concerned about local minima and maxima. Even with
the ability to reset min/max independently, you can't do so for each
entry individually. And this approach won't scale to a histogram or
more sophisticated ways of characterizing distribution, particularly
not multiplicatively for things other than execution time (blocks hit
and so on) - that spinlock needs to be held for very little time
indeed to preserve pg_stat_statements current low overhead.

As I said above, lets figure out how to have your tool or a similar
tool acquire snapshots inexpensively and frequently instead. That is a
discussion I'd be happy to have. IMO pg_stat_statements ought to be as
cheap as possible, and do one thing well - aggregate fixed-unit costs
cumulatively.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] information schema parameter_default implementation

2013-11-14 Thread Peter Eisentraut
Updated patch attached.

On Sat, 2013-11-09 at 12:09 +0530, Amit Khandekar wrote:
   2) I found the following check a bit confusing, maybe you can make
 it
   better
   if (!argmodes || argmodes[i] == PROARGMODE_IN || argmodes[i] ==
   PROARGMODE_INOUT || argmodes[i] == PROARGMODE_VARIADIC)
 
  Factored that out into a separate helper function.
 
  
 The statement needs to be split into 80 columns width lines.

done

   3) Function level comment for pg_get_function_arg_default() is
   missing.
 
  I think the purpose of the function is clear.
 
 Unless a reader goes through the definition, it is not obvious whether
 the second function argument represents the parameter position in
 input parameters or it is the parameter position in *all* the function
 parameters (i.e. proargtypes or proallargtypes). I think at least a
 one-liner description of what this function does should be present. 

done

 
   4) You can also add comments inside the function, for example the
   comment for the line:
   nth = inputargn - 1 - (proc->pronargs - proc->pronargdefaults);
 
  Suggestion?
 
 Yes, it is difficult to understand what's the logic behind this
 calculation, until we go read the documentation related to
 pg_proc.proargdefaults. I think this should be sufficient:
 /*
 * proargdefaults elements always correspond to the last N input
 arguments,
 * where N = pronargdefaults. So calculate the nth_default accordingly.
 */

done

 There should be an initial check to see if nth_arg is passed a
 negative value. It is used as an index into the argmodes array, so a
 -ve array index can cause a crash. Because
 pg_get_function_arg_default() can now be called by a user also, we
 need to validate the input values. I am ok with returning with an
 error Invalid argument.

done

 ---
 Instead of :
 deparse_expression_pretty(node, NIL, false, false, 0, 0) 
 you can use :
 deparse_expression(node, NIL, false, false)
 
done

From 442911934c1640af18a83ef2efaafc45c569c98f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut pete...@gmx.net
Date: Thu, 14 Nov 2013 21:43:15 -0500
Subject: [PATCH] Implement information_schema.parameters.parameter_default
 column

Reviewed-by: Ali Dar ali.munir@gmail.com
Reviewed-by: Amit Khandekar amit.khande...@enterprisedb.com
---
 doc/src/sgml/information_schema.sgml|  9 +++
 src/backend/catalog/information_schema.sql  |  9 ++-
 src/backend/utils/adt/ruleutils.c   | 84 +
 src/include/catalog/catversion.h|  2 +-
 src/include/catalog/pg_proc.h   |  2 +
 src/include/utils/builtins.h|  1 +
 src/test/regress/expected/create_function_3.out | 33 +-
 src/test/regress/sql/create_function_3.sql  | 24 +++
 8 files changed, 160 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml
index 22e17bb..22f43c8 100644
--- a/doc/src/sgml/information_schema.sgml
+++ b/doc/src/sgml/information_schema.sgml
@@ -3323,6 +3323,15 @@ <title><literal>parameters</literal> Columns</title>
        in future versions.)
       </entry>
      </row>
+
+     <row>
+      <entry><literal>parameter_default</literal></entry>
+      <entry><type>character_data</type></entry>
+      <entry>
+       The default expression of the parameter, or null if none or if the
+       function is not owned by a currently enabled role.
+      </entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index c5f7a8b..fd706e3 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -1133,10 +1133,15 @@ CREATE VIEW parameters AS
CAST(null AS sql_identifier) AS scope_schema,
CAST(null AS sql_identifier) AS scope_name,
CAST(null AS cardinal_number) AS maximum_cardinality,
-   CAST((ss.x).n AS sql_identifier) AS dtd_identifier
+   CAST((ss.x).n AS sql_identifier) AS dtd_identifier,
+   CAST(
+ CASE WHEN pg_has_role(proowner, 'USAGE')
+  THEN pg_get_function_arg_default(p_oid, (ss.x).n)
+  ELSE NULL END
+ AS character_data) AS parameter_default
 
 FROM pg_type t, pg_namespace nt,
- (SELECT n.nspname AS n_nspname, p.proname, p.oid AS p_oid,
+ (SELECT n.nspname AS n_nspname, p.proname, p.oid AS p_oid, p.proowner,
  p.proargnames, p.proargmodes,
  _pg_expandarray(coalesce(p.proallargtypes, p.proargtypes::oid[])) AS x
   FROM pg_namespace n, pg_proc p
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 5ffce68..dace05a 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -2266,6 +2266,90 @@ print_function_arguments(StringInfo buf, HeapTuple proctup,
 	return argsprinted;
 }
 
+static bool
+is_input_argument(int nth, const char 

[HACKERS] Database disconnection and switch inside a single bgworker

2013-11-14 Thread Michael Paquier
Hi all,

Currently, bgworkers offer the possibility to connect to a given
database using BackgroundWorkerInitializeConnection in bgworker.h, but
there is actually no way to disconnect from a given database inside
the same bgworker process.

One of the use cases for that would be to have the
same bgworker perform, for example, some analysis on a database A
(say, some analysis of statistics using a common database like
postgres), and then perform some actions on another database, for
instance an ANALYZE of a given relation that exists only on database B.

Using the infrastructure of 9.4 as of now, it would be possible of
course to have a bgworker process launching some other child processes
dynamically on different databases, but users (including me) might not
want to do that all the time.

Database disconnection would also be pretty cool for things like
parallel query processing using a pool of bgworker processes that all
the backends could use in parallel as a single resource.

Note that I haven't looked at the code yet to see how it would be
possible to do that (it looks tricky though), whether any new infrastructure
is needed, or whether it could be done without modifying the
core code of Postgres. So, opinions about that, as well as additional
thoughts, are welcome!

Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging WAL when updating hintbit

2013-11-14 Thread Sawada Masahiko
On Thu, Nov 14, 2013 at 3:53 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Nov 14, 2013 at 3:02 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 I attached patch adds new wal_level 'all'.
 If wal_level is set 'all', the server logs WAL not only when wal_level
 is set 'hot_standby' ,but also when updating hint bit.
 That is, we will be able to know all of the changed block number by
 reading the WALs.
 Is 'all' a name really suited? It looks too general. I don't have a
 name on top of my mind but what about something like full_pages or
 something similar...

Yes, I'm worried about the name of the value,
but 'full_pages' sounds good to me.


 This wal_level is infrastructure for fast failback. (i.g., without fresh 
 backup)
 It need to cooperate with pg_rewind.
 I am not sure that using as reason the possible interactions of a
 contrib module not in core is a reason sufficient to justify the
 presence of a new wal_level, and pg_rewind is still a young solution
 that needs to be improved. So such a patch looks premature IMO, but
 the idea is interesting and might cover many needs for external
 projects.

 Not only that, I think it will be profitable infrastructure for
 differential backup.
 Yep, agreed. This might help some projects in this area.

 And it leads to improve performance at standby server side. Because
 the standby server doesn't update hintbit by itself, but FPW is
 replicated to standby server and applied.
 It would be interesting to see some numbers here.

I think this patch provides several benefits for the future; the fast
failback with pg_rewind is one of them.
If I only wanted to provide fast failback with pg_rewind, this patch
would log too much information:
just logging the changed block numbers would be enough for that.

As you said, pg_rewind is still a young solution, but it is a very cool and
flexible one.
I'm going to improve pg_rewind actively.


 This is clearly a WIP patch so it does not matter now, but if you
 submit it later on, be sure to add some comments in bufmgr.c as well
 as documentation for the new option.

Yes, will do.


-- 
Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Autoconf 2.69 update

2013-11-14 Thread Peter Eisentraut
I'm proposing that we upgrade our Autoconf to 2.69, which is the latest
right now (release date 2012-04-24).  There are no changes in the source
needed, just tweak the version number in configure.in (see below) and
run autoreconf.  I've compared the configure output before and after on
a few boxes, and there were no significant changes.

The major benefit of this update is that configure shrinks to about half
the size and also runs a bit faster.

Maybe at this time, everyone who is interested can check that they have
an installation of Autoconf 2.69 around, and then in a few weeks we can
do the update.


diff --git a/configure.in b/configure.in
index a4baeaf..ac61be5 100644
--- a/configure.in
+++ b/configure.in
@@ -19,7 +19,7 @@ m4_pattern_forbid(^PGAC_)dnl to catch undefined macros
 
 AC_INIT([PostgreSQL], [9.4devel], [pgsql-b...@postgresql.org])
 
-m4_if(m4_defn([m4_PACKAGE_VERSION]), [2.63], [], [m4_fatal([Autoconf version 
2.63 is required.
+m4_if(m4_defn([m4_PACKAGE_VERSION]), [2.69], [], [m4_fatal([Autoconf version 
2.69 is required.
 Untested combinations of 'autoconf' and PostgreSQL versions are not
 recommended.  You can remove the check from 'configure.in' but it is then
 your responsibility whether the result works or not.])])




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autoconf 2.69 update

2013-11-14 Thread Michael Paquier
On Fri, Nov 15, 2013 at 12:00 PM, Peter Eisentraut pete...@gmx.net wrote:
 I'm proposing that we upgrade our Autoconf to 2.69, which is the latest
 right now (release date 2012-04-24).  There are no changes in the source
 needed, just tweak the version number in configure.in (see below) and
 run autoreconf.  I've compared the configure output before and after on
 a few boxes, and there were no significant changes.
 The major benefit of this update is that configure shrinks to about half
 the size and also runs a bit faster.
+1 for the update.

 Maybe at this time, everyone who is interested can check that they have
 an installation of Autoconf 2.69 around, and then in a few weeks we can
 do the update.
Archlinux boxes will be happier with that.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strncpy is not a safe version of strcpy

2013-11-14 Thread Tomas Vondra
On 15 Listopad 2013, 1:00, David Rowley wrote:
 On Fri, Nov 15, 2013 at 12:33 PM, Tomas Vondra t...@fuzzy.cz wrote:

  It is likely far better explained here --
  http://www.courtesan.com/todd/papers/strlcpy.html
 
  For example , the following 2 lines in jsonfuncs.c
 
  memset(name, 0, NAMEDATALEN);
  strncpy(name, fname, NAMEDATALEN);

 Be careful with 'Name' data type - it's not just a simple string buffer.
 AFAIK it needs to work with hashing etc. so the zeroing is actually
 needed
 here to make sure two values produce the same result. At least that's
 how
 I understand the code after a quick check - for example this is from the
 same jsonfuncs.c you mentioned:

 memset(fname, 0, NAMEDATALEN);
 strncpy(fname, NameStr(tupdesc-attrs[i]-attname), NAMEDATALEN);
 hashentry = hash_search(json_hash, fname, HASH_FIND, NULL);

 So the zeroing is on purpose, although if strncpy does that then the
 memset is probably superflous. Either people do that because of habit /
 copy'n'paste, or maybe there are supported platforms when strncpy does
 not
 behave like this for some reason.


 I had not thought of the fact the some platforms don't properly implement
 strncpy(). On quick check http://man.he.net/man3/strncpy seems to indicate
 that this behaviour is part of the C89 standard. So does this mean we can
 always assume that all supported platforms always 0 out the remaining
 buffer?

I don't know about such platform - I was merely speculating about why
people might use such code.

 I seriously doubt this inefficiency is going to be measurable in real
 world. If the result was a buffer-overflow bug, that'd be a different
 story, but maybe we could check the ~120 calls to strncpy in the whole
 code base and replace it with strlcpy where appropriate.


 The example was more of a demonstration of wrong assumption rather than
 wasted cycles. Though the wasted cycles was on my mind a bit too. I was

Yeah. To be fair, number of occurrences in the code base is not a
particularly exact measure of the impact - some of those uses might be
used in code paths that are quite busy.

 more focused on trying to draw a bit of attention to commit
 061b88c732952c59741374806e1e41c1ec845d50 which uses strncpy and does not
 properly set the last byte to 0 afterwards. I think this case could just
 be
 replaced with strlcpy which does all this hard work for us.

Hmm, you mean this piece of code?

   strncpy(saved_argv0, argv[0], MAXPGPATH);

IMHO you're right that's probably broken, unless there's some checking
happening before the call,
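
i.e. assuming the destination really is MAXPGPATH bytes, as the call
implies, either of these would be safe:

	/* strlcpy (provided in src/port) truncates safely and always terminates */
	strlcpy(saved_argv0, argv[0], MAXPGPATH);

	/* or, keeping strncpy, the step that was forgotten: */
	strncpy(saved_argv0, argv[0], MAXPGPATH);
	saved_argv0[MAXPGPATH - 1] = '\0';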

Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-11-14 Thread Peter Geoghegan
On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur samthaku...@gmail.com wrote:
 Hello,
 Please find attached pg_stat_statements-identification-v9.patch.

I took a quick look. Observations:

+   /* Making query ID dependent on PG version */
+   query->queryId |= PG_VERSION_NUM << 16;

If you want to do something like this, make the value of
PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something.

Why are you doing this?

@@ -128,6 +146,7 @@ typedef struct pgssEntry
pgssHashKey key;/* hash key of entry - MUST BE 
FIRST */
Counterscounters;   /* the statistics for this 
query */
int query_len;  /* # of valid bytes in 
query string */
+   uint32  query_id;   /* jumble value for this entry 
*/

query_id is already in key.

Not sure I like the idea of the new enum at all, but in any case you
shouldn't have a PGSS_TUP_LATEST constant - should someone go update
all usage of that constant only when your version isn't the latest?
Like here:

+   if (detected_version >= PGSS_TUP_LATEST)

I forget why Daniel originally altered the min value of
pg_stat_statements.max to 1 (I just remember that he did), but I don't
think it holds that you should keep it there. Have you considered the
failure modes when it is actually set to 1?

This is what I call a can't happen error, or a defensive one:

+   else
+   {
+   /*
+* Couldn't identify the tuple format.  Raise error.
+*
+* This is an exceptional case that may only happen in bizarre
+* situations, since it is thought that every released version
+* of pg_stat_statements has a matching schema.
+*/
+   ereport(ERROR,
+   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg("pg_stat_statements schema is not supported "
+   "by its installed binary")));
+   }

I'll generally make these simple elogs(), which are more terse. No one
is going to find all that dressing useful.

Please take a look at this, for future reference:

https://wiki.postgresql.org/wiki/Creating_Clean_Patches

The whitespace changes are distracting.

It probably isn't useful to comment random, unaffected code that isn't
affected by your patch - I don't find this new refactoring useful, and
am surprised to see it in your patch:

+   /* Check header existence and magic number match. */
if (fread(&header, sizeof(uint32), 1, file) != 1 ||
-   header != PGSS_FILE_HEADER ||
-   fread(&num, sizeof(int32), 1, file) != 1)
+   header != PGSS_FILE_HEADER)
+   goto error;
+
+   /* Read how many table entries there are. */
+   if (fread(&num, sizeof(int32), 1, file) != 1)
goto error;

Did you mean to add all this, or is it left over from Daniel's patch?

@@ -43,6 +43,7 @@
  */
 #include "postgres.h"

+#include <time.h>
 #include <unistd.h>

 #include "access/hash.h"
@@ -59,15 +60,18 @@
 #include "storage/spin.h"
 #include "tcop/utility.h"
 #include "utils/builtins.h"
+#include "utils/timestamp.h"

Final thought: I think the order in the pg_stat_statements view is
wrong. It ought to be like a composite primary key - (userid, dbid,
query_id).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-11-14 Thread KONDO Mitsumasa

(2013/11/15 11:31), Peter Geoghegan wrote:

On Thu, Nov 14, 2013 at 6:28 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:

It is confirmation just to make sure, does this patch mean my patch? I
agree with you about not adding another lock implementation. It will becomes
overhead.


Yes, I referred to your patch. I don't want to go down this road,
because aggregation and constructing a timeline feels like the job of
another tool. I am concerned about local minima and maxima. Even with
the ability to reset min/max independently, you can't do so for each
entry individually. And this approach won't scale to a histogram or
more sophisticated ways of characterizing distribution, particularly
not multiplicatively for things other than execution time (blocks hit
and so on)
I think that pg_stat_statements is a light-weight monitoring tool, not a complete
monitoring solution, and this feature is a very easy way for everyone to get statistics.
If you'd like to get more detailed statistics, I suggest adding another
monitoring tool, something like pg_stat_statements_full. It might be heavier, but it
could collect more detailed information, and everyone would welcome such a new feature.



- that spinlock needs to be held for very little time
indeed to preserve pg_stat_statements current low overhead.
I'd like to keep pg_stat_statements low-overhead, and my patch achieves that: it doesn't
add new locks or complicated code.



As I said above, lets figure out how to have your tool or a similar
tool acquire snapshots inexpensively and frequently instead.

We tried to solve this problem using our tool in the past.
However, it is difficult to do except through log output, i.e. the log_statement=all
option. So we are trying to add a new feature to pg_stat_statements that would help
DBAs get useful statistics without that overhead.



That is a
discussion I'd be happy to have. IMO pg_stat_statements ought to be as
cheap as possible, and do one thing well - aggregate fixed-unit costs
cumulatively.

I am also happy to have that discussion!
I'd like to install your new patch and give you my comments.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2013-11-14 Thread KONDO Mitsumasa

(2013/11/15 2:09), Fujii Masao wrote:

Agreed.
Could you tell me why you agree? I am sorry, but I suspect you haven't followed
this discussion closely enough :-(


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Ceter


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-14 Thread Amit Kapila
On Wed, Nov 13, 2013 at 12:02 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 12 November 2013 08:47 Amit Kapila wrote:
 On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 08 November 2013 18:35 Amit Kapila wrote:
  On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi
  haribabu.ko...@huawei.com wrote:
   On 07 November 2013 09:42 Amit Kapila wrote:
   1. Taking a copy of n_dead_tuples before VACUUM starts and then
  subtract it once it is done.
  This approach doesn't include the tuples which are remains
   during
  the vacuum operation.

 Patch is modified as take a copy of n_dead_tuples during vacuum start and use
 the same while calculating the new dead tuples at end of vacuum.

  By the way, do you have test case or can you try to write a test
 case
  which can show this problem and then after fix, you can verify if
 the
  problem is resolved.
 
  The simulated index bloat problem can be generated using the attached
 script and sql.
  With the fix of setting the dead tuples properly,

Which fix are you referring to here? Is it the one you
 proposed in your initial mail?

  the bloat is reduced and by changing the vacuum cost Parameters the
  bloat is avoided.

 With the simulated bloat test run for 1 hour the bloat occurred as below,

 Unpatched - 1532MB
 Patched   - 1474MB

In your test run, have you checked what happens if, after the heavy update
(or once bloat occurs), you keep the system idle (or with just a read
load) for some time? What is the result?

You haven't answered one of my questions in my previous mail:
( With the fix of setting the dead tuples properly, the bloat is
reduced and by changing the vacuum cost parameters the bloat is
avoided.
Which fix are you referring to here?)

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-11-14 Thread Daniel Farina
On Thu, Nov 14, 2013 at 7:25 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur samthaku...@gmail.com wrote:
 Hello,
 Please find attached pg_stat_statements-identification-v9.patch.

 I took a quick look. Observations:

 +   /* Making query ID dependent on PG version */
 +   query->queryId |= PG_VERSION_NUM << 16;

 If you want to do something like this, make the value of
 PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something.

 Why are you doing this?

The destabilization of the query_id is an attempt to skirt the
objection that unpredictable instability in the query_id
would otherwise cause problems and present a false contract,
particularly in regard to point release upgrades.

Earliest drafts of mine included destabilizing every session, but this
kills off a nice use case of correlating query ids between primaries
and standbys, if memory serves.  This has the semantics of
destabilizing -- for sure -- every point release.
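
(Purely illustrative; the constant and layout here are assumptions, not code
from the patch. Peter's suggestion was to fold (PG_VERSION_NUM / 100) into
PGSS_FILE_HEADER; folding the full value instead would invalidate the saved
stats file on every point release, without touching the per-query jumble:)

    #define PGSS_FILE_HEADER_BASE   0x20131115
    #define PGSS_FILE_HEADER \
        (PGSS_FILE_HEADER_BASE ^ (uint32) PG_VERSION_NUM)

On load, a header mismatch simply means the stats file is ignored and rebuilt,
which is a much softer failure mode than shifting version bits into queryId
itself.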

 @@ -128,6 +146,7 @@ typedef struct pgssEntry
 	pgssHashKey key;		/* hash key of entry - MUST BE FIRST */
 	Counters	counters;	/* the statistics for this query */
 	int		query_len;	/* # of valid bytes in query string */
 +	uint32		query_id;	/* jumble value for this entry */

 query_id is already in key.

 Not sure I like the idea of the new enum at all, but in any case you
 shouldn't have a PGSS_TUP_LATEST constant - should someone go update
 all usage of that constant only when your version isn't the latest?
 Like here:

 +   if (detected_version >= PGSS_TUP_LATEST)

It is roughly modeled after the column version of the same thing
that pre-existed in pg_stat_statements.  The only reason to have a
LATEST is for some of the invariants though; otherwise, most uses
should use the version-stamped symbol.  I did this wrongly a bunch of
times as spotted by Alvaro, I believe.

 I forget why Daniel originally altered the min value of
 pg_stat_statements.max to 1 (I just remember that he did), but I don't
 think it holds that you should keep it there. Have you considered the
 failure modes when it is actually set to 1?

Testing.  It was very useful to set to small numbers, like two or
three.  It's not crucial to the patch at all but having to whack it
back all the time to keep the patch submission devoid of it seemed
impractically tedious during revisions.

 This is what I call a can't happen error, or a defensive one:

 +   else
 +   {
 +   /*
 +* Couldn't identify the tuple format.  Raise error.
 +*
 +* This is an exceptional case that may only happen in bizarre
 +* situations, since it is thought that every released version
 +* of pg_stat_statements has a matching schema.
 +*/
 +   ereport(ERROR,
 +           (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 +            errmsg("pg_stat_statements schema is not supported "
 +                   "by its installed binary")));
 +   }

 I'll generally make these simple elogs(), which are more terse. No one
 is going to find all that dressing useful.

That seems reasonable.  Yes, it's basically a soft assert.  I hit this
one in development a few times, it was handy.
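
(That is, something as terse as the following, in place of the full ereport
scaffolding:)

    elog(ERROR, "pg_stat_statements schema is not supported by its installed binary");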

 It probably isn't useful to comment random, unaffected code that isn't
 affected by your patch - I don't find this new refactoring useful, and
 am surprised to see it in your patch:

 +   /* Check header existence and magic number match. */
 	if (fread(&header, sizeof(uint32), 1, file) != 1 ||
 -		header != PGSS_FILE_HEADER ||
 -		fread(&num, sizeof(int32), 1, file) != 1)
 +		header != PGSS_FILE_HEADER)
 +		goto error;
 +
 +	/* Read how many table entries there are. */
 +	if (fread(&num, sizeof(int32), 1, file) != 1)
 goto error;

 Did you mean to add all this, or is it left over from Daniel's patch?

The whitespace changes are not intentional, but I think the comments
help a reader track what's going on better, which becomes more
important if one adds more fields.  It can be broken out into a
separate patch, but it didn't seem like that bookkeeping was
necessary.

 @@ -43,6 +43,7 @@
   */
  #include "postgres.h"

 +#include <time.h>
  #include <unistd.h>

  #include "access/hash.h"
 @@ -59,15 +60,18 @@
  #include "storage/spin.h"
  #include "tcop/utility.h"
  #include "utils/builtins.h"
 +#include "utils/timestamp.h"

 Final thought: I think the order in the pg_stat_statements view is
 wrong. It ought to be like a composite primary key - (userid, dbid,
 query_id).

I made no design decisions here...no complaints from me.



Re: [HACKERS] Optimize kernel readahead using buffer access strategy

2013-11-14 Thread Claudio Freire
On Thu, Nov 14, 2013 at 11:13 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
 Hi Claudio,


 (2013/11/14 22:53), Claudio Freire wrote:

 On Thu, Nov 14, 2013 at 9:09 AM, KONDO Mitsumasa
 kondo.mitsum...@lab.ntt.co.jp wrote:

 I created a patch that improves disk reads and OS file caching. It can
 optimize the kernel readahead parameter using the buffer access strategy and
 posix_fadvise() in various disk-read situations.

 In general, the OS decides the readahead parameter dynamically based on the
 disk-read pattern. If long sequential disk reads happen, the readahead
 parameter becomes big. However, since this is based on an experience-based or
 heuristic algorithm, it causes wasted disk reads and throws out useful OS
 file caches in some cases, which is very bad for disk-read performance.


 It would be relevant to know which kernel did you use for those tests.

 I used CentOS 6.4, whose kernel version is 2.6.32-358.23.2.el6.x86_64, in
 this test.

That's close to the kernel version I was using, so you should see the
same effect.

 A while back, I tried to use posix_fadvise to prefetch index pages.

 I searched your past work. Do you mean this ML thread, or is there a more
 recent discussion? I see your patch is interesting, but it wasn't submitted
 to a CF and the discussion stopped.
 http://www.postgresql.org/message-id/CAGTBQpZzf70n0PYJ=VQLd+jb3wJGo=2txmy+skjd6g_vjc5...@mail.gmail.com

Yes, I didn't, exactly because of that bad interaction with the
kernel. It needs either more smarts to only do fadvise on known-random
patterns (what you did mostly), or an accompanying kernel patch (which
I was working on, but ran out of test machines).

 I ended up finding out that interleaving posix_fadvise with I/O like
 that severely hinders (i.e., completely disables) the kernel's read-ahead
 algorithm.

 Does your patch end up doing maximum readahead when a SQL statement is
 executed as an index range scan? Is that right?

Ehm... sorta.

 I think that your patch assumes that pages are ordered by
 index-data.

No. It just knows which pages will be needed, and fadvises them. No
guessing involved, except the guess that the scan will not be aborted.
There's a heuristic to stop limited scans from attempting to fadvise,
and that's that prefetch strategy is applied only from the Nth+ page
walk.
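
(A minimal sketch of that shape of prefetching, with an assumed threshold and
helper name; this is not code from either patch:)

    #include <fcntl.h>

    #define PREFETCH_AFTER_N_PAGES 8    /* threshold is an assumption */

    static void
    maybe_prefetch_block(int fd, long pages_walked, off_t blkno, off_t blcksz)
    {
        /* only start hinting once the scan has proven it will keep going */
        if (pages_walked >= PREFETCH_AFTER_N_PAGES)
            (void) posix_fadvise(fd, blkno * blcksz, blcksz,
                                 POSIX_FADV_WILLNEED);
    }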

It improves index-only scans the most, but I also attempted to handle
heap prefetches. That's where the kernel started conspiring against
me, because I used many naturally-clustered indexes, and THERE
performance was adversely affected because of that kernel bug.

 You may want to try your patch with more
 real workloads, and maybe you'll confirm what I found out last time I
 messed with posix_fadvise. If my experience is still relevant, those
 patterns will have suffered a severe performance penalty with this
 patch, because it will disable kernel read-ahead on sequential index
 access. It may still work for sequential heap scans, because the
 access strategy will tell the kernel to do read-ahead, but many other
 access methods will suffer.

 The decisive difference from your patch is that my patch uses the buffer hint
 control architecture, so it can control readahead more intelligently in some
 cases.

Indeed, but it's not enough. See my above comment about naturally
clustered indexes. The planner expects that, and plans accordingly. It
will notice correlation between a PK and physical location, and will
treat an index scan over PK to be almost sequential. With your patch,
that assumption will be broken I believe.

 However, my patch is a work in progress and needs more improvement. I am
 going to add a method of controlling readahead by GUC, so users can freely
 select the readahead parameter in their transactions.

Rather, I'd try to avoid fadvising consecutive or almost-consecutive
blocks. Detecting that is hard at the block level, but maybe you can
tie that detection into the planner, and specify a sequential strategy
when the planner expects index-heap correlation?
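
(Sketch of the sort of filter that could sit in front of the fadvise calls;
the distance threshold and names are assumptions, purely illustrative:)

    #include "postgres.h"
    #include "storage/block.h"      /* BlockNumber, InvalidBlockNumber */

    #define NEARLY_SEQUENTIAL 16    /* assumed threshold */

    static BlockNumber last_advised = InvalidBlockNumber;

    static bool
    worth_fadvising(BlockNumber blkno)
    {
        bool    nearly_seq = (last_advised != InvalidBlockNumber &&
                              blkno >= last_advised &&
                              blkno - last_advised <= NEARLY_SEQUENTIAL);

        last_advised = blkno;
        /* leave (almost) sequential runs to the kernel's own readahead */
        return !nearly_seq;
    }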

 Try OLAP-style queries.

 I have the DBT-3 (TPC-H) benchmark tools. If you don't like TPC-H, could you
 tell me about a good OLAP benchmark tool?

I don't really know. Skimming the specs, I'm not sure if those queries
generate large index range queries. You could try, maybe with
auto_explain?




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-14 Thread Haribabu kommi
On 15 November 2013 10:00 Amit Kapila wrote:
 On Wed, Nov 13, 2013 at 12:02 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 12 November 2013 08:47 Amit Kapila wrote:
  On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi
  haribabu.ko...@huawei.com wrote:
   On 08 November 2013 18:35 Amit Kapila wrote:
   On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi
   haribabu.ko...@huawei.com wrote:
On 07 November 2013 09:42 Amit Kapila wrote:
1. Taking a copy of n_dead_tuples before VACUUM starts and then
   subtract it once it is done.
   This approach doesn't include the tuples which are remains
during
   the vacuum operation.
 
  Patch is modified as take a copy of n_dead_tuples during vacuum start
  and use the same while calculating the new dead tuples at end of
 vacuum.
 
   By the way, do you have test case or can you try to write a test
  case
   which can show this problem and then after fix, you can verify if
  the
   problem is resolved.
  
   The simulated index bloat problem can be generated using the
   attached
  script and sql.
   With the fix of setting the dead tuples properly,
 
 Which fix here you are referring to, is it the one which you have
  proposed with your initial mail?
 
   the bloat is reduced and by changing the vacuum cost Parameters
 the
   bloat is avoided.
 
  With the simulated bloat test run for 1 hour the bloat occurred as
  below,
 
  Unpatched - 1532MB
  Patched   - 1474MB
 
 In your test run, have you checked what happen if after heavy update
 (or once bloat occurs), if you keep the system idle (or just have read
 load on system) for some time, what is the result?

In the simulated test run shared in the previous mail, after a heavy update
the system is idle for 15 minutes.

With the master code, vacuum is not triggered during this idle time, as the
vacuum threshold is not satisfied, because it doesn't consider the dead tuples
that accumulated during the vacuum operation.

With the fix, one extra vacuum can get triggered compared to the master code
after two or three heavy updates, because of the accumulation of dead tuples.
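
(Illustrative sketch of the idea, not the patch itself; the identifiers are
assumptions. The point is to snapshot the dead-tuple counter when VACUUM
starts, so that tuples that die while it runs still count towards the next
autovacuum threshold:)

    /* at VACUUM start */
    PgStat_Counter dead_at_start = tabentry->n_dead_tuples;

    /* ... VACUUM runs; concurrent updates keep producing dead tuples ... */

    /* at VACUUM end: keep what accumulated during the run, not zero */
    tabentry->n_dead_tuples = Max(tabentry->n_dead_tuples - dead_at_start, 0);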

 You haven't answered one of my questions in previous mail ( With the
 fix of setting the dead tuples properly, the bloat is reduced and by
 changing the vacuum cost Parameters the bloat is avoided.
 Which fix here you are referring to?)

The bloat reduction is the same with the initial and the latest patch.
The vacuum cost parameter change (which doesn't contain any fix) is what
avoided the bloat.

Regards,
Hari babu.





Re: [HACKERS] init_sequence spill to hash table

2013-11-14 Thread David Rowley
On Fri, Nov 15, 2013 at 3:03 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 I think that means that we should just completely replace the list with
 the hash table. The difference with a small N is lost in noise, so there's
 no point in keeping the list as a fast path for small N. That'll make the
 patch somewhat simpler.
 - Heikki


Attached is a much simpler patch which gets rid of the initial linked list.
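
(For reference, the dynahash-only shape would look roughly like this; the
names and sizing are assumptions, not necessarily what the patch does:)

    #include "postgres.h"
    #include "utils/hsearch.h"

    static HTAB *seqhashtab = NULL;

    static SeqTable
    get_seq_entry(Oid relid)
    {
        bool    found;

        if (seqhashtab == NULL)
        {
            HASHCTL     ctl;

            MemSet(&ctl, 0, sizeof(ctl));
            ctl.keysize = sizeof(Oid);
            ctl.entrysize = sizeof(SeqTableData);
            ctl.hash = oid_hash;
            seqhashtab = hash_create("Sequence values", 16, &ctl,
                                     HASH_ELEM | HASH_FUNCTION);
        }

        /* one lookup, no list walking, however many sequences are touched;
         * when !found the entry is brand new and the caller initializes it */
        return (SeqTable) hash_search(seqhashtab, &relid, HASH_ENTER, &found);
    }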


hashtab_seq_v0.2.patch
Description: Binary data



Re: [HACKERS] [PATCH] SQL assertions prototype

2013-11-14 Thread Pavel Stehule
+1

interesting feature

Pavel


2013/11/15 Peter Eisentraut pete...@gmx.net

 Various places in the constraint checking code say something like, if we
 ever implement assertions, here is where it should go.  I've been
 fiddling with filling in those gaps for some time now, and the other day
 I noticed, hey, this actually kind of works, so here it is.  Let's see
 whether this architecture is sound.

 A constraint trigger performs the actual checking.  For the
 implementation of the trigger, I've used some SPI hacking for now; that
 could probably be refined.  The attached patch has documentation, tests,
 psql support.  Missing pieces are pg_dump support, dependency
 management, and permission checking (the latter marked in the code).

 This is not a performance feature.  It's for things like, this table
 should have at most 10 rows, or all the values in this table must be
 bigger than all the values in that other table.  It's a bit esoteric,
 but it comes up again and again.

 Let me know what you think.







Re: [HACKERS] init_sequence spill to hash table

2013-11-14 Thread David Rowley
On Fri, Nov 15, 2013 at 3:23 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Andres Freund and...@2ndquadrant.com writes:
  I think it'd be a better idea to integrate the sequence caching logic
  into the relcache. There's a comment about it:
   * (We can't
   * rely on the relcache, since it's only, well, a cache, and may decide
 to
   * discard entries.)
  but that's not really accurate anymore. We have the infrastructure for
  keeping values across resets and we don't discard entries.

 We most certainly *do* discard entries, if they're not open when a cache
 flush event comes along.

 I suppose it'd be possible to mark a relcache entry for a sequence
 as locked-in-core, but that doesn't attract me at all.  A relcache
 entry is a whole lot larger than the amount of state we really need
 to keep for a sequence.

 One idea is to have a hashtable for the sequence-specific data,
 but to add a link field to the relcache entry that points to the
 non-flushable sequence hashtable entry.  That would save the second
 hashtable lookup as long as the relcache entry hadn't been flushed
 since last use, while not requiring any violence to the lifespan
 semantics of relcache entries.  (Actually, if we did that, it might
 not even be worth converting the list to a hashtable?  Searches would
 become a lot less frequent.)


Unless I've misunderstood something, it looks like this would mean giving
heapam.c and relcache.c knowledge of sequences.
Currently relation_open is called from open_share_lock in sequence.c. The
only way I can see to do this would be to add something like
relation_open_sequence() in heapam.c, which means we'd need to invent
RelationIdGetSequenceRelation() and use that instead of
RelationIdGetRelation(), and somewhere along the line have it pass back
the SeqTableData struct, which would be tagged onto RelIdCacheEnt.

I think it can be done, but I don't think it will look pretty.
Perhaps there is a more generic way... Would tagging some void
*rd_extra onto the RelationData be a better way, and just have sequence.c
make use of that for storing the SeqTableData?
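
(A sketch of what that could look like, purely hypothetical since no such
field exists today:)

    /* assumes a new "void *rd_extra" member were added to RelationData */
    static SeqTable
    get_seq_state(Relation seqrel)
    {
        if (seqrel->rd_extra == NULL)
            seqrel->rd_extra = MemoryContextAllocZero(CacheMemoryContext,
                                                      sizeof(SeqTableData));
        return (SeqTable) seqrel->rd_extra;
    }

The obvious wrinkle is exactly the one raised in the next paragraph: what
happens to that allocation when the relcache entry is flushed, or on DISCARD
SEQUENCES.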

Also I'm wondering what we'd do with all these pointers when someone does
DISCARD SEQUENCES; would we have to invalidate the relcache or would it
just be matter of looping over it and freeing of the sequence data setting
the pointers to NULL?

Regards

David Rowley


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-11-14 Thread Sameer Thakur
 I took a quick look. Observations:

 + /* Making query ID dependent on PG version */
 + query->queryId |= PG_VERSION_NUM << 16;

 If you want to do something like this, make the value of
 PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something.

 Why are you doing this?

The thought was that queryid should have a different value for the same
query across PG versions, to ensure that clients using
the view do not assume otherwise.

 @@ -128,6 +146,7 @@ typedef struct pgssEntry
   pgssHashKey key; /* hash key of entry - MUST BE FIRST */
   Counters counters; /* the statistics for this query */
   int query_len; /* # of valid bytes in query string */
 + uint32 query_id; /* jumble value for this entry */

 query_id is already in key.

 Not sure I like the idea of the new enum at all, but in any case you
 shouldn't have a PGSS_TUP_LATEST constant - should someone go update
 all usage of that constant only when your version isn't the latest?
 Like here:

 + if (detected_version >= PGSS_TUP_LATEST)

There is #define PGSS_TUP_LATEST PGSS_TUP_V1_2
So if an update has to be done, this is the one place to do it.
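
(That is, the pattern looks roughly like this; PGSS_TUP_LATEST and
PGSS_TUP_V1_2 are quoted from the patch, the earlier enum values are assumed:)

    typedef enum pgssTupVersion
    {
        PGSS_TUP_V1_0 = 1,
        PGSS_TUP_V1_1,
        PGSS_TUP_V1_2
    } pgssTupVersion;

    #define PGSS_TUP_LATEST PGSS_TUP_V1_2

Peter's concern still applies, though: every comparison written against
PGSS_TUP_LATEST has to be re-audited whenever a new version is appended,
whereas comparisons against the version-stamped symbols do not.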

 I forget why Daniel originally altered the min value of
 pg_stat_statements.max to 1 (I just remember that he did), but I don't
 think it holds that you should keep it there. Have you considered the
 failure modes when it is actually set to 1?
Will set it back to the original value and also test for max value = 1

 This is what I call a can't happen error, or a defensive one:

 + else
 + {
 + /*
 + * Couldn't identify the tuple format.  Raise error.
 + *
 + * This is an exceptional case that may only happen in bizarre
 + * situations, since it is thought that every released version
 + * of pg_stat_statements has a matching schema.
 + */
 + ereport(ERROR,
 + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 + errmsg("pg_stat_statements schema is not supported "
 + "by its installed binary")));
 + }

 I'll generally make these simple elogs(), which are more terse. No one
 is going to find all that dressing useful.
Will convert to using elog().

 Please take a look at this, for future reference:

 https://wiki.postgresql.org/wiki/Creating_Clean_Patches

 The whitespace changes are distracting.

Thanks! Still learning the art of clean patch submission.

 It probably isn't useful to comment random, unaffected code that isn't
 affected by your patch - I don't find this new refactoring useful, and
 am surprised to see it in your patch:

 + /* Check header existence and magic number match. */
   if (fread(&header, sizeof(uint32), 1, file) != 1 ||
 - header != PGSS_FILE_HEADER ||
 - fread(&num, sizeof(int32), 1, file) != 1)
 + header != PGSS_FILE_HEADER)
 + goto error;
 +
 + /* Read how many table entries there are. */
 + if (fread(&num, sizeof(int32), 1, file) != 1)
   goto error;

 Did you mean to add all this, or is it left over from Daniel's patch?
I think its a carry over from Daniel's code. I understand the thought.
Will keep patch strictly restricted to functionality implemented
 @@ -43,6 +43,7 @@
   */
  #include "postgres.h"

 +#include <time.h>
  #include <unistd.h>

  #include "access/hash.h"
 @@ -59,15 +60,18 @@
  #include "storage/spin.h"
  #include "tcop/utility.h"
  #include "utils/builtins.h"
 +#include "utils/timestamp.h"

 Final thought: I think the order in the pg_stat_statements view is
 wrong. It ought to be like a composite primary key - (userid, dbid,
 query_id).
Will make the change.
 --
 Peter Geoghegan

Thank you for the review
Sameer





Re: [HACKERS] GIN improvements part 1: additional information

2013-11-14 Thread Alexander Korotkov
On Thu, Nov 14, 2013 at 3:00 PM, Alexander Korotkov aekorot...@gmail.com wrote:

 On Thu, Nov 14, 2013 at 2:17 PM, Alexander Korotkov 
  aekorot...@gmail.com wrote:

 On Tue, Nov 5, 2013 at 9:49 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 04.11.2013 23:44, Alexander Korotkov wrote:

 On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov
 aekorot...@gmail.comwrote:

  Attached version of patch is debugged. I would like to note that
 number of
 bugs was low and it wasn't very hard to debug. I've rerun tests on it.
 You
 can see that numbers are improved as the result of your refactoring.

   event | period
 ---+-
   index_build   | 00:01:45.416822
   index_build_recovery  | 00:00:06
   index_update  | 00:05:17.263606
   index_update_recovery | 00:01:22
   search_new| 00:24:07.706482
   search_updated| 00:26:25.364494
 (6 rows)

   label  | blocks_mark
 +-
   search_new |   847587636
   search_updated |   881210486
 (2 rows)

   label |   size
 ---+---
   new   | 419299328
   after_updates | 715915264
 (2 rows)

 Beside simple bugs, there was a subtle bug in incremental item indexes
 update. I've added some more comments including ASCII picture about how
 item indexes are shifted.

 Now, I'm trying to implement support of old page format. Then we can
 decide which approach to use.


 Attached version of patch has support of old page format. Patch still
 needs
 more documentation and probably refactoring, but I believe idea is clear
 and it can be reviewed. In the patch I have to revert change of null
 category placement for compatibility with old posting list format.


 Thanks, just glanced at this quickly.

 If I'm reading the patch correctly, old-style non-compressed posting
 tree leaf pages are not vacuumed at all; that's clearly not right.


 Fixed. Now a separate function handles uncompressed posting lists and
 compresses them if at least one TID is deleted.


 I argued earlier that we don't need to handle the case that compressing
 a page makes it larger, so that the existing items don't fit on the page
 anymore. I'm having some second thoughts on that; I didn't take into
 account the fact that the mini-index in the new page format takes up some
 space. I think it's still highly unlikely that there isn't enough space on
 a 8k page, but it's not totally crazy with a non-standard small page size.
 So at least that situation needs to be checked for with an ereport(),
 rather than Assert.


 Now this situation is reported with ereport() before any change to the page.

 To handle splitting a non-compressed page, it seems that you're relying
 on the fact that before splitting, we try to insert, which compresses the
 page. The problem with that is that it's not correctly WAL-logged. The
 split record that follows includes a full copy of both page halves, but if
 the split fails for some reason, e.g you run out of disk space, there is no
 WAL record at all of the the compression. I'd suggest doing the compression
 in the insert phase on a temporary copy of the page, leaving the original
 page untouched if there isn't enough space for the insertion to complete.
 (You could argue that this case can't happen because the compression must
 always create enough space to insert one more item. maybe so, but at least
 there should be an explicit check for that.)


 Good point. Yes, if we don't specially handle inserting item indexes, I
 see no point in special handling for a single TID, which is much smaller.
 In the attached patch dataCompressLeafPage just reserves space for a single
 TID.
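
(For reference, the temporary-copy approach Heikki describes above might look
roughly like this; the helper name and details are assumptions, not the actual
patch:)

    static bool
    compressAndInsert(Buffer buf, Size itemsz)
    {
        Page    page = BufferGetPage(buf);
        Page    tmppage = PageGetTempPageCopy(page);

        ginCompressLeafPage(tmppage);       /* hypothetical helper */
        if (PageGetExactFreeSpace(tmppage) < itemsz)
        {
            /* doesn't fit even compressed: the real page stays untouched */
            pfree(tmppage);
            return false;
        }

        START_CRIT_SECTION();
        memcpy(page, tmppage, BLCKSZ);
        /* ... place the new item and emit a single WAL record here ... */
        END_CRIT_SECTION();

        pfree(tmppage);
        return true;
    }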

 Also, many changes in comments and README.

 Unfortunately, I didn't understand what the FIXME in
 ginVacuumEntryPage is about, so it's not fixed :)


 Sorry, I posted a buggy version of the patch. The attached version is correct.


Another bug fixed, following a report from Rod Taylor.

--
With best regards,
Alexander Korotkov.




gin-packed-postinglists-16.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] GIN improvements part2: fast scan

2013-11-14 Thread Alexander Korotkov
On Fri, Nov 15, 2013 at 3:25 AM, Rod Taylor r...@simple-knowledge.com wrote:

 I checked out master and put together a test case using a small percentage
 of production data for a known problem we have with Pg 9.2 and text search
 scans.

  A small percentage in this case means 10 million records randomly
  selected; the full table has a few billion records.


 Tests ran for master successfully and I recorded timings.



 Applied the patch included here to master along with
 gin-packed-postinglists-14.patch.
 Run make clean; ./configure; make; make install.
 make check (All 141 tests passed.)

 initdb, import dump


 The GIN index fails to build with a segfault.


Thanks for testing. See the fixed version in the thread about packed posting lists.

--
With best regards,
Alexander Korotkov.

