Re: [HACKERS] plpgsql.extra_warnings='num_into_expressions'

2014-07-22 Thread Pavel Stehule
2014-07-22 8:52 GMT+02:00 Marko Tiikkaja :

> On 7/22/14, 7:06 AM, Pavel Stehule wrote:
>
>> I looked at this patch and I think it is not a good idea. It
>> introduces an early dependency between functions and pg_class-based objects.
>>
>
> What dependency?  The patch only looks at the raw parser output, so it
> won't e.g. know whether  SELECT * INTO a, b FROM foo;  is problematic or
> not.
>

I am sorry, I was confused.

There is a dependency on the variable type, but that dependency is not new.

Regards

Pavel




>
>
> .marko
>


Re: [HACKERS] Production block comparison facility

2014-07-22 Thread Michael Paquier
On Sun, Jul 20, 2014 at 5:31 PM, Simon Riggs  wrote:
> The block comparison facility presented earlier by Heikki would not be
> able to be used in production systems. ISTM that it would be desirable
> to have something that could be used in that way.
>
> ISTM easy to make these changes
>
> * optionally generate a FPW for every WAL record, not just first
> change after checkpoint
> full_page_writes = 'always'
>
> * when an FPW arrives, optionally run a check to see if it compares
> correctly against the page already there, when running streaming
> replication without a recovery target. We could skip reporting any
> problems until the database is consistent
> full_page_write_check = on
>
> The above changes seem easy to implement.
>
> With FPW compression, this would be a usable feature in production.
>
> Comments?

This is an interesting idea, and it would be easier to use than what
has been submitted for CF1. However, full_page_writes set to "always"
would generate a large amount of WAL even for small records,
increasing I/O for the partition holding pg_xlog, and the frequency of
checkpoints run on the system. Is this really something suitable for
production?
Then, looking at the code, we would need to tweak XLogInsert for the
WAL record construction to always do a FPW and to update
XLogCheckBufferNeedsBackup. Then for the redo part, we would need to
do some extra operations in the area of
RestoreBackupBlock/RestoreBackupBlockContents, including masking
operations before comparing the content of the FPW and the current
page.
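
To make the idea concrete, here is a rough sketch of that redo-side check
(mask_page_for_compare() is a hypothetical masking routine, and the real
hook would live somewhere near RestoreBackupBlockContents(); this is just
to illustrate the shape of it):

    static void
    check_full_page_write(char *fpw_image, char *current_page, bool consistent)
    {
        char    masked_fpw[BLCKSZ];
        char    masked_cur[BLCKSZ];

        memcpy(masked_fpw, fpw_image, BLCKSZ);
        memcpy(masked_cur, current_page, BLCKSZ);

        /* clear hint bits, unused space, etc. before comparing */
        mask_page_for_compare(masked_fpw);
        mask_page_for_compare(masked_cur);

        if (consistent && memcmp(masked_fpw, masked_cur, BLCKSZ) != 0)
            elog(WARNING, "FPW does not match the page already on disk");
    }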

Does that sound right?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread MauMau

From: "Andres Freund" 

On 2014-07-22 10:09:04 +0900, MauMau wrote:

Is there any problem if we don't output the message?


Yes. The user won't know that possibly gigabytes worth of disk space
aren't freed.


RemovePgTempFiles() frees the disk space by removing temp relation files at 
server start.  In addition, the temp relation files are not replicated to 
the standby server of streaming replication (this is the customer's case). 
So, those messages seem to be nothing more than noise.


Given this, are those messages really necessary?

Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] parametric block size?

2014-07-22 Thread Fabien


Hello devs,

The default blocksize is currently 8k, which is not necessarily optimal for
all setups, especially with SSDs where the latency is much lower than with HDDs.


There is a case for different values with significant impact on 
performance (up to a not-to-be-sneezed-at 10% on a pgbench run on SSD, see 
http://www.cybertec.at/postgresql-block-sizes-getting-started/), and ISTM 
that the ability to align PostgreSQL block size to the underlying FS/HW 
block size would be nice.


This is currently possible, but it requires recompiling and maintaining 
distinct executables for various block sizes. This is annoying, thus most 
admins will not bother.


ISTM that a desirable and reasonably simple to implement feature would be 
to be able to set the blocksize at "initdb" time, and "postgres" could use 
the value found in the database instead of a compile-time one.
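
For what it's worth, the value chosen at initdb time is already recorded in
pg_control; currently the server merely cross-checks it against the
compile-time constant at startup, roughly like this (simplified from
ReadControlFile(); the exact message wording differs). The proposal would
amount to adopting the stored value instead of failing on a mismatch:

    if (ControlFile->blcksz != BLCKSZ)
        ereport(FATAL,
                (errmsg("database files are incompatible with server"),
                 errdetail("initdb used a block size of %u, but the "
                           "server was compiled with %u.",
                           ControlFile->blcksz, BLCKSZ)));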


More advanced features, but with much more impact on the code, would be to 
be able to change the size at database/table level.


Any thoughts?

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-22 Thread Kyotaro HORIGUCHI
Sorry, the previous version has bugs. It stomps over the stack and
crashes X( The attached is the bug-fixed version, with no
substantial changes.

> On Tue, Jul 15, 2014 at 2:17 PM, Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
> >
> > Hi, the attached is the revised version.
> 
> Thanks Horiguchi-San for the updated patch.

# By the way, this style of addressing a person is quite common
# among Japanese, since being on a first-name basis implies a very
# close relationship or can even come across as offensive. But I'm
# not sure what I should call others who are not native Japanese.

Anyway,

> Today while looking into updated patch, I was wondering why can't
> we eliminate useless keys in query_pathkeys when we actually build
> the same in function standard_qp_callback(), basically somewhat
> similar to what we do in
> standard_qp_callback->make_pathkeys_for_sortclauses->pathkey_is_redundant().

I agree that standard_qp_callback is basically more preferable.

> We already have index information related to base_rels before building
> query pathkeys.  I noticed that you mentioned the below in your original
> mail which indicates that information related to inheritance tables is built
> only after set_base_rel_sizes()
> 
> "These steps take place between set_base_rel_sizes() and
> set_base_rel_pathlists() in make_one_rel(). The reason for the
> position is that the step 2 above needs all inheritence tables to
> be extended in PlannerInfo and set_base_rel_sizes (currently)
> does that".

Sorry, my memory had worn down. After some reconfirmation, this
description turned out to be a bit (quite?) wrong.

collect_common_primary_pathkeys needs root->eq_classes to be set
up beforehand, not appendrels, because build_index_pathkeys
doesn't work until every EC member for all relations involved has
been registered.

standard_qp_callback registers EC members in the following path,
but only for the primary(?) tlist of the subquery, so EC members
for the child relations are not registered at that time.

.->make_pathkeys_for_sortclauses->make_pathkey_from_sortop
   ->make_pathkey_from_sortinfo->get_eclass_for_sort_expr

EC members for the child rels are registered in

 set_base_rel_sizes->set_rel_size->set_append_rel_size
   ->add_child_rel_equivalences

So trim_query_pathkeys (collect_common...) doesn't work before
set_base_rel_sizes(). These EC members need to be registered at
least when root->query_pathkeys exists, so this would have to be done
in standard_qp_callback by adding a separate inheritance tree walk.

# rel->has_eclass_joins seems not to be working, but that is not the
# topic here.

> Could you please explain me why the index information built in above
> path is not sufficient or is there any other case which I am missing?

Is the description above enough of an explanation for the
placement? Sorry for the inaccurate description.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9573a9b..546e112 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1789,9 +1789,11 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
 	/* indexprs is redundant since we print indextlist */
 	WRITE_NODE_FIELD(indpred);
 	WRITE_NODE_FIELD(indextlist);
+	/* cached pathkeys are omitted as indexkeys */
 	WRITE_BOOL_FIELD(predOK);
 	WRITE_BOOL_FIELD(unique);
 	WRITE_BOOL_FIELD(immediate);
+	WRITE_BOOL_FIELD(allnotnull);
 	WRITE_BOOL_FIELD(hypothetical);
 	/* we don't bother with fields copied from the pg_am entry */
 }
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c81efe9..c12d0e7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -58,6 +58,7 @@ int			geqo_threshold;
 join_search_hook_type join_search_hook = NULL;
 
 
+static List *collect_common_primary_pathkeys(PlannerInfo *root);
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
 static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
@@ -66,6 +67,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  Index rti, RangeTblEntry *rte);
 static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
    RangeTblEntry *rte);
+static void trim_query_pathkeys(PlannerInfo * root);
 static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	   RangeTblEntry *rte);
 static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
@@ -112,6 +114,203 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
   RangeTblEntry *rte, Index rti, Node *qual);
 static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
 
+/*
+ * collect_common_primary_pathkeys: Collects common unique and non-null index
+ * pathkeys from all base relations in current root.
+ */
+static List *
+collect_common_primary_pathkeys(PlannerInfo *root)
+{
+	List *result = NIL;
+	Index

Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread Andres Freund
On 2014-07-22 17:05:22 +0900, MauMau wrote:
> From: "Andres Freund" 
> >On 2014-07-22 10:09:04 +0900, MauMau wrote:
> >>Is there any problem if we don't output the message?
> >
> >Yes. The user won't know that possibly gigabytes worth of disk space
> >aren't freed.
> 
> RemovePgTempFiles() frees the disk space by removing temp relation files at
> server start.

But it's not called during a crash restart.

> In addition, the temp relation files are not replicated to
> the standby server of streaming replication (this is the customer's
> case).

So?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL replay bugs

2014-07-22 Thread Kyotaro HORIGUCHI
Hello,

> > Although I doubt the necessity of the flexibility given the current
> > testing framework, I don't have a strong objection to it.
> > Nevertheless, it would be appreciated if you put a notice in the
> > README or somewhere.
> Hm, well... Fine, I added it in this updated series.

Thank you for your patience :)

After all, I have no more comments about this patch. I will mark
this as 'Ready for committer' if no comments come from others
for a few days.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUFFER_LOCK_EXCLUSIVE is used in ginbuildempty().

2014-07-22 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 17 Jul 2014 15:54:31 -0400, Tom Lane  wrote in 
<10710.1405626...@sss.pgh.pa.us>
> Peter Geoghegan  writes:
> > On Thu, Jul 17, 2014 at 7:47 AM, Alvaro Herrera
> >  wrote:
> >> I don't understand the point of having these GIN_EXCLUSIVE / GIN_SHARED
> >> symbols.  It's not like we could do anything different than
> >> BUFFER_LOCK_EXCLUSIVE etc instead.  It there was a GinLockBuffer() it
> >> might make more sense to have specialized symbols, but as it is it seems
> >> pointless.

I agree with you. To my eyes, which are not specialized in each AM,
the translation-only symbols didn't make me very happy.

> > It's a pattern common to the index AMs. I think it's kind of pointless
> > myself, but as long as we're doing it we might as well be consistent.
> 
> I think that to the extent that these symbols are used in APIs above the
> direct buffer-access layer, they are useful --- for example using
> BT_READ/BT_WRITE in _bt_search calls seems like a useful increment of
> readability.  GIN seems to have less of that than some of the other AMs,
> but I do see GIN_SHARE being used that way in some calls.
> 
> BTW, there's one direct usage of BUFFER_LOCK_EXCLUSIVE in the GIST code
> as well, which should probably be replaced with GIST_EXCLUSIVE if we're
> trying to be consistent.

Though I brought up this topic, this kind of consistency does not
seem to be needed so much. If so, it seems better to leave it as it is.
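
(For reference, the symbols in question are plain aliases of the generic
buffer-lock modes -- roughly as defined in the GIN and btree headers of
this era:

    #define GIN_SHARE      BUFFER_LOCK_SHARE
    #define GIN_EXCLUSIVE  BUFFER_LOCK_EXCLUSIVE
    #define BT_READ        BUFFER_LOCK_SHARE
    #define BT_WRITE       BUFFER_LOCK_EXCLUSIVE

so the question is purely one of readability, not behavior.)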

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread MauMau

From: "Andres Freund" 

On 2014-07-22 17:05:22 +0900, MauMau wrote:
RemovePgTempFiles() frees the disk space by removing temp relation files
at server start.


But it's not called during a crash restart.


Yes, the comment of the function says:

* NOTE: we could, but don't, call this during a post-backend-crash restart
* cycle.  The argument for not doing it is that someone might want to
* examine the temp files for debugging purposes.  This does however mean that
* OpenTemporaryFile had better allow for collision with an existing temp
* file name.

But this is only true if restart_after_crash = on in postgresql.conf, because
the crash restart only occurs in that case.  However, in an HA cluster,
whether it is shared-disk or replication, restart_after_crash is set to off,
isn't it?


Moreover, as the comment says, the behavior of keeping leftover temp files
is for debugging by developers.  It's not helpful for users, is it?  I
think a DEBUG-level message is more appropriate, because the behavior is
for debugging purposes.



In addition, the temp relation files are not replicated to
the standby server of streaming replication (this is the customer's
case).


So?


Yes, so nobody can convince serious customers that the current behavior 
makes real sense.


Could you please reconsider this?

Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to reach D5 in tuplesort.c 's polyphase merge algorithm?

2014-07-22 Thread 土卜皿
Hi,
   I got the same result after setting work_mem = 64,
but I could get to D5 and D6 after using a bigger data sample (at least 10
records), as Tom said!




2014-07-19 6:35 GMT+08:00 土卜皿 :

>
> 2014-07-19 6:26 GMT+08:00 Tom Lane :
>
>> 土卜皿  writes:
>> >   for studying polyphase merge algorithm of tuplesort.c, I use ddd and
>> > apend a table, which has a schema as follows:
>> > ...
>> > and has 36684 records, and every record is like:
>> > id  code  article  name  department
>> > 31800266\NMachault77
>>
>> > and for getting into external sort, I type the following command:
>>
>> > select * from towns order by name desc;
>>
>> > but I found it need not reach D5 and D6 during sorting,
>>
>> That doesn't sound like enough data to force it to spill to disk at all;
>> at least not unless you turn down work_mem to some very small value.
>>
>>
>
> hi, Tom
>   thanks a lot!
>
>
>>
> The work_mem you mentioned reminds me of one more thing I did: I tried to
> change BLCKSZ to 8192/2, and it compiled successfully, but I got an error
> when executing initdb.
>
> Dillon
>


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread Andres Freund
On 2014-07-22 19:13:56 +0900, MauMau wrote:
> From: "Andres Freund" 
> >On 2014-07-22 17:05:22 +0900, MauMau wrote:
> >>RemovePgTempFiles() frees the disk space by removing temp relation files
> >>at server start.
> >
> >But it's not called during a crash restart.
> 
> Yes, the comment of the function says:
> 
> * NOTE: we could, but don't, call this during a post-backend-crash restart
> * cycle.  The argument for not doing it is that someone might want to
> examine
> * the temp files for debugging purposes.  This does however mean that
> * OpenTemporaryFile had better allow for collision with an existing temp
> * file name.
> 
> But this is true if restart_after_crash = on in postgresql.conf, because the
> crash restart only occurs in that case.  However, in HA cluster, whether it
> is shared-disk or replication, restart_after_crash is set to off, isn't it?

In almost all setups I've seen it's set to on, even in HA scenarios.

> Moreover, as the comment says, the behavior of keeping leftover temp files
> is for debugging by developers.  It's not helpful for users, isn't it?  I
> thought messages of DEBUG level is more appropriate, because the behavior is
> for debugging purposes.

GRR. That doesn't change the fact that there'll be files left over after
a crash restart.

> Yes, so nobody can convince serious customers that the current behavior
> makes real sense.

I think you're making lots of noise over a trivial log message.

> Could you please reconsider this?

No. Just removing a warning isn't the way to solve this. If you want to
improve things you'll actually need to improve things not just stick
your head into the sand.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Production block comparison facility

2014-07-22 Thread Simon Riggs
On 22 July 2014 08:49, Michael Paquier  wrote:
> On Sun, Jul 20, 2014 at 5:31 PM, Simon Riggs  wrote:
>> The block comparison facility presented earlier by Heikki would not be
>> able to be used in production systems. ISTM that it would be desirable
>> to have something that could be used in that way.
>>
>> ISTM easy to make these changes
>>
>> * optionally generate a FPW for every WAL record, not just first
>> change after checkpoint
>> full_page_writes = 'always'
>>
>> * when an FPW arrives, optionally run a check to see if it compares
>> correctly against the page already there, when running streaming
>> replication without a recovery target. We could skip reporting any
>> problems until the database is consistent
>> full_page_write_check = on
>>
>> The above changes seem easy to implement.
>>
>> With FPW compression, this would be a usable feature in production.
>>
>> Comments?
>
> This is an interesting idea, and it would be easier to use than what
> has been submitted for CF1. However, full_page_writes set to "always"
> would generate a large amount of WAL even for small records,
> increasing I/O for the partition holding pg_xlog, and the frequency of
> checkpoints run on system. Is this really something suitable for
> production?

For critical systems, yes, I think it is.

It would be possible to make that user selectable for particular
transactions or tables.

> Then, looking at the code, we would need to tweak XLogInsert for the
> WAL record construction to always do a FPW and to update
> XLogCheckBufferNeedsBackup. Then for the redo part, we would need to
> do some extra operations in the area of
> RestoreBackupBlock/RestoreBackupBlockContents, including masking
> operations before comparing the content of the FPW and the current
> page.
>
> Does that sound right?

Yes, it doesn't look like very much code because it fits well with existing
approaches.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Production block comparison facility

2014-07-22 Thread Greg Stark
If you're always writing FPWs then there's no point in the rest of the record.
The point here was to find problems so that users could run normally with
confidence.

The cases where you might want to run in the mode you describe are the build
farm or integration testing. When testing your application on the next release
of Postgres it would be nice to have tests for the replication in your
workload, given the experience in 9.3.

Even without the constant full page writes, a live production system could
do a comparison whenever an FPW arrives, once it is in a consistent state.
That would give standbys periodic verification at low cost.

-- 
greg
On 22 Jul 2014 12:28, "Simon Riggs"  wrote:

> On 22 July 2014 08:49, Michael Paquier  wrote:
> > On Sun, Jul 20, 2014 at 5:31 PM, Simon Riggs 
> wrote:
> >> The block comparison facility presented earlier by Heikki would not be
> >> able to be used in production systems. ISTM that it would be desirable
> >> to have something that could be used in that way.
> >>
> >> ISTM easy to make these changes
> >>
> >> * optionally generate a FPW for every WAL record, not just first
> >> change after checkpoint
> >> full_page_writes = 'always'
> >>
> >> * when an FPW arrives, optionally run a check to see if it compares
> >> correctly against the page already there, when running streaming
> >> replication without a recovery target. We could skip reporting any
> >> problems until the database is consistent
> >> full_page_write_check = on
> >>
> >> The above changes seem easy to implement.
> >>
> >> With FPW compression, this would be a usable feature in production.
> >>
> >> Comments?
> >
> > This is an interesting idea, and it would be easier to use than what
> > has been submitted for CF1. However, full_page_writes set to "always"
> > would generate a large amount of WAL even for small records,
> > increasing I/O for the partition holding pg_xlog, and the frequency of
> > checkpoints run on system. Is this really something suitable for
> > production?
>
> For critical systems, yes, I think it is.
>
> It would be possible to make that user selectable for particular
> transactions or tables.
>
> > Then, looking at the code, we would need to tweak XLogInsert for the
> > WAL record construction to always do a FPW and to update
> > XLogCheckBufferNeedsBackup. Then for the redo part, we would need to
> > do some extra operations in the area of
> > RestoreBackupBlock/RestoreBackupBlockContents, including masking
> > operations before comparing the content of the FPW and the current
> > page.
> >
> > Does that sound right?
>
> Yes, it doesn't look very much code because it fits well with existing
> approaches.
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Production block comparison facility

2014-07-22 Thread Simon Riggs
On 22 July 2014 12:54, Greg Stark  wrote:
> If you're always going FPW then there's no point in the rest of the record.

I think it's a simple matter to mark them XLP_BKP_REMOVABLE and to skip
any optimization of the remainder of the WAL records.

> The point here was to find problems so that users could run normally with
> confidence.

Yes, but a full overwrite mode would provide an even safer mode of operation.

> The cases you might want to run in the mode you describe are the build farm
> or integration testing. When treating your application on the next release
> of postgres it would be nice to have tests for the replication in your
> workload given the experience in 9.3.
>
> Even without the constant full page writes a live production system could do
> a FPW comparison after a FPW if it was in a consistent state. That would
> give standbys periodic verification at low costs.

Yes, the two options I proposed are somewhat independent of each other.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread MauMau

From: "Andres Freund" 

On 2014-07-22 19:13:56 +0900, MauMau wrote:
But this is true if restart_after_crash = on in postgresql.conf, because
the crash restart only occurs in that case.  However, in an HA cluster,
whether it is shared-disk or replication, restart_after_crash is set to
off, isn't it?


In almost all setups I've seen it's set to on, even in HA scenarios.


I'm afraid that's because people don't notice the existence or purpose of 
this parameter.  The 9.1 release note says:


Add restart_after_crash setting which disables automatic server restart 
after a backend crash (Robert Haas)
This allows external cluster management software to control whether the 
database server restarts or not.


Reading this, I guess the parameter was introduced, and should be used, for
HA environments controlled by the clusterware.  Restarting the database
server on the same machine may fail, or the restarted server may fail again
due to broken hardware components, so I guess it was considered better
to let the clusterware determine what to do.



Moreover, as the comment says, the behavior of keeping leftover temp files
is for debugging by developers.  It's not helpful for users, is it?  I
think a DEBUG-level message is more appropriate, because the behavior is
for debugging purposes.


GRR. That doesn't change the fact that there'll be files left over after
a crash restart.


Yes... that's a source of headaches.  But please understand that there's a
problem -- trying to leave temp relations just for debugging is causing a
flood of messages, which the customer is actually concerned about.



I think you're making lots of noise over a trivial log message.


Maybe so, and I hope so.  I may be too nervous about what the customer will 
ask and/or request next.  If they request something similar to what I 
proposed here, let me consult you again.




Could you please reconsider this?


No. Just removing a warning isn't the way to solve this. If you want to
improve things you'll actually need to improve things not just stick
your head into the sand.



I have a few ideas below, but none of them seems better than the original 
proposal.  What do you think?


1. startup process deletes the catalog entries and data files of leftover 
temp relations at the end of recovery.
This is probably difficult, impossible or undesirable, because the startup 
process cannot access system catalogs.  Even if it's possible, it is against 
the developers' desire to leave temp relation files for debugging.


2. autovacuum launcher deletes the catalog entries and data files of 
leftover temp relations during its initialization.
This may be possible, but it is against the developers' desire to leave temp 
relation files for debugging.


3. Emit the "orphan temp relation" message only when the associated data 
file actually exists.
autovacuum workers check if the temp relation file is left over with stat(). 
If not, delete the catalog entry in pg_class silently.
This sounds reasonable because the purpose of the message is to notify users 
of potential disk space shortage.  In the streaming replication case, no 
data files should exist on the promoted new primary, so no messages should 
be emitted.
However, in the shared-disk HA cluster case, the temp relation files are 
left over on the shared disk, so this fix doesn't improve anything.
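
A minimal sketch of that stat() check (simplified; relpathbackend() returns
a palloc'd path relative to the data directory, and the surrounding
do_autovacuum() logic is omitted):

    #include <sys/stat.h>

    static bool
    orphan_temp_file_exists(RelFileNode rnode, BackendId backendId)
    {
        char       *path = relpathbackend(rnode, backendId, MAIN_FORKNUM);
        struct stat st;
        bool        found = (stat(path, &st) == 0);

        pfree(path);
        return found;
    }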


4. Emit the "orphan temp relation" message only when restart_after_crash is 
on.

i.e.
 ereport(restart_after_crash ? LOG : DEBUG1, ...


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread Robert Haas
On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund  wrote:
>> Yes, so nobody can convince serious customers that the current behavior
>> makes real sense.
>
> I think you're making lots of noise over a trivial log message.
>
>> Could you please reconsider this?
>
> No. Just removing a warning isn't the way to solve this. If you want to
> improve things you'll actually need to improve things not just stick
> your head into the sand.

I've studied this area of the code before, and I would actually
propose to fix this in the opposite way - namely, adopt the logic
currently used for the wraparound case, which drops the temp table,
even if wraparound is not an issue.  The current code seems to be
laboring under the impression that there's some benefit to keeping the
temporary relation around, which, as far as I can see, there isn't.
Now, you could perhaps argue that it's useful for forensics, but that
presumes that the situation where a backend crashes without
cleaning up its temporary relations is exciting enough to merit an
investigation, which I believe to be false.
RemoveTempRelationsCallback just does this:

AbortOutOfAnyTransaction();
StartTransactionCommand();
RemoveTempRelations(myTempNamespace);
CommitTransactionCommand();

RemoveTempRelations uses:

deleteWhatDependsOn(&object, false);

These are pretty high-level operations, and there are all kinds of
reasons they could fail.  Many of those reasons do indeed involve the
system being messed up in some way, but it's likely that the user will
know about that for other reasons anyway - e.g. if the cleanup fails
because the disk where the files are located has gone read-only at the
operating system level, things are going to go downhill in a hurry.
When the user restarts, they expect recovery - or other automatic
cleanup mechanisms - to put things right.  And normally they do: the
first backend to get the same backend ID as any orphaned temp schema
will clear it out anyway on first use - completely silently - but if
it so happens that a crashed backend had a very high backend ID and
that temp table usage is relatively uncommon, then the user gets log
spam from now and until eternity because of a problem that, in their
mind, is already fixed.  Even better, they will typically be
completely unable to connect the log spam back to the event that
triggered it, and will have no idea how to put it right.

So while I disagree with MauMau's proposed fix, I agree with him that
this error message is useless log spam.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans for multicolumn GIST

2014-07-22 Thread Robert Haas
On Tue, Jul 22, 2014 at 2:55 AM, Heikki Linnakangas
 wrote:
> On 07/21/2014 10:47 PM, Anastasia Lubennikova wrote:
>>
>> Hi, hackers!
>> There are new results of my work on GSoC project "Index-only scans for
>> GIST".
>> Previous post is here:
>>
>> http://postgresql.1045698.n5.nabble.com/Index-only-scans-for-GIST-td5804892.html
>>
>> Repository is
>> https://github.com/lubennikovaav/postgres/tree/indexonlygist2
>> Patch is in attachments.
>> It includes index-only scan for multicolumn GiST. It works correctly -
>> the results are the same as with other scan plans.
>> The Fetch() method is implemented for the box and circle opclasses.
>> The improvement for the circle opclass is not as distinct as for the box
>> opclass, because of recheck.
>
>
> For a circle, the GiST index stores a bounding box of the circle. The new
> fetch function reverses that, calculating the radius and center of the
> circle from the bounding box.
>
> Those conversions lose some precision due to rounding. Are we okay with
> that? Floating point math is always subject to rounding errors, but there's
> a good argument to be made that it's not acceptable to get a different value
> back when the user didn't explicitly invoke any math functions.
>
> As an example:
>
> create table circle_tbl (c circle);
> create index circle_tbl_idx on circle_tbl using gist (c);
> insert into circle_tbl values ('1.23456789012345,1.23456789012345,1e300');
>
> postgres=# set enable_seqscan=off; set enable_bitmapscan=off; set
> enable_indexonlyscan=on;
> SET
> SET
> SET
> postgres=# select * from circle_tbl ;
>        c
> ----------------
>  <(0,0),1e+300>
> (1 row)
>
> postgres=# set enable_indexonlyscan=off;
> SET
> postgres=# select * from circle_tbl ;
>                      c
> ----------------------------------------------
>  <(1.23456789012345,1.23456789012345),1e+300>
> (1 row)
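
To make the rounding concrete, the reconstruction is essentially this (a
simplified sketch with stand-in types, not the patch's actual fetch
function):

    typedef struct { double x, y; } Pt;
    typedef struct { Pt high, low; } Box;
    typedef struct { Pt center; double radius; } Circle;

    static Circle
    circle_from_bounding_box(Box box)
    {
        Circle  c;

        /* high.x = cx + r and low.x = cx - r; with r = 1e300 and
         * cx ~ 1.23, both additions round away cx entirely, so the
         * reconstructed center collapses to (0,0) as shown above */
        c.center.x = (box.high.x + box.low.x) / 2.0;
        c.center.y = (box.high.y + box.low.y) / 2.0;
        c.radius   = (box.high.x - box.low.x) / 2.0;
        return c;
    }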

I really don't think it's ever OK for a query to produce different
answers depending on which plan is chosen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans for multicolumn GIST

2014-07-22 Thread Tom Lane
Heikki Linnakangas  writes:
> For a circle, the GiST index stores a bounding box of the circle. The 
> new fetch function reverses that, calculating the radius and center of 
> the circle from the bounding box.

> Those conversions lose some precision due to rounding. Are we okay with 
> that?

That seems like a nonstarter :-(.  Index-only scans don't have a license
to return approximations.  If we drop the behavior for circles, how much
functionality do we have left?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread Andres Freund
On 2014-07-22 09:39:13 -0400, Robert Haas wrote:
> On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund  wrote:
> >> Yes, so nobody can convince serious customers that the current behavior
> >> makes real sense.
> >
> > I think you're making lots of noise over a trivial log message.
> >
> >> Could you please reconsider this?
> >
> > No. Just removing a warning isn't the way to solve this. If you want to
> > improve things you'll actually need to improve things not just stick
> > your head into the sand.
> 
> I've studied this area of the code before, and I would actually
> proposed to fix this in the opposite way - namely, adopt the logic
> currently used for the wraparound case, which drops the temp table,
> even if wraparound is not an issue.

I'm absolutely not opposed to fixing this for real. I doubt it's
backpatching material, but that's something to judge when we know what
to do.

> The current code seems to be
> laboring under the impression that there's some benefit to keeping the
> temporary relation around, which, as far as I can see, there isn't.
> Now, you could perhaps argue that it's useful for forensics, but that
> presumes that the situation where a backend fails to crash without
> cleaning up its temporary relations is exciting enough to merit an
> investigation, which I believe to be false.

I think the picture here changed with the introduction of the
restart_after_crash GUC - before it it was pretty hard to investigate
anything that involved crashes. So I'm ok with changing things
around. But I think it's more complex than just doing the if
(wraparound) in do_autovacuum().

a) There very well could be a backend reconnecting to that
   backendId. Then we potentially might try to remove the temp schema
   from two backends - I'm not sure that's always going to end up going
   well. There's already a race window, but it's pretty darn unlikely to
   hit it right now because the wraparound case pretty much implies that
   nothing has used that backendid slot for a while.
   I guess we could do something like:

   LockDatabaseObject(tempschema);
   if (SearchSysCacheExists1)
  /* bailout */
   performDeletion(...);

b) I think at the very least we also need to call RemovePgTempFiles()
   during crash restart.

> RemoveTempRelationsCallback just does this:
> 
> AbortOutOfAnyTransaction();
> StartTransactionCommand();
> RemoveTempRelations(myTempNamespace);
> CommitTransactionCommand();
> 
> RemoveTempRelations uses:
> 
> deleteWhatDependsOn(&object, false);

> These are pretty high-level operations, and there are all kinds of
> reasons they could fail.

One could argue, without being very convincing from my pov, that that's
a reason not to always do it from autovacuum because it'll prevent
vacuum from progressing past the borked temporary relation.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund  wrote:
>> No. Just removing a warning isn't the way to solve this. If you want to
>> improve things you'll actually need to improve things not just stick
>> your head into the sand.

> I've studied this area of the code before, and I would actually
> proposed to fix this in the opposite way - namely, adopt the logic
> currently used for the wraparound case, which drops the temp table,
> even if wraparound is not an issue.  The current code seems to be
> laboring under the impression that there's some benefit to keeping the
> temporary relation around, which, as far as I can see, there isn't.

FWIW, I agree with Andres on this.  The right response is to drop the temp
table not complain about log spam.  Or even more to the point, investigate
why it's there in the first place; perhaps there's an actual fixable bug
somewhere in there.  But deciding that orphaned temp tables are normal
is *not* an improvement IMO.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread Andres Freund
On 2014-07-22 10:17:15 -0400, Tom Lane wrote:
> Or even more to the point, investigate why it's there in the first
> place; perhaps there's an actual fixable bug somewhere in there.

I think MauMau's scenario of a failover to another database explains
their existence - there's no step that'd remove them after promoting a
standby.

So there indeed is a need to have a sensible mechanism for removing them
at some point. But it should be about removing, not ignoring them.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread Andres Freund
On 2014-07-22 22:18:03 +0900, MauMau wrote:
> From: "Andres Freund" 
> >On 2014-07-22 19:13:56 +0900, MauMau wrote:
> >>But this is true if restart_after_crash = on in postgresql.conf, because
> >>the crash restart only occurs in that case.  However, in an HA cluster,
> >>whether it is shared-disk or replication, restart_after_crash is set to
> >>off, isn't it?
> >
> >In almost all setups I've seen it's set to on, even in HA scenarios.
> 
> I'm afraid that's because people don't notice the existence or purpose of
> this parameter.  The 9.1 release note says:

I think it's also because it's simply a good idea to keep it on in many
production scenarios. Failing over 'just' because something caused a
crash restart is often too expensive.

> >No. Just removing a warning isn't the way to solve this. If you want to
> >improve things you'll actually need to improve things not just stick
> >your head into the sand.

> I have a few ideas below, but none of them seems better than the original
> proposal.  What do you think?

> 1. startup process deletes the catalog entries and data files of leftover
> temp relations at the end of recovery.

> This is probably difficult, impossible or undesirable, because the startup
> process cannot access system catalogs.  Even if it's possible, it is against
> the developers' desire to leave temp relation files for debugging.
> 
> 2. autovacuum launcher deletes the catalog entries and data files of
> leftover temp relations during its initialization.
> This may be possible, but it is against the developers' desire to leave temp
> relation files for debugging.

I think that desire is pretty much antiquated and doesn't really count
for much.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread Tom Lane
Andres Freund  writes:
> On 2014-07-22 10:17:15 -0400, Tom Lane wrote:
>> Or even more to the point, investigate why it's there in the first
>> place; perhaps there's an actual fixable bug somewhere in there.

> I think MauMau's scenario of a failover to another database explains
> their existance - there's no step that'd remove them after promoting a
> standby.

> So there indeed is a need to have a sensible mechanism for removing them
> at some point. But it should be about removing, not ignoring them.

Agreed.  Note that RemovePgTempFiles, as such, only reclaims disk space.
It does not clean out the pg_class entries, which means that just running
that at standby promotion would do nothing to get rid of autovacuum's
whining.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some bogus results from prairiedog

2014-07-22 Thread Andrew Dunstan


On 07/22/2014 12:24 AM, Tom Lane wrote:

According to
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-07-21%2022%3A36%3A55
prairiedog saw a crash in "make check" on the 9.4 branch earlier tonight;
but there's not a lot of evidence as to why in the buildfarm report,
because the postmaster log file is truncated well before where things got
interesting.  Fortunately, I was able to capture a copy of check.log
before it got overwritten by the next run.  I find that the place where
the webserver report stops matches this section of check.log:

[53cd99bb.134a:158] LOG:  statement: create index test_range_gist_idx on 
test_range_gist using gist (ir);
[53cd99bb.134a:159] LOG:  statement: insert into test_range_gist select 
int4range(g, g+10) from generate_series(1,2000) g;
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^\
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@[53cd99ba.1344:329] LOG:  statement: 
INSERT INTO num_exp_div VALUES (7,8,'-1108.80577182462841041118');
[53cd99ba.1344:330] LOG:  statement: INSERT INTO num_exp_add VALUES 
(7,9,'-107955289.045047420');
[53cd99ba.1344:331] LOG:  statement: INSERT INTO num_exp_sub VALUES 
(7,9,'-58101680.954952580');

The ^@'s represent nul bytes, which I find runs of elsewhere in the file
as well.  I think they are an artifact of OS X buffering policy caused by
multiple processes writing into the same file without any interlocks.
Perhaps we ought to consider making buildfarm runs use the logging
collector by default?  But in any case, it seems uncool that either the
buildfarm log-upload process, or the buildfarm web server, is unable to
cope with log files containing nul bytes.



The data is there, just click on the "check" stage link at the top of 
the page to see it in raw form.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-07-22 Thread Fabrízio de Royes Mello
On Thu, Jul 17, 2014 at 8:02 PM, Andres Freund 
wrote:
>
> > That means should I "FlushRelationBuffers(rel)" before change the
> > relpersistence?
>
> I don't think that'd help. I think what this means that you simply
> cannot change the relpersistence of the old relation before the heap
> swap is successful. So I guess it has to be something like (pseudocode):
>
> OIDNewHeap = make_new_heap(..);
> newrel = heap_open(OIDNewHeap, AEL);
>
> /*
>  * Change the temporary relation to be unlogged/logged. We have to do
>  * that here so buffers for the new relfilenode will have the right
>  * persistency set while the original filenode's buffers won't get read
>  * in with the wrong (i.e. new) persistency setting. Otherwise a
>  * rollback after the rewrite would possibly result with buffers for the
>  * original filenode having the wrong persistency setting.
>  *
>  * NB: This relies on swap_relation_files() also swapping the
>  * persistency. That wouldn't work for pg_class, but that can't be
>  * unlogged anyway.
>  */
> AlterTableChangeCatalogToLoggedOrUnlogged(newrel);
> FlushRelationBuffers(newrel);
> /* copy heap data into newrel */
> finish_heap_swap();
>
> And then in swap_relation_files() also copy the persistency.
>
>
> That's the best I can come up right now at least.
>

Isn't better if we can set the relpersistence as an argument to
"make_new_heap" ?

I'm thinking to change the make_new_heap:

From:

 make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
   LOCKMODE lockmode)

To:

 make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
   LOCKMODE lockmode)

That way we can create the new heap with the appropriate relpersistence, and
then swap_relation_files() also copies the persistency, of course.

And after copying the heap data to the new table (ATRewriteTable), change the
relpersistence of the OldHeap's indexes, because they'll be rebuilt in
"finish_heap_swap".
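
For example, a caller wanting ALTER TABLE ... SET LOGGED would then pass the
target persistence explicitly, something like this sketch (toLogged is a
hypothetical variable; RELPERSISTENCE_PERMANENT and RELPERSISTENCE_UNLOGGED
are the existing pg_class.relpersistence values):

    OIDNewHeap = make_new_heap(tableOid, tableSpace,
                               toLogged ? RELPERSISTENCE_PERMANENT
                                        : RELPERSISTENCE_UNLOGGED,
                               AccessExclusiveLock);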

Thoughts?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread Andres Freund
On 2014-07-23 00:13:26 +0900, MauMau wrote:
> Hello, Robert-san, Andres-san, Tom-san,
> 
> From: "Andres Freund" 
> >a) There very well could be a backend reconnecting to that
> >  backendId. Then we potentially might try to remove the temp schema
> >  from two backends - I'm not sure that's always going to end up going
> >  well. There's already a race window, but it's pretty darn unlikely to
> >  hit it right now because the wraparound case pretty much implies that
> >  nothing has used that backendid slot for a while.
> >  I guess we could do something like:
> >
> >  LockDatabaseObject(tempschema);
> >  if (SearchSysCacheExists1)
> > /* bailout */
> >  performDeletion(...);
> >
> >b) I think at the very least we also need to call RemovePgTempFiles()
> >  during crash restart.
> 
> Thank you for showing the direction.  I'll investigate the code.  But that
> will be tomorrow as it's already past midnight.
> 
> Could it be included in 9.2.9 if I could submit the patch tomorrow? (I'm not
> confident I can finish it...)  I'd really appreciate it if you could create
> the fix, if tomorrow would be too late.

9.2.9 is already stamped, so no. But even without that, I don't think
that this is a change that should be rushed into the backbranches. The
risk/benefit ratio just isn't on the side of doing things hastily.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] Suppress "autovacuum: found orphan temp table" message

2014-07-22 Thread MauMau

Hello, Robert-san, Andres-san, Tom-san,

From: "Andres Freund" 

a) There very well could be a backend reconnecting to that
  backendId. Then we potentially might try to remove the temp schema
  from two backends - I'm not sure that's always going to end up going
  well. There's already a race window, but it's pretty darn unlikely to
  hit it right now because the wraparound case pretty much implies that
  nothing has used that backendid slot for a while.
  I guess we could do something like:

  LockDatabaseObject(tempschema);
  if (SearchSysCacheExists1)
 /* bailout */
  performDeletion(...);

b) I think at the very least we also need to call RemovePgTempFiles()
  during crash restart.


Thank you for showing the direction.  I'll investigate the code.  But that 
will be tomorrow as it's already past midnight.


Could it be included in 9.2.9 if I could submit the patch tomorrow? (I'm not 
confident I can finish it...)  I'd really appreciate it if you could create 
the fix, if tomorrow would be too late.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some bogus results from prairiedog

2014-07-22 Thread Robert Haas
On Tue, Jul 22, 2014 at 12:24 AM, Tom Lane  wrote:
> According to
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-07-21%2022%3A36%3A55
> prairiedog saw a crash in "make check" on the 9.4 branch earlier tonight;
> but there's not a lot of evidence as to why in the buildfarm report,
> because the postmaster log file is truncated well before where things got
> interesting.  Fortunately, I was able to capture a copy of check.log
> before it got overwritten by the next run.  I find that the place where
> the webserver report stops matches this section of check.log:
>
> [53cd99bb.134a:158] LOG:  statement: create index test_range_gist_idx on 
> test_range_gist using gist (ir);
> [53cd99bb.134a:159] LOG:  statement: insert into test_range_gist select 
> int4range(g, g+10) from generate_series(1,2000) g;
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^\
> @^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@[53cd99ba.1344:329] LOG:  statement: 
> INSERT INTO num_exp_div VALUES (7,8,'-1108.80577182462841041118');
> [53cd99ba.1344:330] LOG:  statement: INSERT INTO num_exp_add VALUES 
> (7,9,'-107955289.045047420');
> [53cd99ba.1344:331] LOG:  statement: INSERT INTO num_exp_sub VALUES 
> (7,9,'-58101680.954952580');
>
> The ^@'s represent nul bytes, which I find runs of elsewhere in the file
> as well.  I think they are an artifact of OS X buffering policy caused by
> multiple processes writing into the same file without any interlocks.
> Perhaps we ought to consider making buildfarm runs use the logging
> collector by default?  But in any case, it seems uncool that either the
> buildfarm log-upload process, or the buildfarm web server, is unable to
> cope with log files containing nul bytes.
>
> Anyway, to cut to the chase, the crash seems to be from this:
>
> TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > 
> 0)", File: "lock.c", Line: 2957)
>
> which the postmaster shortly later says this about:
>
> [53cd99b6.130e:2] LOG:  server process (PID 5230) was terminated by signal 6: 
> Abort trap
> [53cd99b6.130e:3] DETAIL:  Failed process was running: ROLLBACK PREPARED 
> 'foo1';
> [53cd99b6.130e:4] LOG:  terminating any other active server processes
>
> So there is still something rotten in the fastpath lock logic.

Gosh, that sucks.

The inconstancy of this problem would seem to suggest some kind of
locking bug rather than a flat-out concurrency issue, but it looks to
me like everything relevant is marked volatile.  But ... maybe the
problem could be in s_lock.h?  That machine is powerpc, and powerpc's
definition of S_UNLOCK looks like this:

#ifdef USE_PPC_LWSYNC
#define S_UNLOCK(lock)  \
do \
{ \
__asm__ __volatile__ (" lwsync \n"); \
*((volatile slock_t *) (lock)) = 0; \
} while (0)
#else
#define S_UNLOCK(lock)  \
do \
{ \
__asm__ __volatile__ (" sync \n"); \
*((volatile slock_t *) (lock)) = 0; \
} while (0)
#endif /* USE_PPC_LWSYNC */

I believe Andres recently reported that this isn't enough to prevent
compiler reordering; for that, the __asm__ __volatile___ would need to
be augmented with ::: "memory".  If that's correct, then the compiler
could decide to issue the volatile store before the lwsync or sync.
Then, the CPU could decide to perform the volatile store to the
spinlock before performing the update to FastPathStrongRelationLocks.
That would explain this problem.
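
If so, the fix would presumably look something like this (a sketch of
adding the memory clobber; the exact change needed may differ):

    #define S_UNLOCK(lock)  \
    do \
    { \
        __asm__ __volatile__ (" lwsync \n" ::: "memory"); \
        *((volatile slock_t *) (lock)) = 0; \
    } while (0)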

The other possible explanation (at least, AFAICS) is that
lock_twophase_postcommit() is trying to release a strong lock count
that it doesn't actually hold.  That would indicate a
strong-lock-count handling bug upstream someplace - e.g. that the
count got released when the transaction was prepared, or somesuch.  I
certainly can't rule that out, although I don't know exactly where to
look.  We could add an assertion to AtPrepare_Locks(), just before
setting locallock->holdsStrongLockCount = FALSE, that
locallock->holdsStrongLockCount is true if and only if the lock tag
and mode suggest that it ought to be.  But that's really just guessing
wildly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Diagnose incompatible OpenLDAP versions during build and test.

2014-07-22 Thread Tom Lane
Noah Misch  writes:
> Diagnose incompatible OpenLDAP versions during build and test.

Hmm.  I'm pretty sure it is not considered good style to drop AC_DEFUN
blocks right into configure.in; at least, we have never done that before.
PGAC_LDAP_SAFE should get defined somewhere in config/*.m4, instead.
I am not real sure however which existing config/ file is appropriate,
or whether we should create a new one.  Peter, any opinion?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Behavior of "OFFSET -1"

2014-07-22 Thread Tom Lane
Before 9.3, you got an error from this:

regression=# select * from tenk1 offset -1;
ERROR:  OFFSET must not be negative

But 9.3 and up ignore the negative OFFSET.  This seems to be a thinko in
my commit 1a1832eb.  limit_needed() thinks it can discard the Limit plan
node altogether, which of course prevents nodeLimit.c from complaining:

/* Executor would treat less-than-zero same as zero */
if (offset > 0)
    return true;    /* OFFSET with a positive value */

I don't recall the reasoning behind that comment for sure, but I imagine
I examined the behavior of ExecLimit() and failed to notice that there
was an error check in recompute_limits().

This seems to me to be a clear bug: we should reinstate the former
behavior by tightening this check so it only discards OFFSET with a
constant value of exactly 0.  Anyone think differently?
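
Concretely, I'd tighten the test to something like this (just a sketch of
the idea, not a polished patch):

    if (offset != 0)
        return true;    /* only a constant OFFSET of exactly 0 can be
                         * discarded; a negative constant keeps the Limit
                         * node so recompute_limits() raises its error */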

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some bogus results from prairiedog

2014-07-22 Thread Andrew Dunstan


On 07/22/2014 10:55 AM, Andrew Dunstan wrote:


On 07/22/2014 12:24 AM, Tom Lane wrote:

According to
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-07-21%2022%3A36%3A55 

prairiedog saw a crash in "make check" on the 9.4 branch earlier 
tonight;

but there's not a lot of evidence as to why in the buildfarm report,
because the postmaster log file is truncated well before where things 
got

interesting.  Fortunately, I was able to capture a copy of check.log
before it got overwritten by the next run.  I find that the place where
the webserver report stops matches this section of check.log:

[53cd99bb.134a:158] LOG:  statement: create index test_range_gist_idx 
on test_range_gist using gist (ir);
[53cd99bb.134a:159] LOG:  statement: insert into test_range_gist 
select int4range(g, g+10) from generate_series(1,2000) g;
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^\ 

@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@[53cd99ba.1344:329] LOG: 
statement: INSERT INTO num_exp_div VALUES 
(7,8,'-1108.80577182462841041118');
[53cd99ba.1344:330] LOG:  statement: INSERT INTO num_exp_add VALUES 
(7,9,'-107955289.045047420');
[53cd99ba.1344:331] LOG:  statement: INSERT INTO num_exp_sub VALUES 
(7,9,'-58101680.954952580');


The ^@'s represent nul bytes, which I find runs of elsewhere in the file
as well.  I think they are an artifact of OS X buffering policy caused by
multiple processes writing into the same file without any interlocks.
Perhaps we ought to consider making buildfarm runs use the logging
collector by default?  But in any case, it seems uncool that either the
buildfarm log-upload process, or the buildfarm web server, is unable to
cope with log files containing nul bytes.



The data is there, just click on the "check" stage link at the top of 
the page to see it in raw form.






I have made a change in the upload receiver app to escape nul bytes in 
the main log field too. This will operate prospectively.


Using the logging collector would be a larger change, but we can look at 
it if this isn't sufficient.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parametric block size?

2014-07-22 Thread Alvaro Herrera
Fabien wrote:

> ISTM that a desirable and reasonably simple to implement feature
> would be to be able to set the blocksize at "initdb" time, and
> "postgres" could use the value found in the database instead of a
> compile-time one.

I think you will find it more difficult to implement than it seems at
first.  For one thing, there are several macros that depend on the block
size and the algorithms involved cannot work with dynamic sizes;
consider MaxIndexTuplesPerPage, which is used in PageIndexMultiDelete(),
for instance.  That value is used to allocate an array on the stack,
but that doesn't work if the array size is dynamic.  (Actually it works
almost everywhere, but that feature is not in C89 and thus it fails on
Windows).  That shouldn't be a problem, you say, just palloc() the array
-- except that that function is called within critical sections in some
places (e.g. _bt_delitems_vacuum) and you cannot use palloc there.
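
To make the constraint concrete, here is a minimal standalone sketch (the
macro is a stand-in for, not the definition of, MaxIndexTuplesPerPage,
which really divides BLCKSZ by the size of an index tuple):

    #define BLCKSZ 8192                        /* compile-time block size */
    #define MAX_TUPLES_PER_PAGE (BLCKSZ / 16)  /* constant expression */

    static void
    multi_delete_sketch(void)
    {
        /*
         * Legal C89 only because the bound is a compile-time constant.
         * With a run-time block size this would need a C99 VLA (not
         * available on Windows compilers) or palloc(), and palloc() is
         * forbidden inside critical sections.
         */
        unsigned short deletable[MAX_TUPLES_PER_PAGE];

        (void) deletable;
    }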

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Portability issues in TAP tests

2014-07-22 Thread Robert Haas
On Mon, Jul 21, 2014 at 11:39 PM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> On 07/21/2014 02:40 PM, Tom Lane wrote:
>>> I had the same feeling about the Perl on RHEL6 ;-).  The TAP tests
>>> will need to be a great deal more portable than they've proven so far
>>> before they'll really be useful for anything much.  I trust we're
>>> committed to making that happen.
>
>> I'd be rather inclined to remove them from the check-world and
>> installcheck-world targets until we have dealt with the issues people
>> have identified.
>
> I think it would be reasonable to do that in the 9.4 branch, since
> it's a bit hard to argue that the TAP tests are production grade
> at the moment.  We could leave them there in HEAD.

That doesn't do developers who can't run the tests in their
environment much good: most people develop against master.  Maybe
that's the point: to get these fixed.   But it's pretty annoying that
I can no longer do 'make world'.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Behavior of "OFFSET -1"

2014-07-22 Thread Robert Haas
On Tue, Jul 22, 2014 at 12:49 PM, Tom Lane  wrote:
> Before 9.3, you got an error from this:
>
> regression=# select * from tenk1 offset -1;
> ERROR:  OFFSET must not be negative
>
> But 9.3 and up ignore the negative OFFSET.  This seems to be a thinko in
> my commit 1a1832eb.  limit_needed() thinks it can discard the Limit plan
> node altogether, which of course prevents nodeLimit.c from complaining:
>
> /* Executor would treat less-than-zero same as zero */
> if (offset > 0)
> return true;/* OFFSET with a positive value */
>
> I don't recall the reasoning behind that comment for sure, but I imagine
> I examined the behavior of ExecLimit() and failed to notice that there
> was an error check in recompute_limits().
>
> This seems to me to be a clear bug: we should reinstate the former
> behavior by tightening this check so it only discards OFFSET with a
> constant value of exactly 0.  Anyone think differently?

Not I.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parametric block size?

2014-07-22 Thread Fabien COELHO


Hello Alvaro,


ISTM that a desirable and reasonably simple to implement feature
would be to be able to set the blocksize at "initdb" time, and
"postgres" could use the value found in the database instead of a
compile-time one.


I think you will find it more difficult to implement than it seems at
first.  For one thing, there are several macros that depend on the block
size and the algorithms involved cannot work with dynamic sizes;
consider MaxIndexTuplesPerPage, which is used in PageIndexMultiDelete(),
for instance.  That value is used to allocate an array on the stack,
but that doesn't work if the array size is dynamic.  (Actually it works
almost everywhere, but that feature is not in C89 and thus it fails on
Windows).  That shouldn't be a problem, you say, just palloc() the array
-- except that that function is called within critical sections in some
places (e.g. _bt_delitems_vacuum) and you cannot use palloc there.


Hmmm. Thanks for your point... indeed there may be implementation 
details... not a surprise:-)



Note that I was asking more about the desirability of the feature; the
implementation is another, although also relevant, issue. To me it is
really desirable given the potential performance impact, but maybe we 
should not care about 10%?



About your point: if we really have to do without dynamic stack allocation 
(C99 is only 15, not ripe for adult use yet, maybe when it turns 18 or 21, 
depending on the state:-), a possible way around would be to allocate a 
larger area of some MAX_BLCKSZ size, with an ifdef for compilers that really
would not support dynamic stack allocation. Moreover, it might be possible 
to hide it more or less cleanly in a macro. I had to put "-pedantic 
-Werror" to manage to get an error on dynamic stack allocation with "gcc 
-std=c89".


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Shared Data Structure b/w clients

2014-07-22 Thread Rohit Goyal
Hi All,

I am working on the PostgreSQL code and having some problems. :)

I need to create a shared data structure, so that different clients and
connections can update and share the state of those data structures in
memory. I planned to use the top memory context, but that only gives me a
shared structure within one session/terminal.

Please tell me how PostgreSQL does that and how I can do it.

Regards,
Rohit Goyal


Re: [HACKERS] Behavior of "OFFSET -1"

2014-07-22 Thread David Fetter
On Tue, Jul 22, 2014 at 12:49:37PM -0400, Tom Lane wrote:
> Before 9.3, you got an error from this:
> 
> regression=# select * from tenk1 offset -1;
> ERROR:  OFFSET must not be negative

That seems eminently sane, and should continue to error out, IM.

The only circumstance I can imagine where this could be argued not to
be is just casuistry, namely LIMIT m OFFSET -n might be argued to mean
LIMIT m-n.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-07-22 Thread Fabrízio de Royes Mello
On Tue, Jul 22, 2014 at 12:01 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
>
>
>
> On Thu, Jul 17, 2014 at 8:02 PM, Andres Freund 
wrote:
> >
> > > That means should I "FlushRelationBuffers(rel)" before change the
> > > relpersistence?
> >
> > I don't think that'd help. I think what this means that you simply
> > cannot change the relpersistence of the old relation before the heap
> > swap is successful. So I guess it has to be something like (pseudocode):
> >
> > OIDNewHeap = make_new_heap(..);
> > newrel = heap_open(OIDNewHeap, AEL);
> >
> > /*
> >  * Change the temporary relation to be unlogged/logged. We have to do
> >  * that here so buffers for the new relfilenode will have the right
> >  * persistency set while the original filenode's buffers won't get read
> >  * in with the wrong (i.e. new) persistency setting. Otherwise a
> >  * rollback after the rewrite would possibly result with buffers for the
> >  * original filenode having the wrong persistency setting.
> >  *
> >  * NB: This relies on swap_relation_files() also swapping the
> >  * persistency. That wouldn't work for pg_class, but that can't be
> >  * unlogged anyway.
> >  */
> > AlterTableChangeCatalogToLoggedOrUnlogged(newrel);
> > FlushRelationBuffers(newrel);
> > /* copy heap data into newrel */
> > finish_heap_swap();
> >
> > And then in swap_relation_files() also copy the persistency.
> >
> >
> > That's the best I can come up right now at least.
> >
>
> Isn't it better if we can set the relpersistence as an argument to
> "make_new_heap"?
>
> I'm thinking to change the make_new_heap:
>
> From:
>
>  make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
>LOCKMODE lockmode)
>
> To:
>
>  make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
>LOCKMODE lockmode)
>
> That way we can create the new heap with the appropriate relpersistence,
> and in swap_relation_files also copy the persistency, of course.
>
> And after copying the heap data to the new table (ATRewriteTable), change
> the relpersistence of the OldHeap's indexes, because in "finish_heap_swap"
> they'll be rebuilt.
>
> Thoughts?
>

The attached patch implements my previous idea based on Andres' thoughts.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..2d131df 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -59,16 +59,17 @@ ALTER TABLE [ IF EXISTS ] name
 ENABLE ALWAYS RULE rewrite_rule_name
 CLUSTER ON index_name
 SET WITHOUT CLUSTER
+SET {LOGGED | UNLOGGED}
 SET WITH OIDS
 SET WITHOUT OIDS
 SET ( storage_parameter = value [, ... ] )
+SET TABLESPACE new_tablespace
 RESET ( storage_parameter [, ... ] )
 INHERIT parent_table
 NO INHERIT parent_table
 OF type_name
 NOT OF
 OWNER TO new_owner
-SET TABLESPACE new_tablespace
 REPLICA IDENTITY {DEFAULT | USING INDEX index_name | FULL | NOTHING}
 
 and table_constraint_using_index is:
@@ -447,6 +448,20 @@ ALTER TABLE [ IF EXISTS ] name

 

+SET {LOGGED | UNLOGGED}
+
+ 
+  This form changes the table persistence type from unlogged to permanent or
+  from permanent to unlogged (see ).
+ 
+ 
+  Changing the table persistence type acquires an ACCESS EXCLUSIVE lock
+  and rewrites the table contents and associated indexes into new disk files.
+ 
+
+   
+
+   
 SET WITH OIDS
 
  
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b1c411a..7f497c7 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -574,7 +574,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	heap_close(OldHeap, NoLock);
 
 	/* Create the transient table that will receive the re-ordered data */
-	OIDNewHeap = make_new_heap(tableOid, tableSpace, false,
+	OIDNewHeap = make_new_heap(tableOid, tableSpace,
+			   OldHeap->rd_rel->relpersistence,
 			   AccessExclusiveLock);
 
 	/* Copy the heap data into the new table in the desired order */
@@ -601,7 +602,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
  * data, then call finish_heap_swap to complete the operation.
  */
 Oid
-make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
+make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
 			  LOCKMODE lockmode)
 {
 	TupleDesc	OldHeapDesc;
@@ -613,7 +614,6 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
 	Datum		reloptions;
 	bool		isNull;
 	Oid			namespaceid;
-	char		relpersistence;
 
 	OldHeap = heap_open(OIDOldHeap, lockmode);
 	OldHeapDesc = RelationGetDescr(OldHeap);
@@ -636,16 +636,10 @@

Re: [HACKERS] Shared Data Structure b/w clients

2014-07-22 Thread Atri Sharma
On Tuesday, July 22, 2014, Rohit Goyal  wrote:

> Hi All,
>
> I am working on the PostgreSQL code and having some problems. :)
>
> I need to create a shared data structure, so that different clients and
> connections can update and share the state of those data structures in
> memory. I planned to use the top memory context, but that only gives me a
> shared structure within one session/terminal.
>
> Please tell me how PostgreSQL does that and how I can do it.
>
> Regards,
> Rohit Goyal
>

How about making it a part of shared mem, like shared buffers?

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Shared Data Structure b/w clients

2014-07-22 Thread Rohit Goyal
Hi Atri/All,

I am very new to the PostgreSQL code. Can you please help in a bit more
detail, or tell me how to create a structure in shared memory (shared buffers)?

It would really help me if you could give me a code snippet or a link to
follow.

Regards,
Rohit Goyal


On Tue, Jul 22, 2014 at 8:30 PM, Atri Sharma  wrote:

>
>
> On Tuesday, July 22, 2014, Rohit Goyal  wrote:
>
>> Hi All,
>>
>> I am working on the PostgreSQL code and having some problems. :)
>>
>> I need to create a shared data structure, so that different clients and
>> connections can update and share the state of those data structures in
>> memory. I planned to use the top memory context, but that only gives me a
>> shared structure within one session/terminal.
>>
>> Please tell me how PostgreSQL does that and how I can do it.
>>
>> Regards,
>> Rohit Goyal
>>
>
> How about making it a part of shared mem, like shared buffers?
>
> Regards,
>
> Atri
>
>
> --
> Regards,
>
> Atri
> *l'apprenant*
>
>


-- 
Regards,
Rohit Goyal


Re: [HACKERS] Shared Data Structure b/w clients

2014-07-22 Thread Jaime Casanova
On Tue, Jul 22, 2014 at 1:33 PM, Rohit Goyal  wrote:
> Hi Atri/All,
>
> I am very new to the PostgreSQL code. Can you please help in a bit more
> detail, or tell me how to create a structure in shared memory (shared buffers)?
>
> It would really help me if you could give me a code snippet or a link to
> follow.
>

you can look at contrib/pg_stat_statements
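
The pattern used there boils down to the following (a condensed sketch;
MySharedState and the string key are placeholders, and locking around
later updates to the structure is omitted):

    #include "postgres.h"
    #include "fmgr.h"
    #include "miscadmin.h"
    #include "storage/ipc.h"
    #include "storage/lwlock.h"
    #include "storage/shmem.h"

    PG_MODULE_MAGIC;

    typedef struct MySharedState
    {
        int     counter;    /* example state shared by all backends */
    } MySharedState;

    static MySharedState *my_state = NULL;
    static shmem_startup_hook_type prev_shmem_startup_hook = NULL;

    static void
    my_shmem_startup(void)
    {
        bool    found;

        if (prev_shmem_startup_hook)
            prev_shmem_startup_hook();

        LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
        my_state = ShmemInitStruct("my_extension state",
                                   sizeof(MySharedState), &found);
        if (!found)
            my_state->counter = 0;  /* first process to attach initializes */
        LWLockRelease(AddinShmemInitLock);
    }

    void
    _PG_init(void)
    {
        /* shared memory can only be reserved from shared_preload_libraries */
        if (!process_shared_preload_libraries_in_progress)
            return;

        RequestAddinShmemSpace(sizeof(MySharedState));

        prev_shmem_startup_hook = shmem_startup_hook;
        shmem_startup_hook = my_shmem_startup;
    }

After that, any backend can read and update my_state->counter, guarded by
a lock of its own choosing.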

-- 
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal (9.5) : psql unicode border line styles

2014-07-22 Thread Tomas Vondra
On 28.6.2014 21:29, Pavel Stehule wrote:
> Hello
> 
> rebase for 9.5
> 
> test:
> \pset linestyle unicode \pset border 2
> \pset unicode_header_linestyle double
> 
> \l
> 
> Regards
> 
> Pavel

I did a quick review of the patch today:

* it applies cleanly to current HEAD (no failures, small offsets)
* compiles and generally seems to work just fine

Two questions:

(1) Shouldn't the new options be listed in '\?' (as possible names for
"pset")? I mean, here:

  \pset [NAME [VALUE]] set table output option
 (NAME :=
{format|border|expanded|fieldsep|fieldsep_zero|footer|null|
numericlocale|recordsep|recordsep_zero|tuples_only|title|tableattr|pager})


(2) I noticed this piece of code:

+typedef enum unicode_linestyle
+{
+   UNICODE_LINESTYLE_SINGLE = 0, /* to make sure someone initializes this 
*/
+   UNICODE_LINESTYLE_DOUBLE = 1
+} unicode_linestyle;

Why are the values defined explicitly? These values are set by the
compiler automatically, so why set them manually? Only a few of the
other enums are defined explicitly, and most of them have to do that to
define different values (e.g. 0x01, 0x02, 0x04, ...).

I don't understand how the comment "to make sure someone initializes
this" explains the purpose?


regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some bogus results from prairiedog

2014-07-22 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jul 22, 2014 at 12:24 AM, Tom Lane  wrote:
>> Anyway, to cut to the chase, the crash seems to be from this:
>> TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > 0)", File: "lock.c", Line: 2957)
>> So there is still something rotten in the fastpath lock logic.

> Gosh, that sucks.

> The inconstancy of this problem would seem to suggest some kind of
> locking bug rather than a flat-out concurrency issue, but it looks to
> me like everything relevant is marked volatile.

I don't think that you need any big assumptions about machine-specific
coding issues to spot the problem.  The assert in question is here:

/*
 * Decrement strong lock count.  This logic is needed only for 2PC.
 */
if (decrement_strong_lock_count
&& ConflictsWithRelationFastPath(&lock->tag, lockmode))
{
uint32fasthashcode = FastPathStrongLockHashPartition(hashcode);

SpinLockAcquire(&FastPathStrongRelationLocks->mutex);
Assert(FastPathStrongRelationLocks->count[fasthashcode] > 0);
FastPathStrongRelationLocks->count[fasthashcode]--;
SpinLockRelease(&FastPathStrongRelationLocks->mutex);
}

and it sure looks to me like that
"ConflictsWithRelationFastPath(&lock->tag" is looking at the tag of the
shared-memory lock object you just released.  If someone else had managed
to recycle that locktable entry for some other purpose, the
ConflictsWithRelationFastPath call might incorrectly return true.

I think s/&lock->tag/locktag/ would fix it, but maybe I'm missing
something.
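
That is (a sketch of the corrected test, assuming the function's locktag
argument is still in scope at that point):

    if (decrement_strong_lock_count
        && ConflictsWithRelationFastPath(locktag, lockmode))  /* not &lock->tag */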

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Diagnose incompatible OpenLDAP versions during build and test.

2014-07-22 Thread Tom Lane
Noah Misch  writes:
>> Diagnose incompatible OpenLDAP versions during build and test.

Oh, one more thing: the Windows buildfarm members mostly don't like
the dblink test case you added.  Looks like the mechanism for finding
the shared library doesn't work.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README

2014-07-22 Thread Peter Geoghegan
On Mon, Jul 21, 2014 at 9:55 PM, Amit Kapila  wrote:
> There is a mention about the race condition where it needs to move right
> in another caller (_bt_search) of _bt_moveright() as well.
>
> /*
> * Race -- the page we just grabbed may have split since we read its
> * pointer in the parent (or metapage).  If it has, we may need to
> * move right to its new sibling.  Do that.
> ..
>
> Do you think there is more we should add beyond what is already mentioned
> atop the second caller, or do you think that, if it is true for both, it
> should be atop _bt_moveright()?

Well, maybe it is justified to mention it multiple times, so as to not
break the reader's train of thought. I'm not sure.

> In general, I agree with you that we should mention any advantage
> of the algorithm we are using, especially if it is significant.  I think
> it will be better if we can also mention how that advantage or use is
> realized in our implementation, as we are already doing in the README of
> nbtree.

Right. It seems like the nbtree README is very shy about telling us
what the point of all this extra work is. IMV that should be stated
very prominently to aid understanding. A lot of people believe that we
have to do lock coupling/"crabbing" when descending the tree. We do
not. The locks acquired when descending a B-Tree in Postgres are
pretty granular. One read buffer lock is held at a time in the process
of servicing index scans. There are times during the descent of the
tree when no buffer locks are held whatsoever. Moreover, (with some
caveats) it doesn't really matter if a stale view of the page is seen
during a descent (as it happens I've been trying to think of ways to
further take advantage of this). That's pretty cool. If you only know
one thing about how the nbtree code works, that should probably be it.

> The above indicates 2 things:
> a. L & Y doesn't need to hold read locks concurrently.
> b. Advantage of right-links at all levels and "high-key".
>
> As per my understanding, we are not following point (a) in our code,
> so what is the benefit we get by having a reference of same in README?

The major reason that we don't completely avoid read locks, is, I
suppose, the need for self-consistent pages (but also because it would
break page deletion - I'm pretty sure that L&Y don't consider page
deletion, and the page deletion logic is entirely based on the Lanin
and Shasha paper and original research, but I didn't check). I think
that the sentence "Lehman and Yao don't require read locks, but assume
that in-memory copies of tree pages are unshared" is intended to
convey the point on the self-consistency of pages. Of course, Lehman
and Yao must assume that the B-Tree is in some sense in shared memory.
Otherwise, there wouldn't be much point to their elaborate locking
protocol.  :-)

> Isn't it better if we mention how the point (b) is used in our code and
> it's advantage together rather than keeping it at top of README?

Maybe it deserves more prominent mention in the code too.

> Already README mentions in brief about right-link and how it is used
> as below:
> ".. The scan must remember the page's right-link at the time it was scanned,
> since that is the page to move right to; if we move right to the current
> right-link then we'd re-scan any items moved by a page split. ..."

This is talking about how index scans interlock against VACUUM while
going to the heap, by keeping a leaf page pinned (this prevents "super
exclusive lock" acquisition). This is after the tree has been
descended. This is really just a detail (albeit one that follows
similar principles, since pages split right and it similarly exploits
that fact), whereas the use of right links and high keys while
descending the tree, and in particular the fact that the "move right"
L&Y technique obviates the prior need for "lock coupling" is pretty
much the whole point of L&Y.

In more concrete terms, _bt_search() releases and only then acquires
read locks during a descent of the tree (by calling
_bt_relandgetbuf()), and, perhaps counterintuitively, that's just
fine.
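
Schematically (pseudocode only, not the actual _bt_search() code):

    /* Descend from the root holding at most one read lock at a time. */
    buf = read_lock(root);
    while (!is_leaf(buf))
    {
        /*
         * The page may have split after we read its pointer in the
         * parent; if our scan key exceeds the page's high key, follow
         * right-links until it doesn't (the L&Y "move right" step).
         */
        buf = move_right_if_needed(buf, scankey);
        child = find_child_block(buf, scankey);
        buf = unlock_then_lock(buf, child);    /* cf. _bt_relandgetbuf() */
    }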
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README

2014-07-22 Thread Tom Lane
Peter Geoghegan  writes:
> Right. It seems like the nbtree README is very shy about telling us
> what the point of all this extra work is.

IIRC, the README was written on the assumption that you'd already read
L&Y.  If this patch is mostly about not assuming that, why not?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pgkill() is not POSIX-like for exiting processes

2014-07-22 Thread Noah Misch
My new OpenLDAP test case has been breaking each MSVC buildfarm member.  Most
MinGW members are fine, though the 9.0 and 9.1 narwhal members broke.  (Newer
narwhal members have been broken long-term.)  The MSVC build system has a
mundane inability to handle a Makefile construct I used; the first attached
patch fixes that.  With that fixed, though, the test case reveals a departure
from POSIX in pgkill(), our emulation of kill(2) for Windows.  A pgkill() call
falling very close in time to process exit can easily give ERROR_BROKEN_PIPE,
and I at least once observed it give ERROR_BAD_PIPE.  pgkill() translates
those codes to EINVAL.  Only a buggy POSIX program will ever see EINVAL from
kill(2).  This situation is just like signalling a zombie, and POSIX kill(2)
returns zero for zombies.  Let's do that, as attached.

I'm inclined to back-patch this, although a quick scan didn't turn up any code
having bugs today as a result.  The change will affect occasional error
messages.  (An alternative is to change the test case code in the back
branches to treat EINVAL like success.)

On my systems, MSVC builds get ERROR_BROKEN_PIPE far more easily than MinGW
builds.  In MinGW, I ran "make -C contrib/dblink check" many times without
ever hitting the error, but "vcregress contribcheck" hit it over 99% of the
time.  I don't have a theory to explain that.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 39698ee..a9e95cd 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -331,9 +331,13 @@ sub fetchRegressOpts
if ($m =~ /^\s*REGRESS_OPTS\s*=(.*)/m)
{
 
-		# ignore options that use makefile variables - can't handle those
-		# ignore anything that isn't an option staring with --
-		@opts = grep { $_ !~ /\$\(/ && $_ =~ /^--/ } split(/\s+/, $1);
+		# Substitute known Makefile variables, then ignore options that retain
+		# an unhandled variable reference.  Ignore anything that isn't an
+		# option staring with "--".
+   @opts = grep {
+   s/\Q$(top_builddir)\E/\"$topdir\"/;
+   $_ !~ /\$\(/ && $_ =~ /^--/
+   } split(/\s+/, $1);
}
if ($m =~ /^\s*ENCODING\s*=\s*(\S+)/m)
{
diff --git a/src/port/kill.c b/src/port/kill.c
index 5a7d483..b33283e 100644
--- a/src/port/kill.c
+++ b/src/port/kill.c
@@ -70,13 +70,28 @@ pgkill(int pid, int sig)
return 0;
}
 
-   if (GetLastError() == ERROR_FILE_NOT_FOUND)
-   errno = ESRCH;
-   else if (GetLastError() == ERROR_ACCESS_DENIED)
-   errno = EPERM;
-   else
-   errno = EINVAL;
-   return -1;
+   switch (GetLastError())
+   {
+   case ERROR_BROKEN_PIPE:
+   case ERROR_BAD_PIPE:
+
+			/*
+			 * These arise transiently as a process is exiting.  Treat them
+			 * like POSIX treats a zombie process, reporting success.
+			 */
+   return 0;
+
+   case ERROR_FILE_NOT_FOUND:
+   /* pipe fully gone, so treat the process as gone */
+   errno = ESRCH;
+   return -1;
+   case ERROR_ACCESS_DENIED:
+   errno = EPERM;
+   return -1;
+   default:
+   errno = EINVAL; /* unexpected */
+   return -1;
+   }
 }
 
 #endif

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Shared Data Structure b/w clients

2014-07-22 Thread Craig Ringer
On 07/23/2014 02:33 AM, Rohit Goyal wrote:
> 
> I am very new to the PostgreSQL code. Can you please help in a bit more
> detail, or tell me how to create a structure in shared memory (shared buffers)?
> 
> It would really help me if you could give me a code snippet or a link to
> follow.
> 

There's a lot of detail on how to do this in the BDR codebase, see
contrib/bdr in
http://git.postgresql.org/gitweb/?p=2ndquadrant_bdr.git;a=summary

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Shared Data Structure b/w clients

2014-07-22 Thread Craig Ringer
On 07/23/2014 09:46 AM, Craig Ringer wrote:
> There's a lot of detail on how to do this in the BDR codebase, see
> contrib/bdr in
> http://git.postgresql.org/gitweb/?p=2ndquadrant_bdr.git;a=summary

Oh, sorry: in the bdr-next branch. Should've mentioned.

http://git.postgresql.org/gitweb/?p=2ndquadrant_bdr.git;a=tree;f=contrib/bdr;h=fad1aa59a15724deb98f9b923d84f0ce818afc1f;hb=refs/heads/bdr-next

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IS NOT DISTINCT FROM + Indexing

2014-07-22 Thread Jonathan S. Katz
On Jul 22, 2014, at 12:40 AM, Tom Lane  wrote:

> "Jonathan S. Katz"  writes:
>> On Jul 21, 2014, at 9:51 PM, Tom Lane  wrote:
>>> The short reason why not is that it's not an operator (where "operator"
>>> is defined as "something with a pg_operator entry"), and all our indexing
>>> infrastructure is built around the notion that indexable clauses are of
>>> the form "indexed_column indexable_operator comparison_value".
> 
>> What got me thinking this initially problem is that I know "IS NULL" is 
>> indexable and I was unsure of how adding "IS NOT DISTINCT FROM" would be too 
>> different from that - of course, this is from my perspective from primarily 
>> operating on the surface.  It sounds like the IS NULL work is in the btree 
>> code?
> 
> We hacked in IS [NOT] NULL as a potentially indexable construct, but the
> key thing that made that possible without major redesign is that IS [NOT]
> NULL is datatype independent, so there's no need to identify any
> particular underlying operator or opclass.  I'm not sure what we'd do to
> handle IS [NOT] DISTINCT FROM, but that particular approach ain't gonna
> cut it.
> 
> Another point is that people are unlikely to be satisfied with planner
> optimization for IS NOT DISTINCT FROM that doesn't support it as a join
> clause (i.e., tab1.col1 IS NOT DISTINCT FROM tab2.col2); which is an issue
> that doesn't arise for IS [NOT] NULL, as it has only one argument.  So
> that brings you into not just indexability but hashing and merging
> support.  I hasten to say that that doesn't necessarily have to happen
> in a version-zero patch; but trying to make IS NOT DISTINCT FROM into
> a first-class construct is a big project.

Well, that definitely answers "how hard would it be."  Before embarking on 
something laborious (as even just indexing is nontrivial), I think it would be 
good to figure out how people are using IS [NOT] DISTINCT FROM and if there is 
interest in having it be indexable, let alone used in a JOIN optimization.  It 
could become a handy tool to simplify the SQL in queries that are returning a 
lot of NULL / NOT NULL data mixed together.
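
For instance (with purely illustrative tables t1 and t2), the NULL-aware
join has to be spelled out by hand today, which is exactly where an
indexable IS NOT DISTINCT FROM would help:

    SELECT * FROM t1 JOIN t2 ON t1.col = t2.col;  -- NULLs never match
    -- NULL-aware equivalent today:
    SELECT * FROM t1 JOIN t2
      ON t1.col = t2.col OR (t1.col IS NULL AND t2.col IS NULL);
    -- the direct spelling, currently unable to use an index:
    SELECT * FROM t1 JOIN t2 ON t1.col IS NOT DISTINCT FROM t2.col;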

Jonathan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README

2014-07-22 Thread Amit Kapila
On Wed, Jul 23, 2014 at 5:58 AM, Peter Geoghegan  wrote:
> On Mon, Jul 21, 2014 at 9:55 PM, Amit Kapila 
wrote:
> > The above indicates 2 things:
> > a. L & Y doesn't need to hold read locks concurrently.
> > b. Advantage of right-links at all levels and "high-key".
> >
> > As per my understanding, we are not following point (a) in our code,
> > so what is the benefit we get by having a reference of same in README?
>
> The major reason that we don't completely avoid read locks, is, I
> suppose, the need for self-consistent pages (but also because it would
> break page deletion - I'm pretty sure that L&Y don't consider page
> deletion, and the page deletion logic is entirely based on the Lanin
> and Shasha paper and original research, but I didn't check). I think
> that the sentence "Lehman and Yao don't require read locks, but assume
> that in-memory copies of tree pages are unshared" is intended to
> convey the point on the self-consistency of pages. Of course, Lehman
> and Yao must assume that the B-Tree is in some sense in shared memory.
> Otherwise, there wouldn't be much point to their elaborate locking
> protocol.  :-)

Okay, but how does this justify adding the new text below in the README?
+ Even with these read locks, Lehman and Yao's approach obviates the
+ need of earlier schemes to hold multiple read locks concurrently when
+ descending the tree as part of servicing index scans (pessimistic lock
+ coupling).

Actually, I think putting it there can lead to an inconsistency in the README.
Currently it indicates that our algorithm is different from L&Y w.r.t. taking
read locks, and it gives an explanation for that.

> > Isn't it better if we mention how the point (b) is used in our code and
> > it's advantage together rather than keeping it at top of README?
>
> Maybe it deserves more prominent mention in the code too.
>
> > Already README mentions in brief about right-link and how it is used
> > as below:
> > ".. The scan must remember the page's right-link at the time it was
scanned,
> > since that is the page to move right to; if we move right to the current
> > right-link then we'd re-scan any items moved by a page split. ..."
>
> This is talking about how index scans interlock against VACUUM while
> going to the heap, by keeping a leaf page pinned (this prevents "super
> exclusive lock" acquisition). This is after the tree has been
> descended. This is really just a detail (albeit one that follows
> similar principles, since pages split right and it similarly exploits
> that fact), whereas the use of right links and high keys while
> descending the tree, and in particular the fact that the "move right"
> L&Y technique obviates the prior need for "lock coupling" is pretty
> much the whole point of L&Y.
>
> In more concrete terms, _bt_search() releases and only then acquires
> read locks during a descent of the tree (by calling
> _bt_relandgetbuf()), and, perhaps counterintuitively, that's just
> fine.

So don't you think that it needs a bit more explanation than the text you
have quoted below?
+ The addition of right-links at all levels, as well as the
+ addition of a page "high key" allows detection of, and dynamic
+ recovery from concurrent page splits (that is, splits between
+ unlocking an internal page, and subsequently locking its child page
+ during a descent).

Basically, I think it will be better if you can explain in a bit more detail
how "right-links at all levels and high-key" helps to detect and
recover from concurrent page splits.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README

2014-07-22 Thread Peter Geoghegan
On Tue, Jul 22, 2014 at 5:39 PM, Tom Lane  wrote:
> IIRC, the README was written on the assumption that you'd already read
> L&Y.  If this patch is mostly about not assuming that, why not?

L&Y made the same mistake that the authors of most influential papers
make - they never get around to telling the reader why they should
bother to read it. The paper is over 30 years old, and we now know
that it's very influential, and the reasons why. I think that both the
nbtree README and L&Y would be a lot more approachable with a high
level introduction (arguably L&Y attempt this, but the way they go
about it seems impenetrable, mostly consisting of esoteric references
to other papers). Surely making that code more approachable is a
worthy goal.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Inconsistencies of service failure handling on Windows

2014-07-22 Thread Michael Paquier
Hi all,

While playing on Windows with services, I noticed an inconsistent behavior
in the way failures are handled when using a service for a Postgres
instance.

Let's assume that there is a service called postgres that has been
registered:
$ psql -At -c 'select version()'
PostgreSQL 9.5devel, compiled by Visual C++ build 1600, 64-bit
$ tasklist.exe -svc -FI "SERVICES eq postgres"

Image Name PID Services
= 

pg_ctl.exe 556 postgres

When pg_ctl is directly killed, the service manager is able to detect a failure
correctly.
$ taskkill.exe -PID 556 -F
SUCCESS: The process with PID 556 has been terminated.
$ sc query postgres
SERVICE_NAME: postgres
TYPE   : 10  WIN32_OWN_PROCESS
STATE  : 1  STOPPED
WIN32_EXIT_CODE: 1067  (0x42b)
SERVICE_EXIT_CODE  : 0  (0x0)
CHECKPOINT : 0x0
WAIT_HINT  : 0x0
In this case 1067 means that the process terminated unexpectedly. Note that at
this point the Postgres instance is still running, but we can use the
failure callback to run a script that does some cleanup before
properly restarting the service.

However, when a backend process is directly killed, something different
happens.
$ tasklist.exe -FI "IMAGENAME eq postgres.exe"

Image Name PID Session NameSession#Mem Usage
=   === 
postgres.exe  2088 Services   0 17,380 K
postgres.exe  2132 Services   0  4,400 K
postgres.exe  2236 Services   0  5,064 K
postgres.exe  1524 Services   0  6,304 K
postgres.exe  2084 Services   0  9,200 K
postgres.exe  2384 Services   0  5,968 K
postgres.exe  2652 Services   0  4,500 K
postgres.exe  2116 Services   0  4,384 K
$ taskkill.exe -PID 2084 -F
SUCCESS: The process with PID 2084 has been terminated.
After that some processes remain:
$ tasklist.exe -FI "IMAGENAME eq postgres.exe"
Image Name PID Session NameSession#Mem Usage
=   === 
postgres.exe  2088 Services   0  5,708 K
postgres.exe  2132 Services   0  4,400 K

These processes are immediately taken down when a connection to the server
is attempted. Note that before any connection attempt, the service is
considered to be running normally:
$ sc query postgres
SERVICE_NAME: postgres
TYPE   : 10  WIN32_OWN_PROCESS
STATE  : 4  RUNNING
(STOPPABLE, PAUSABLE, ACCEPTS_SHUTDOWN)
WIN32_EXIT_CODE: 0  (0x0)
SERVICE_EXIT_CODE  : 0  (0x0)
CHECKPOINT : 0x0
WAIT_HINT  : 0x0
$ psql
psql: could not connect to server: Connection refused (0x274D/10061)
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused (0x274D/10061)
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
$ tasklist.exe -FI "IMAGENAME eq postgres.exe"
INFO: No tasks are running which match the specified criteria.

But now the service has stopped, and it is not considered to have failed:
$ sc query postgres
SERVICE_NAME: postgres
TYPE   : 10  WIN32_OWN_PROCESS
STATE  : 1  STOPPED
WIN32_EXIT_CODE: 0  (0x0)
SERVICE_EXIT_CODE  : 0  (0x0)
CHECKPOINT : 0x0
WAIT_HINT  : 0x0
This seems like an inconsistent behavior in error detection.

I am guessing that pg_ctl is not able to appropriately track failures that
happen on the postgres side. But are there things we could do to improve
the failure detection here?

Regards,
-- 
Michael


Re: [HACKERS] proposal (9.5) : psql unicode border line styles

2014-07-22 Thread Pavel Stehule
Hi Tomas

2014-07-22 23:20 GMT+02:00 Tomas Vondra :

> On 28.6.2014 21:29, Pavel Stehule wrote:
> > Hello
> >
> > rebase for 9.5
> >
> > test:
> > \pset linestyle unicode \pset border 2
> > \pset unicode_header_linestyle double
> >
> > \l
> >
> > Regards
> >
> > Pavel
>
> I did a quick review of the patch today:
>
> * it applies cleanly to current HEAD (no failures, small offsets)
> * compiles and generally seems to work just fine
>
> Two questions:
>
> (1) Shouldn't the new options be listed in '\?' (as possible names for
> "pset")? I mean, here:
>
>   \pset [NAME [VALUE]] set table output option
>  (NAME :=
> {format|border|expanded|fieldsep|fieldsep_zero|footer|null|
> numericlocale|recordsep|recordsep_zero|tuples_only|title|tableattr|pager})
>
>
fixed


>
> (2) I noticed this piece of code:
>
> +typedef enum unicode_linestyle
> +{
> +   UNICODE_LINESTYLE_SINGLE = 0, /* to make sure someone initializes
> this */
> +   UNICODE_LINESTYLE_DOUBLE = 1
> +} unicode_linestyle;
>
> Why are the values defined explicitly? These values are set by the
> compiler automatically, so why set them manually? Only a few of the
> other enums are defined explicitly, and most of them have to do that to
> define different values (e.g. 0x01, 0x02, 0x04, ...).
>

this is useless - I removed it.


>
> I don't understand how the comment "to make sure someone initializes
> this" explains the purpose?
>

copy/paste error :( - removed

updated version is in attachment

Regards

Pavel


>
>
> regards
> Tomas
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
commit fb99f3b1e12d7dfb203b70187e63647e5d0a674d
Author: Pavel Stehule 
Date:   Wed Jul 23 07:30:46 2014 +0200

second version - minor changes: help, remove bogus comment and not necessary exact enum specification

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index fa0d6f2..fc8a503 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2294,6 +2294,42 @@ lo_import 152801
   
   
   
+
+  
+  unicode_border_linestyle
+  
+  
+  Sets the border line drawing style to one
+  of single or double.
+  This option only affects the unicode
+  linestyle.
+  
+  
+  
+
+  
+  unicode_column_linestyle
+  
+  
+  Sets the column line drawing style to one
+  of single or double.
+  This option only affects the unicode
+  linestyle.
+  
+  
+  
+
+  
+  unicode_header_linestyle
+  
+  
+  Sets the header line drawing style to one
+  of single or double.
+  This option only affects the unicode
+  linestyle.
+  
+  
+  
 
 
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 161de75..c0a09b1 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1054,6 +1054,9 @@ exec_command(const char *cmd,
 "footer", "format", "linestyle", "null",
 "numericlocale", "pager", "recordsep",
 "tableattr", "title", "tuples_only",
+"unicode_border_linestyle",
+"unicode_column_linestyle",
+"unicode_header_linestyle",
 NULL
 			};
 
@@ -2248,6 +2251,55 @@ _align2string(enum printFormat in)
 	return "unknown";
 }
 
+/*
+ * Parse entered unicode linestyle. Returns true, when entered string is
+ * known linestyle: single, double else returns false.
+ */
+static bool 
+set_unicode_line_style(const char *value, size_t vallen, unicode_linestyle *linestyle)
+{
+	if (pg_strncasecmp("single", value, vallen) == 0)
+		*linestyle = UNICODE_LINESTYLE_SINGLE;
+	else if (pg_strncasecmp("double", value, vallen) == 0)
+		*linestyle = UNICODE_LINESTYLE_DOUBLE;
+	else
+		return false;
+
+	return true;
+}
+
+static const char *
+_unicode_linestyle2string(int linestyle)
+{
+	switch (linestyle)
+	{
+		case UNICODE_LINESTYLE_SINGLE:
+			return "single";
+			break;
+		case UNICODE_LINESTYLE_DOUBLE:
+			return "double";
+			break;
+	}
+	return "unknown";
+}
+
+static const char *
+_linestyle2string(linestyle_type line_style)
+{
+	switch (line_style)
+	{
+		case LINESTYLE_ASCII:
+			return "ascii";
+			break;
+		case LINESTYLE_OLD_ASCII:
+			return "old-ascii";
+			break;
+		case LINESTYLE_UNICODE:
+			return "unicode";
+			break;
+	}
+	return "unknown";
+}
 
 bool
 do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
@@ -2292,11 +2344,11 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		if (!value)
 			;
 		else if (pg_strncasecmp("ascii", value, vallen) == 0)
-			popt->topt.line_style = &pg_asciiformat;
+			popt->topt.line_style = LINESTYLE_ASCII;
 		else if (pg_strncasecmp("old-ascii", value, vallen) == 0)
-			popt->topt.li

Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README

2014-07-22 Thread Peter Geoghegan
On Tue, Jul 22, 2014 at 8:59 PM, Amit Kapila  wrote:
> Okay, but how does this justify adding the new text below in the README?
> + Even with these read locks, Lehman and Yao's approach obviates the
> + need of earlier schemes to hold multiple read locks concurrently when
> + descending the tree as part of servicing index scans (pessimistic lock
> + coupling).
>
> Actually, I think putting it there can lead to an inconsistency in the README.
> Currently it indicates that our algorithm is different from L&Y w.r.t. taking
> read locks, and it gives an explanation for that.

Not really. Firstly, that sentence acknowledges that there are read
locks where L&Y assume there will not be. "Even with these read locks"
references the first paragraph, where it is stated the Postgres
B-Trees still acquire read locks while descending the tree. Secondly,
I'm pretty sure that even Lehman and Yao realized that their apparent
assumption that real implementations would not require read locks
isn't realistic. Their handling of deletion seems perfunctory to me.
They say "In situations where excessive deletions cause the storage
utilization of tree nodes to be unacceptably low, a batch
reorganization or an underflow operation which locks the entire tree
can be performed". I'm pretty sure that that sounded almost as bad in
1980 as it does now. We don't have a "not quite L&Y" implementation
just because there are single read locks acquired while descending the
tree. Prior schemes needed multiple *concurrent* exclusive locks.
B-Trees were around for about 10 years before L&Y.

There is reason to think that pretty much every practical
implementation has used read locks for many years, because there is a
well-received 2001 paper [1] that describes a scheme where L&Y-style B-link
trees can *actually* be made to not require read locks, which
discusses things like caching effects on contemporary hardware - it
involves playing tricks with detecting and recovering from page level
inconsistencies, IIRC. Furthermore, it references a scheme from the
late 90s involving local copies of B-Link pages. I thought about
pursuing something like that myself, but the cost of "latching"
(buffer locking) B-Trees in PostgreSQL is likely to be reduced before
too long anyway, which makes the general idea seem unenticing right
now.

> Basically, I think it will be better if you can explain in a bit more detail
> how "right-links at all levels and high-key" helps to detect and
> recover from concurrent page splits.

You might be right about that - perhaps I should go into more detail.

[1] http://www.vldb.org/conf/2001/P181.pdf
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [TODO] Process pg_hba.conf keywords as case-insensitive

2014-07-22 Thread Viswanatham kirankumar
>On 16 July 2014 23:12, Tom Lane wrote
>>Christoph Berg  writes:
>> Re: Viswanatham kirankumar 2014-07-16 
>> 
>>> Attached patch is implementing following TODO item Process 
>>> pg_hba.conf keywords as case-insensitive

>> Hmm. I see a case for accepting "ALL" (as in hosts.allow(5)), so +1 on 
>> that, but I don't think the other keywords like "host" and "peer"
>> should be valid in upper case.

> I think the argument was that SQL users are accustomed to thinking that 
> keywords are 
> case-insensitive.  It makes sense to me that we should adopt that same 
> convention in pg_hba.conf.

>Re-reading the original thread, there was also concern about whether 
>we should try to make quoting/casefolding behave more like it does in SQL, 
>specifically for matching pg_hba.conf items to SQL identifiers (database and 
>role names).  
>This patch doesn't seem to have addressed that part of it, but I think we need 
>to think those
>things through before we just do a blind s/strcmp/pg_strcasecmp/g.  Otherwise 
>we might
>find that we've added ambiguity that will give us trouble when we do try to 
>fix that.

I have updated the patch as per your review comments:

1) Database and role names behave like SQL identifiers (case-sensitive /
case-folding).

2) Only users and user groups require special handling; the behavior is as
follows (see the example below):
   Normal user:
  A. unquoted ( USER ) will be treated as user (downcased).
  B. quoted ( "USeR" ) will be treated as USeR (case-sensitive).
  C. quoted ( "+USER" ) will be treated as a normal user named +USER (i.e. it
will not be considered a user group), case-sensitively, as the string is quoted.
   User group:
  A. unquoted ( +USERGROUP ) will be treated as +usergroup (downcased).
  B. plus-quoted ( +"UserGROUP" ) will be treated as +UserGROUP
(case-sensitive).

3) A host name is not a SQL object, so it is treated case-sensitively,
   except that all, samehost, and samenet are considered keywords.
   For these, the user needs to use quotes to differentiate between a
hostname and a keyword.

4) All the fixed keywords mentioned in pg_hba.conf and the Client
Authentication section will be considered keywords,
e.g. host, local, hostssl, etc.
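
To illustrate the proposed matching rules, here are some hypothetical
pg_hba.conf entries (role and group names invented):

# TYPE  DATABASE  USER       METHOD
local   all       ALICE      trust    # downcased: matches role "alice"
local   all       "ALICE"    trust    # quoted: matches role "ALICE" only
local   all       "+ALICE"   trust    # literal role named "+ALICE", not a group
local   all       +ADMINS    trust    # downcased: members of group "admins"
local   all       +"ADmins"  trust    # quoted group: members of "ADmins" only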

Thanks & Regards,
VISWANATHAM  KIRAN KUMAR
HUAWEI TECHNOLOGIES INDIA PVT. LTD.



pg_hba.conf_keywords_as_case-insensitive_v2.patch
Description: pg_hba.conf_keywords_as_case-insensitive_v2.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal (9.5) : psql unicode border line styles

2014-07-22 Thread Tomas Vondra
On 23 July 2014, 7:36, Pavel Stehule wrote:
>
> updated version is in attachment

OK, thanks. The new version seems OK to me.

Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] timeout of pg_receivexlog --status-interval

2014-07-22 Thread furuyao
> >> Hi,
> >>
> >> At line 1047 of receivelog.c:CopyStreamPoll(), we set the timeoutptr
> >> variable to NULL.
> >> If the value of timeoutptr is set to NULL, then the process will wait
> >> until the socket becomes readable via select(), as follows.
> >>
> >> if (timeout_ms < 0)
> >> timeoutptr = NULL;
> >> else
> >> {
> >> timeout.tv_sec = timeout_ms / 1000L; timeout.tv_usec =
> >> (timeout_ms % 1000L) * 1000L;
> >> timeoutptr = &timeout;
> >> }
> >>
> >> ret = select(PQsocket(conn) + 1, &input_mask, NULL, NULL,
> >> timeoutptr);
> >>
> >> But line 1047 of receivelog.c is never executed, because the value
> >> of timeout_ms is never less than 0 in CopyStreamReceive, which
> >> is the only function that calls CopyStreamPoll().
> >> In the current code, if we specify -s as 0, then the CopyStreamPoll()
> >> function is never called.
> >> And pg_receivexlog will execute PQgetCopyData() and fail, in
> >> succession.
> >
> > Thanks for reporting this! Yep, this is a problem.
> >
> >> I think that this is a contradiction, and we should execute select()
> >> with NULL as its fifth argument.
> >> The attached patch allows select() to be executed with NULL, i.e.,
> >> pg_receivexlog will wait until the socket can be read, without a
> >> timeout, if -s is specified as 0.
> >
> > Your patch changed the code so that CopyStreamPoll is called even when
> > the timeout is 0. I don't agree with this change because the
> > timeout=0 basically means that the caller doesn't request to block and
> > there is no need to call CopyStreamPoll in this case. So I'm thinking
> > to apply the attached patch. Thought?
> >
> 
> Thank you for the response.
> I think this is better.
> 
> One other point about the select() function: I think the behavior is the
> same whether the fifth argument is NULL or 0 (i.e. 0 sec),
> so I think it's better to change CopyStreamPoll() as follows.
> 
> @@ -1043,7 +1043,7 @@ CopyStreamPoll(PGconn *conn, long timeout_ms)
> FD_ZERO(&input_mask);
> FD_SET(PQsocket(conn), &input_mask);
> 
> -   if (timeout_ms < 0)
> +   if (timeout_ms <= 0)
> timeoutptr = NULL;
> else
> {
> 
> Please give me feedback.

I have no problem with either of the suggestions, if we specify -s as 0.

However, with the fix in CopyStreamPoll(), I can't choose a code path that
does not call select().

I have previously proposed a patch modeled on walreceiver; that patch
contains logic to receive messages continuously, as walreceiver does,
and it needs a path that does not call select().

I think that changing the condition in CopyStreamReceive() is better for
extensibility. Thoughts?

Regards,

--
Furuya Osamu


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers