Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-21 Thread James Cloos
>>>>> "TL" == Tom Lane <t...@sss.pgh.pa.us> writes:

TL> The question to be asked is whether there is still anybody out there
TL> using float timestamps.

Gentoo's ebuild includes:

   $(use_enable !pg_legacytimestamp integer-datetimes) \

meaning that by default --enable-integer-datetimes is passed to configure,
but if the pg_legacytimestamp use flag is set, then --disable-integer-datetimes
is passed instead.

They document it as:


Use double precision floating-point numbers instead of 64-bit
integers for timestamp storage.


I.e., without any kind of deprecation notice.

I don't know how many (how few?) add pg_legacytimestamp to USE when
merging postgresql.  But it is still available as of 9.6 and also
with their live build of git://git.postgresql.org/git/postgresql.git.
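
For anyone wanting to check which way an existing server was built, the
read-only GUC reports it:

-- "on" means 64-bit integer datetimes; "off" means the legacy float
-- timestamps (i.e. a build with pg_legacytimestamp in USE).
SHOW integer_datetimes;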

-JimC
-- 
James Cloos <cl...@jhcloos.com> OpenPGP: 0x997A9F17ED7DAEA6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Multiple synchronous_standby_names rules

2017-01-11 Thread James Sewell
Hello,

When working with a production (P) and a DR (D) environment it is often a
requirement to be able to protect against data loss when promoting within a
site, and also when losing site P and promoting a master at D.

The current synchronous_standby_names syntax does not allow this.

In a simple example we could take the following nodes:

P1 (current master), P2, P3
D1, D2, D3

Where P1 is replicating to (P2, P3, D1, D2, D3).

The closest synchronous_standby_names  setting you could get to my use case
would be:

1 (D1, D2, D3)

This would allow the loss of either site without losing data - however it
would not allow promotion within site P from P1 -> (P2 | P3)  without the
potential for data loss.

What is needed to support this is the ability to configure Px with
something like:

 1 (P1, P2, P3), 1 (D1, D2, D3)
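
For illustration, the closest thing possible today versus the proposed form
might look like this (assuming the node names above are used as
application_name values; the multi-rule line is a sketch of the proposal,
not syntax any current release accepts):

-- Possible today on P1: a single count rule covering the DR site.
ALTER SYSTEM SET synchronous_standby_names = '1 (D1, D2, D3)';
SELECT pg_reload_conf();

-- Proposed (sketch only):
-- synchronous_standby_names = '1 (P1, P2, P3), 1 (D1, D2, D3)'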

Would there be any appetite for this - or would it be seen as
over-complication of the current rules?

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect



Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
*P *(+61) 2 8099 9000  *W* www.jirotech.com  *F *(+61) 2 8099 9099

-- 

--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: [HACKERS] One process per session lack of sharing

2016-07-15 Thread james

On 15/07/2016 09:28, Craig Ringer wrote:
I don't think anyone's considering moving from multi-processing to 
multi-threading in PostgreSQL. I really, really like the protection 
that the shared-nothing-by-default process model gives us, among other 
things.


As I understand it, the main issue is that it is hard to integrate 
extensions that use heavyweight runtimes and are focussed on isolation 
within a virtual machine.  It's not just


Perhaps it would be possible for the postmaster (or a delegate process) 
to host such a runtime, and find a way for a user process that wants to 
use such a runtime to communicate with it, whether by copying function 
parameters over RPC or by sharing some of its address space explicitly 
to the runtime to operate on directly.


Such a host delegate process could be explicitly built with multithread 
support and not 'infect' the rest of the code with its requirements.


Using granular RPC is nice for isolation but I am concerned that the 
latencies might be high.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-21 Thread James Sewell
Good news!

On Tuesday, 22 March 2016, David Rowley <david.row...@2ndquadrant.com>
wrote:

> On 22 March 2016 at 02:35, Robert Haas <robertmh...@gmail.com
> <javascript:;>> wrote:
> > I have committed this after changing some of the comments.
> >
> > There might still be bugs ... but I don't see them.  And the speedups
> > look very impressive.
> >
> > Really nice work, David.
>
> Thanks for that, and thank you for taking the time to carefully review
> it and commit it.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org
> <javascript:;>)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


-- 

James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



Re: [HACKERS] Choosing parallel_degree

2016-03-20 Thread James Sewell
OK cool, thanks.

Can we remove the minimum size limit when the per-table degree setting is
applied?

This would help for tables with 2 - 1,000 pages combined with a high-CPU-cost
aggregate.

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Sun, Mar 20, 2016 at 11:23 PM, David Rowley <david.row...@2ndquadrant.com
> wrote:

> On 18 March 2016 at 10:13, James Sewell <james.sew...@lisasoft.com> wrote:
> > This does bring up an interesting point I don't quite understand though.
> If I run parallel agg on a table with 4 rows with 2 workers will it run on
> two workers (2 rows each) or will the first one grab all 4 rows?
> It works on a per page basis, workers just each grab the next page to
> be scanned from a page counter that sits in shared memory, the worker
> just increments the page number, releases the lock on the counter and
> scans that page.
>
> See heap_parallelscan_nextpage()
>
> So the answer to your question is probably no. At least not unless the
> page only contained 2 rows.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



Re: [HACKERS] Choosing parallel_degree

2016-03-19 Thread James Sewell
On Thu, Mar 17, 2016 at 5:05 AM, Julien Rouhaud <julien.rouh...@dalibo.com>
wrote:

>
> attached v3 drops the GUC part.
>

This looks good. I do think that some threshold control would be good
in the long term - but you are right, Robert, it just feels strange.

Maybe once the final formula is implemented in 9.7+ and this gets some real
world use cases it can be revisited?

One thing I really, really like about the way the new patch works is that
you can set parallel_degree on an inheritance parent, then that will set
the minimum workers for all associated children (when accessing from the
parent).
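
For example, something like the following would then set a starting point
for a whole inheritance tree (a sketch: the table name is made up, and
parallel_degree here is the reloption added by this patch, not an option in
any released version):

-- Children scanned via the parent then get at least this many workers
-- considered.
ALTER TABLE parent_tbl SET (parallel_degree = 4);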

Currently this patch will not fire on small tables even when
parallel_degree is set. Can we fix this by adding a check for
ref->parallel_degree to the table size condition?

Cheers,
James



Re: [HACKERS] Parallel Aggregate

2016-03-19 Thread James Sewell
Hi again,

This is probably me missing something, but is there a reason parallel
aggregate doesn't seem to ever create append nodes containing Index scans?

SET random_page_cost TO 0.2;
SET max_parallel_degree TO 8;

postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
   QUERY PLAN
-
 Finalize GroupAggregate  (cost=310596.32..310598.03 rows=31 width=16)
   Group Key: view_time_day
   ->  Sort  (cost=310596.32..310596.79 rows=186 width=16)
 Sort Key: view_time_day
 ->  Gather  (cost=310589.00..310589.31 rows=186 width=16)
   Number of Workers: 5
   ->  Partial HashAggregate  (cost=310589.00..310589.31
rows=31 width=16)
 Group Key: view_time_day
 ->  Parallel Seq Scan on base  (cost=0.00..280589.00
rows=600 width=12)


SET max_parallel_degree TO 0;

postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
QUERY PLAN
---
 GroupAggregate  (cost=0.56..600085.92 rows=31 width=16)
   Group Key: view_time_day
   ->  Index Only Scan using base_view_time_day_count_i_idx on base
 (cost=0.56..450085.61 rows=3000 width=12)
(3 rows)


Cheers,


James Sewell,
Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Thu, Mar 17, 2016 at 8:08 AM, David Rowley <david.row...@2ndquadrant.com>
wrote:

> On 17 March 2016 at 01:29, Robert Haas <robertmh...@gmail.com> wrote:
> > On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> >> Isn't it better to call it as Parallel Aggregate instead of Partial
> >> Aggregate.  Initialy, we have kept Partial for seqscan, but later on we
> >> changed to Parallel Seq Scan, so I am not able to think why it is
> better to
> >> call Partial incase of Aggregates.
> >
> > I think partial is the right terminology.  Unlike a parallel
> > sequential scan, a partial aggregate isn't parallel-aware and could be
> > used in contexts having nothing to do with parallelism.  It's just
> > that it outputs transition values instead of a finalized value.
>
> +1  the reason the partial aggregate patches have been kept separate
> from the parallel aggregate patches is that partial aggregate will
> serve for many other purposes. Parallel Aggregate is just one of many
> possible use cases for this, so it makes little sense to give it a
> name according to a single use case.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



Re: [HACKERS] Choosing parallel_degree

2016-03-19 Thread James Sewell
Hey,

I think there are definitely use cases for using parallel agg on a small table
when the time for each agg operation is very high. PostGIS can be used to
create many examples of low row count and table size but high-CPU
operations.
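
As an aside, for experimenting with such tiny tables the cost GUCs can be
zeroed out, although the hard-coded size threshold discussed below still
gets in the way (a sketch; the table and aggregate names are made up):

SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
EXPLAIN SELECT slow_postgis_agg(geom) FROM tiny_table;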

This does bring up an interesting point I don't quite understand though. If
I run parallel agg on a table with 4 rows with 2 workers will it run on two
workers (2 rows each) or will the first one grab all 4 rows?

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Thu, Mar 17, 2016 at 10:30 PM, Julien Rouhaud <julien.rouh...@dalibo.com>
wrote:

> On 17/03/2016 12:21, David Rowley wrote:
> > On 18 March 2016 at 00:13, Julien Rouhaud <julien.rouh...@dalibo.com>
> wrote:
> >> With the current threshold, you need a table bigger than 8 MB to be able
> >> to force parallel workers. I'm not sure there'll be benefits for
> >> multiple workers on a table smaller than 8 MB, since setting up all the
> >> parallel stuff takes time.
> >
> > It would be really nice if it were possible to drop the setting really
> > low, so that combined with a low parallel_setup_cost we could enable
> > parallel query on small tables in the regression test suite.
> >
> >
>
> Indeed. That could also be a use case for moving parallel_threshold to a
> GUC, but not sure what'd be best.
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
>



Re: [HACKERS] Choosing parallel_degree

2016-03-15 Thread James Sewell
On Wed, Mar 16, 2016 at 11:26 AM, Julien Rouhaud <julien.rouh...@dalibo.com>
wrote:

>
> I'm not too familiar with parallel planning, but I tried to implement
> both in attached patch. I didn't put much effort into the
> parallel_threshold GUC documentation, because I didn't really see a good
> way to explain it. I'd be happy to improve it if needed. Also, to make
> this parameter easier to tune for users, perhaps we could divide the
> default value by 3 and use it as is in the first iteration in
> create_parallel_path() ?
>
> Also, global max_parallel_degree still needs to be at least 1 for the
> per table value to be considered.
>
>
All applies and works from my end.

Is the max_parallel_degree per table of much use here? It allows the max
number of workers per table to be set - but it's still bound by the same
formula (now from the GUC). So in reality it's only really useful for
limiting the number of workers, not raising it.

Would there be a common use case for limiting parallelism on a subset of
tables in a database you've explicitly set to have a higher amount
of parallel operations via the GUC? I struggle to think of one?

I think in practice the reverse would be more common: you'd want to set
certain tables to a starting point of a certain number of workers (and ramp
up to more if the formula allowed it). You could set this to 0 to never use
parallel agg on this table.

Another option is to allow access to the threshold multiplier
(currently hard-coded to 3) per table - but this might become pretty hard
to explain succinctly in the documentation.

Cheers,
James



Re: [HACKERS] Choosing parallel_degree

2016-03-14 Thread James Sewell
Thanks David,

Eventually it would be great to take into account the cost of the function
doing the agg (pg_proc.procost, which is a multiple of CPU units).

This would allow people to mark specific aggregations as needing more CPU
power, therefore needing more workers per page (or should it be tuple in
this case?).
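
For what it's worth, the cost can already be declared today, even though
nothing in the worker-count logic consults it yet (a sketch; the function
name is made up):

-- procost is expressed in units of cpu_operator_cost.
ALTER FUNCTION my_heavy_transfn(internal, geometry) COST 10000;
SELECT proname, procost FROM pg_proc WHERE proname = 'my_heavy_transfn';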

In the meantime some way to manually influence this would be good. I just
did some testing (on an 8VCPU machine) with a 139MB table, which gets 3
workers currently.

For a count(*) I get a time of 131.754 ms. If I increase this to 8 workers
I get around 86.193 ms.

Obviously this doesn't mean much as YMMV - but it does show that the
ability to manually adjust the scaling would be great, especially in
data warehouse or reporting environments.

I did want to test with some really slow aggs, but even when I take out the
small table test in create_parallel_paths I can't seem to get a parallel
plan for a tiny table. Any idea why this would be, David?


Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Tue, Mar 15, 2016 at 12:25 PM, David Rowley <david.row...@2ndquadrant.com
> wrote:

> Over in [1] James mentioned about wanting more to be able to have more
> influence over the partial path's parallel_degree decision.  At risk
> of a discussion on that hijacking the parallel aggregate thread, I
> thought I'd start this for anyone who would want to discuss making
> changes to that.
>
> I've attached a simple C program which shows the parallel_degree which
> will be chosen at the moment. For now it's based on the size of the
> base relation. Perhaps that will need to be rethought later, perhaps
> based on costs. But I just don't think it's something for 9.6.
>
> Here's the output of the C program.
>
> For 1 pages there will be 1 workers (rel size 0 MB, 0 GB)
> For 3001 pages there will be 2 workers (rel size 23 MB, 0 GB)
> For 9001 pages there will be 3 workers (rel size 70 MB, 0 GB)
> For 27001 pages there will be 4 workers (rel size 210 MB, 0 GB)
> For 81001 pages there will be 5 workers (rel size 632 MB, 0 GB)
> For 243001 pages there will be 6 workers (rel size 1898 MB, 1 GB)
> For 729001 pages there will be 7 workers (rel size 5695 MB, 5 GB)
> For 2187001 pages there will be 8 workers (rel size 17085 MB, 16 GB)
> For 6561001 pages there will be 9 workers (rel size 51257 MB, 50 GB)
> For 19683001 pages there will be 10 workers (rel size 153773 MB, 150 GB)
> For 59049001 pages there will be 11 workers (rel size 461320 MB, 450 GB)
> For 177147001 pages there will be 12 workers (rel size 1383960 MB, 1351 GB)
> For 531441001 pages there will be 13 workers (rel size 4151882 MB, 4054 GB)
> For 1594323001 pages there will be 14 workers (rel size 12455648 MB, 12163
> GB)
>
> [1]
> http://www.postgresql.org/message-id/CANkGpBtUvzpdvF2=_iq64ujmvrpycs6d4i9-wepbusq1sq+...@mail.gmail.com
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread James Sewell
On Tue, Mar 15, 2016 at 9:32 AM, Robert Haas  wrote:

>
> I kind of doubt this would work well, but somebody could write a patch
> for it and try it out.


OK I'll give this a go today and report back.

Would the eventual plan be to use pg_proc.procost for the functions from
each aggregate concerned? If so I might have a peek at that too, although I
imagine I won't get far.

Cheers,



Re: [HACKERS] Parallel Aggregate

2016-03-14 Thread James Sewell
On Tuesday, 15 March 2016, Robert Haas  wrote:
>
> > Does the cost of the aggregate function come into this calculation at
> > all? In PostGIS land, much smaller numbers of rows can generate loads
> > that would be effective to parallelize (worker time much >> than
> > startup cost).
>
> Unfortunately, no - only the table size.  This is a problem, and needs
> to be fixed.  However, it's probably not going to get fixed for 9.6.
> :-(
>

Any chance of getting a GUC (say min_parallel_degree) added to allow
setting the initial value of parallel_degree, then changing the small
relation check to also pass if parallel_degree > 1?

That way you could set min_parallel_degree on a query by query basis if you
are running aggregates which you know will take a lot of CPU.

I suppose it wouldn't make much sense at all to set globally though, so it
could just confuse matters.
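
To be concrete, the usage I have in mind is along these lines
(min_parallel_degree is purely hypothetical - it is the GUC proposed above,
not an existing setting, and the table and aggregate names are made up):

SET min_parallel_degree = 4;  -- hypothetical GUC
SELECT expensive_agg(col) FROM small_table;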

Cheers,



Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
On Mon, Mar 14, 2016 at 3:05 PM, David Rowley <david.row...@2ndquadrant.com>
wrote:

>
> Things to try:
> 1. alter table a add column ts_date date; update a set ts_date =
> date_trunc('DAY',ts); vacuum full analyze ts;
> 2. or, create index on a (date_trunc('DAY',ts)); analyze a;
> 3. or for testing, set the work_mem higher.
>
>
Ah, that makes sense.

Tried with a BTREE index, and it works perfectly, but the index is 428MB
- which is a bit rough.

Removed that and put on a BRIN index, same result for 48kB - perfect!
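
For reference, the two variants were along these lines:

-- BTREE expression index: works, but ~428MB here.
CREATE INDEX ON a (date_trunc('DAY', ts));

-- BRIN expression index: same result for ~48kB.
CREATE INDEX ON a USING brin (date_trunc('DAY', ts));
ANALYZE a;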

Thanks for the help,

James



Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Hi again,

I've been playing around with inheritance combined with this patch.
Currently it looks like you are taking max(parallel_degree) from all the
child tables and using that for the number of workers.

For large machines it makes much more sense to use sum(parallel_degree) -
but I've just seen this comment in the code:

/*
 * Decide what parallel degree to request for this append path.  For
 * now, we just use the maximum parallel degree of any member.  It
 * might be useful to use a higher number if the Append node were
 * smart enough to spread out the workers, but it currently isn't.
 */

Does this mean that even though we are aggregating in parallel, we are only
operating on one child table at a time currently?

Cheers,

James Sewell,
 Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 2:39 PM, James Sewell <james.sew...@lisasoft.com>
wrote:

> Cool,
>
> I've been testing how this works with partitioning (which seems to be
> strange, but I'll post separately about that) and something odd seems to be
> going on now with the parallel triggering:
>
> postgres=# create table a as select * from base_p2015_11;
> SELECT 2000
>
> postgres=# select * from a limit 1;
>  ts | count |  a  |  b   |  c   |  d   | e
> +---+-+--+--+--+---
>  2015-11-26 21:10:04.856828 |   860 | 946 | 1032 | 1118 | 1204 |
> (1 row)
>
> postgres-# \d a
>  Table "datamart_owner.a"
>  Column |Type | Modifiers
> +-+---
>  ts | timestamp without time zone |
>  count  | integer |
>  a  | integer |
>  b  | integer |
>  c  | integer |
>  d  | integer |
>  e  | integer |
>
> postgres=# select pg_size_pretty(pg_relation_size('a'));
>  pg_size_pretty
> 
>  1149 MB
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
>   QUERY PLAN
>
> --
>  Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
>Group Key: (date_trunc('DAY'::text, ts))
>->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
>  Sort Key: (date_trunc('DAY'::text, ts))
>  ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
>Number of Workers: 5
>->  Partial HashAggregate  (cost=217059.08..217061.58
> rows=200 width=16)
>  Group Key: date_trunc('DAY'::text, ts)
>  ->  Parallel Seq Scan on a  (cost=0.00..197059.06
> rows=405 width=12)
> (9 rows)
>
> postgres=# analyze a;
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
> QUERY PLAN
> --
>  GroupAggregate  (cost=3164211.55..3564212.03 rows=2024 width=16)
>Group Key: (date_trunc('DAY'::text, ts))
>->  Sort  (cost=3164211.55..3214211.61 rows=2024 width=12)
>  Sort Key: (date_trunc('DAY'::text, ts))
>  ->  Seq Scan on a  (cost=0.00..397059.30 rows=2024 width=12)
> (5 rows)
>
> Unsure what's happening here.
>
>
>
> James Sewell,
> PostgreSQL Team Lead / Solutions Architect
> __
>
>
> Level 2, 50 Queen St, Melbourne VIC 3000
>
> *P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099
>
>
> On Mon, Mar 14, 2016 at 1:31 PM, David Rowley <
> david.row...@2ndquadrant.com> wrote:
>
>> On 14 March 2016 at 14:52, James Sewell <james.sew...@lisasoft.com>
>> wrote:
>> > One question - how is the upper limit of workers chosen?
>>
>> See create_parallel_paths() in allpaths.c. Basically the bigger the
>> relation (in pages) the more workers will be allocated, up until
>> max_parallel_degree.
>>
>> There is also a comment in that function which states:
>> /*
>> * Limit the degree of parallelism logarithmically based on the size of the
>> * relation.  This probably needs to be a good deal more sophisticated,
>> but we
>> * need something here for now.
>> */
>>
>> So this will likely see some revision at some point, after 9.6.
>>
>> --
>>  David Rowley   http://www.

Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Cool,

I've been testing how this works with partitioning (which seems to be
strange, but I'll post separately about that) and something odd seems to be
going on now with the parallel triggering:

postgres=# create table a as select * from base_p2015_11;
SELECT 2000

postgres=# select * from a limit 1;
 ts | count |  a  |  b   |  c   |  d   | e
+---+-+--+--+--+---
 2015-11-26 21:10:04.856828 |   860 | 946 | 1032 | 1118 | 1204 |
(1 row)

postgres-# \d a
 Table "datamart_owner.a"
 Column |Type | Modifiers
+-+---
 ts | timestamp without time zone |
 count  | integer |
 a  | integer |
 b  | integer |
 c  | integer |
 d  | integer |
 e  | integer |

postgres=# select pg_size_pretty(pg_relation_size('a'));
 pg_size_pretty

 1149 MB

postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
  QUERY PLAN
--
 Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
   Group Key: (date_trunc('DAY'::text, ts))
   ->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
 Sort Key: (date_trunc('DAY'::text, ts))
 ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
   Number of Workers: 5
   ->  Partial HashAggregate  (cost=217059.08..217061.58
rows=200 width=16)
 Group Key: date_trunc('DAY'::text, ts)
 ->  Parallel Seq Scan on a  (cost=0.00..197059.06
rows=405 width=12)
(9 rows)

postgres=# analyze a;

postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
QUERY PLAN
--
 GroupAggregate  (cost=3164211.55..3564212.03 rows=2024 width=16)
   Group Key: (date_trunc('DAY'::text, ts))
   ->  Sort  (cost=3164211.55..3214211.61 rows=2024 width=12)
 Sort Key: (date_trunc('DAY'::text, ts))
 ->  Seq Scan on a  (cost=0.00..397059.30 rows=2024 width=12)
(5 rows)

Unsure what's happening here.



James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 1:31 PM, David Rowley <david.row...@2ndquadrant.com>
wrote:

> On 14 March 2016 at 14:52, James Sewell <james.sew...@lisasoft.com> wrote:
> > One question - how is the upper limit of workers chosen?
>
> See create_parallel_paths() in allpaths.c. Basically the bigger the
> relation (in pages) the more workers will be allocated, up until
> max_parallel_degree.
>
> There is also a comment in that function which states:
> /*
> * Limit the degree of parallelism logarithmically based on the size of the
> * relation.  This probably needs to be a good deal more sophisticated, but
> we
> * need something here for now.
> */
>
> So this will likely see some revision at some point, after 9.6.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Hi,

Happy to test, really looking forward to seeing this stuff in core.

The explain analyze is below:

Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16) (actual time=2282.092..2282.202 rows=15 loops=1)
   Group Key: (date_trunc('DAY'::text, pageview_start_tstamp))
   ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16) (actual time=2281.749..2282.060 rows=105 loops=1)
         Number of Workers: 6
         ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216 width=16) (actual time=2276.879..2277.030 rows=15 loops=7)
               Group Key: date_trunc('DAY'::text, pageview_start_tstamp)
               ->  Parallel Seq Scan on celebrus_fact_agg_1_p2015_12  (cost=0.00..743769.76 rows=4221741 width=12) (actual time=0.066..1631.650 rows=3618887 loops=7)

One question - how is the upper limit of workers chosen?


James Sewell,
Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 12:30 PM, David Rowley <david.row...@2ndquadrant.com
> wrote:

> On 14 March 2016 at 14:16, James Sewell <james.sew...@lisasoft.com> wrote:
>
>> I've done some testing with one of my data sets in an 8VPU virtual
>> environment and this is looking really, really good.
>>
>> My test query is:
>>
>> SELECT pageview, sum(pageview_count)
>> FROM fact_agg_2015_12
>> GROUP BY date_trunc('DAY'::text, pageview);
>>
>> The query returns 15 rows. The fact_agg table is 5398MB and holds around
>> 25 million records.
>>
>> Explain with a max_parallel_degree of 8 tells me that the query will
>> only use 6 background workers. I have no indexes on the table currently.
>>
>> Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
>>Group Key: (date_trunc('DAY'::text, pageview))
>>->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
>>  Number of Workers: 6
>>  ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216
>> width=16)
>>Group Key: date_trunc('DAY'::text, pageview)
>>->  Parallel Seq Scan on fact_agg_2015_12
>>  (cost=0.00..743769.76 rows=4221741 width=12)
>>
>
> Great! Thanks for testing this.
>
> If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual
> number of Gather rows come out at 105? I'd just like to get an idea of my
> cost estimate for the Gather are going to be accurate for real world data
> sets.
>
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Hi,

I've done some testing with one of my data sets in an 8VPU virtual
environment and this is looking really, really good.

My test query is:

SELECT pageview, sum(pageview_count)
FROM fact_agg_2015_12
GROUP BY date_trunc('DAY'::text, pageview);

The query returns 15 rows. The fact_agg table is 5398MB and holds around 25
million records.

Explain with a max_parallel_degree of 8 tells me that the query will only
use 6 background workers. I have no indexes on the table currently.

Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
   Group Key: (date_trunc('DAY'::text, pageview))
   ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
 Number of Workers: 6
 ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216
width=16)
   Group Key: date_trunc('DAY'::text, pageview)
   ->  Parallel Seq Scan on fact_agg_2015_12
 (cost=0.00..743769.76 rows=4221741 width=12)


I am getting the following timings (everything was cached before I started
testing). I didn't average the runtime, but I ran each one three times and
took the middle value.

max_parallel_degree   runtime
-------------------   ------------
0                     11693.537 ms
1                      6387.937 ms
2                      4328.629 ms
3                      3292.376 ms
4                      2743.148 ms
5                      2278.449 ms
6                      2000.599 ms


I'm pretty happy!

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 8:44 AM, David Rowley <david.row...@2ndquadrant.com>
wrote:

> On 12 March 2016 at 16:31, David Rowley <david.row...@2ndquadrant.com>
> wrote:
> > I've attached an updated patch which is based on commit 7087166,
> > things are really changing fast in the grouping path area at the
> > moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>



Re: [HACKERS] Random note of encouragement

2016-02-24 Thread James Sewell
Argh seems like a false alarm for now.

I installed 9.5 from RPM source (the other was one I had installed
previously) and the performance matched 9.6

Sorry about that, I must have *something* screwed up on the other one.

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



Re: [HACKERS] Random note of encouragement

2016-02-24 Thread James Sewell
I've actually just tested this on 9.3 - and I get roughly the same as
9.6devel.

Now going back to make sure my 9.5 environment is sane.

Hopefully this isn't me jumping the gun.

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Thu, Feb 25, 2016 at 10:50 AM, Thomas Munro <
thomas.mu...@enterprisedb.com> wrote:

> On Thu, Feb 25, 2016 at 12:26 PM, Bruce Momjian <br...@momjian.us> wrote:
> > On Thu, Feb 25, 2016 at 10:06:34AM +1100, James Sewell wrote:
> >> Now when I run the following SQL (multiple times to allow for getting
> >> everything into shared buffers, which is 4GB on my machine):
> >>
> >>
> >> select sum(count_n) from base group by view_time_day;
> >>
> >>
> >> I get the following results:
> >>
> >>
> >> PSQL 9.5 - ~21 seconds
> >> PSQL 9.6 devel - ~8.5 seconds
> >>
> >>
> >> I think that's pretty good!
> >>
> >> I know this is a devel release, things may change, blah blah. But still,
> >> something has changed for the better here!
> >
> > Wow, that is cool.  Can anyone suggest which commit improved this?
>
> Since it sums numerics, maybe integer transition functions from commit
> 959277a4f579da5243968c750069570a58e92b38 helped?
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>



[HACKERS] Random note of encouragement

2016-02-24 Thread James Sewell
Hey All,

I've been doing some (futile) work trying to speed up aggregates with a
group by in PostgreSQL 9.5.

I installed PostgreSQL 9.6 on the same machine to see if I could get
anything running in parallel when using partitioning - which didn't work.

But - I did find this:

With the following setup:

CREATE TABLE base(
view_time TIMESTAMP WITHOUT time ZONE,
view_time_day TIMESTAMP WITHOUT time ZONE,
count_n numeric);


INSERT INTO base
SELECT view_time,
       date_trunc('day', view_time),
       COUNT::numeric
FROM
  (SELECT
      TIMESTAMP '2015-12-01' + random() * interval '30 days' AS view_time,
      trunc(random() * 99 + 1) AS COUNT
   FROM generate_series(1,3000)) a;

analyze base;

Now when I run the following SQL (multiple times to allow for getting
everything into shared buffers, which is 4GB on my machine):

select sum(count_n) from base group by view_time_day;


I get the following results:

PSQL 9.5 - *~21 seconds*
PSQL 9.6 devel - *~8.5 seconds*


I think that's pretty good!

I know this is a devel release, things may change, blah blah. But still,
something has changed for the better here!

I get the same plan on both nodes:

 HashAggregate  (cost=670590.56..670590.95 rows=31 width=13)
   Group Key: view_time_day
   ->  Seq Scan on base  (cost=0.00..520590.04 rows=3104 width=13)

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



Re: [HACKERS] WAL Re-Writes

2016-01-27 Thread james

On 27/01/2016 13:30, Amit Kapila wrote:


Thoughts?


Are the decreases observed with SSD as well as spinning rust?

I might imagine that decreasing the wear would be advantageous, 
especially if the performance decrease is less with low read latency.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Streaming Replication clusters and load balancing

2015-09-21 Thread James Sewell
Hello all,

I have recently been working with PostgreSQL and HAProxy to provide
seamless load balancing to a group of database servers. This on its own
isn't a hard thing: I have an implementation finished and am now thinking
about the best way to bring it to a production-ready state which could be
used by others, and used in load balancers other than HAProxy with minimal
config changes.

My initial requirements were:

Given a group of PostgreSQL servers check each x seconds and:

   - Allow read/write access only to the master server (via IPA / portA)
  - Disallow access if there are multiple master servers




   - Allow read access to all servers (via IPB / portB) as long as the
   following holds:
  - They are attached to the current master server via streaming
  replication (or they are the current master server)
  - They can currently contact the master server (safest option,
  disallow all access when master-less)
  - They are in the same timeline as the master server (do I need this
  check?)
  - The master server reports that they have less than x bytes lag



HAProxy can talk to PostgreSQL for a health check via TCP or PSQL
(connection check only). Neither of these allows the logic above - therefore
this logic has to be hosted outside of HAProxy. This might change in the
future if HAProxy gets the ability to send SQL statements (like an F5 can).

Today the best way to provide this information to  HAProxy (and many other
load balancers, application frameworks, proxies, monitoring solutions) is
via HTTP, where HTTP 200 is pass the check and HTTP 503 is fail the check
(and don't load balance to this node). In my case I have a script which
accepts HTTP requests to /read to check if this node is available for read
only and /write which checks if this node is available for read/writes.
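
For example, the SQL behind such checks is roughly along these lines (a
sketch against the 9.x replication views; the acceptable lag threshold is
assumed):

-- /write: pass only on a node that is not in recovery (i.e. the master).
SELECT NOT pg_is_in_recovery() AS writable;

-- /read (run on the master): verify a standby is attached, streaming, and
-- within an acceptable byte lag.
SELECT application_name,
       state,
       pg_xlog_location_diff(pg_current_xlog_location(),
                             replay_location) AS lag_bytes
FROM pg_stat_replication
WHERE state = 'streaming';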

The options as I see them are:

   - Implement a script / small app which connects to PostgreSQL and
   executes these checks
  - Implemented and proven today at many independent sites
  - Should it run on HAProxy server or PSQL server?
  - Integrated HTTP server or xinetd script?
  - Platform independence?
  - What if it dies?


   - Implement a custom PostgreSQL BGworker which provides this information
   over HTTP
  - No outside of PostgreSQL config needed
  - No reliance on another daemon / interface being up
  - libmicrohttpd or similar should help with  platform independence
  - Security / acceptance by community?
  - Only newer versions of PostgreSQL


   - Spend the time working on getting SQL checks into HAProxy
  - What about other platforms which only support HTTP?

I think all of the options would benefit from a PSQL extension which does
the following:

   - Encapsulates the check logic (easier to upgrade, manipulate)
   - Stores historic check data for a number of hours / days / months
   - Stores defaults (override via HTTP could be possible for things like
   lag)

Does anyone else have any thoughts on this topic?

Eventually many cool features could flow out of this kind of work:

   - Integration with High Availability products - I have this working with
   EnterpriseDB EFM now.
  - Locate the current master using the HA product
 - more than one master doesn't cause loss of service as long as HA
 state is sane
  - Locate all clustered standby servers using the HA product
 - if a standby is removed from the HA cluster, it is removed from
 load balancing
 - if a standby is not part of the cluster, it is removed from load
 balancing (even if it is part of streaming replication)


   - HTTP replication status requests which facilitate dynamically managing
   HAProxy (or other) PostgreSQL server pools
  - Add a node to streaming replication, it automatically shows up in
  the pool and starts being checked to see if it can service reads
  - Great for cloud scale out


   - Allocation of additional load balancer groups based on some criteria
   (?), for example
  - read/write (as above)
  - read only (as above)
  - data warehouse (reporting reads only)
  - DR (replica with no reads or writes - until it becomes a master)

Keen to hear comments.

Cheers,

James Sewell,
Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



Re: [HACKERS] Rounding to even for numeric data type

2015-03-29 Thread James Cloos
>>>>> "MP" == Michael Paquier <michael.paqu...@gmail.com> writes:

MP So, attached is a patch that does 1) and 2) to make clear to the
MP user how numeric and double precision behave regarding rounding.
MP I am adding it to CF 2015-06 to keep track of it...

Given that the examples show -2.5 rounds to -3, the IEEE term is
roundTiesToAway, and the typical conversational English is "round ties
away from zero".
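
The difference is easy to see directly; on a typical build the float8 side
follows the platform's default ties-to-even behaviour:

SELECT round(-2.5::numeric) AS numeric_round,  -- -3: ties away from zero
       round(-2.5::float8)  AS float8_round;   -- usually -2: ties to even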

RoundUp means towards +Infinity.

754 specifies that for decimal, either roundTiesToEven or roundTiesToAway
are acceptable defaults, and which of the two applies is language dependent.
Does ANSI SQL say anything about how numeric should round?

In general, for decimals (or anything other than binary), there are
twelve possible roundings:

 ToEven ToOdd AwayFromZero ToZero Up Down
 TiesToEven TiesToOdd TiesAwayFromZero TiesToZero TiesUp TiesDown

(Up is the same as ceil(3), Down as floor(3).)

-JimC
-- 
James Cloos <cl...@jhcloos.com> OpenPGP: 0x997A9F17ED7DAEA6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Composite index and min()

2015-02-25 Thread James Sewell
Hello,

I have the following table:

 \d a
     Table "phxconfig.a"
Column |  Type   | Modifiers
---+-+---
 phx_run_id| integer |
 cell_id   | integer |
Indexes:
    "a_phx_run_id_cell_id_idx" btree (phx_run_id, cell_id)

When I use a min() query I get the following plans:

test=# explain select min(phx_run_id) from a;
  QUERY PLAN
---
 Result  (cost=0.22..0.23 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.14..0.22 rows=1 width=4)
           ->  Index Only Scan using a_phx_run_id_cell_id_idx on a  (cost=0.14..7.89 rows=100 width=4)
                 Index Cond: (phx_run_id IS NOT NULL)

test=# explain select min(cell_id) from a;
   QUERY PLAN
-
 Aggregate  (cost=2.25..2.26 rows=1 width=4)
   ->  Seq Scan on a  (cost=0.00..2.00 rows=100 width=4)

Can anyone comment on why this happens?

The index kicks in when I do an explicit cell_id comparison. These are
large tables, and they are in a partition layout, so it really hurts when I
do the min() call on the parent table.
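
The usual workaround, assuming the extra index is acceptable, is a
single-column index (per partition):

-- min(cell_id) can then be answered by an Index Only Scan on this index;
-- the composite index can only drive min()/max() on its leading column.
CREATE INDEX ON a (cell_id);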

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



[HACKERS] ADD FOREIGN KEY locking

2015-02-17 Thread James Sewell
Hello all,

When I add a FK with a statement like this:

ALTER TABLE a ADD FOREIGN KEY (id) REFERENCES b(id);

I see a lock on table b:

select locktype,mode,granted from pg_locks, pg_stat_activity where
relation::regclass::text = 'b' AND pg_locks.pid = pg_stat_activity.pid;

locktype | relation
mode | AccessShareLock
granted  | t
query | SOME LONG RUNNING QUERY WHICH SELECTS FROM b

locktype | relation
mode | AccessExclusiveLock
granted  | f
query | ALTER TABLE a ADD FOREIGN KEY (id) REFERENCES b(id);


This means that my add key won't complete until my long running query does.
That seems a bit odd to me? In this database there are lots of
datawarehouse type queries running, which makes it a bit hard for me to
schedule this operation.

Is this just a manifestation of adding the key being in an ALTER TABLE,
which always needs an AccessExclusiveLock? Or am I missing some edge case
when this lock would be required in this circumstance?

No real urgency on this question, I just found it a bit strange and thought
someone might be able to shed some light.

James Sewell,
Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



Re: [HACKERS] ADD FOREIGN KEY locking

2015-02-17 Thread James Sewell
Oh,

I've just noticed something in the Commit fest post

- Reducing lock strength of trigger and foreign key DDL

Perhaps I just need to be more patient.

Cheers,


James Sewell,
 Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Wed, Feb 18, 2015 at 10:57 AM, James Sewell james.sew...@lisasoft.com
wrote:

 Hello all,

 When I add a FK with a statement like this:

 ALTER TABLE a ADD FOREIGN KEY (id) REFERENCES b(id);

 I see a lock on table b:

 select locktype,mode,granted from pg_locks, pg_stat_activity where
 relation::regclass::text = 'b' AND pg_locks.pid = pg_stat_activity.pid;

 locktype | relation
 mode | AccessShareLock
 granted  | t
 query | SOME LONG RUNNING QUERY WHICH SELECTS FROM b

 locktype | relation
 mode | AccessExclusiveLock
 granted  | f
 query | ALTER TABLE a ADD FOREIGN KEY (id) REFERENCES b(id);


 This means that my add key won't complete until my long running query
 does. That seems a bit odd to me? In this database there are lots of
 datawarehouse type queries running, which makes it a bit hard for me to
 schedule this operation.

 Is this just a manifestation of adding the key being in an ALTER TABLE,
 which always needs an AccessExclusiveLock? Or am I missing some edge case
 when this lock would be required in this circumstance?

 No real urgency on this question, I just found it a bit strange and
 thought someone might be able to shed some light.

 James Sewell,
 Solutions Architect
 __


  Level 2, 50 Queen St, Melbourne VIC 3000

 *P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099





Re: [HACKERS] How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale

2014-10-03 Thread james
On 03/10/2014 05:53, Kouhei Kaigai wrote:
 Yep, that's my pain. Even though usual query does not take many buffers 
 pinned,
 my use case needs to fetch megabytes scale data at once because of performance
 reason; page-by-page synchronous scan makes GPU being idle.
Doesn't your GPU have an async queue and exec mechanism? Then you could
do an async DMA to the GPU with an event, use that event in the GPU to
start the kernel, and in the DB to release the pin?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-06 Thread James Cloos
>>>>> "ST" == Shaun Thomas <stho...@optionshouse.com> writes:

ST That said, the documentation here says FLOAT4 is an alias for REAL,
ST so it's somewhat nonintuitive for FLOAT4 to be so much slower than
ST FLOAT8, which is an alias for DOUBLE PRECISION.

There are some versions of glibc where doing certain math on double is
faster than doing it on float, depending on how things are compiled.

Maybe this is one of them?

-JimC
-- 
James Cloos <cl...@jhcloos.com> OpenPGP: 0x997A9F17ED7DAEA6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-03-02 Thread james

On 02/03/2014 15:30, Magnus Hagander wrote:
Terminal Services have definitely become more common over time, but 
with faster and cheaper virtualization, a lot of people have switched 
to that instead, which would remove the problem of course.


I wonder how common it actually is, though, to *build postgres* on a 
terminal services machine with other users on it...


Well, the banks I've contracted at recently are all rather keen on 
virtual desktops for developers, and some of those are terminal 
services.  We're a headache, and packaging up all the things we need is 
a pain, so there is some mileage in buying grunty servers and doing 
specific installs that are then shared, rather than making an MSI 
generally available.


Also I have experience of being given accounts for jenkins etc that are 
essentially terminal services logins, and having these things unable to 
maintain a software stack can effectively disqualify tech we would 
otherwise use.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Failback without rebuild

2014-02-06 Thread James Sewell
I've just noticed that on PostgreSQL 9.3 I can do the following with a
master node A and a slave node B (as long as I have set
recovery_target_timeline = 'latest'):

   1. Stop Node A
   2. Promote Node B
   3. Attach Node A as slave

This is sufficient for my needs (I know it doesn't cover a crash). Can
anyone see any potential problems with this approach?


Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



On Wed, Feb 5, 2014 at 6:03 PM, Michael Paquier
michael.paqu...@gmail.comwrote:

 On Wed, Feb 5, 2014 at 3:14 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  On Wed, Feb 5, 2014 at 10:30 AM, James Sewell james.sew...@lisasoft.com
 
  I've seen some proposals and a tool (pg_rewind), but all seem to have
 draw
  backs.
 
  As far as I remember, one of the main drawbacks for pg_rewind was
 related to
  hint bits which can be avoided by wal_log_hints. pg_rewind is not part of
  core
  PostgreSQL code, however if you wish, you can try that tool to see if
 can it
  solve your purpose.
 For 9.3, pg_rewind is only safe with page checksums enabled. For 9.4,
 yes wal_log_hints or checksums is mandatory. The code contains as well
 some safety checks as well to ensure that a node not using those
 parameters cannot be rewinded.
 Regards,
 --
 Michael




Re: [HACKERS] PostgreSQL Failback without rebuild

2014-02-06 Thread James Sewell
Node A could get ahead even if it has been shut down cleanly BEFORE the
promotion?

I'd always assumed if I shut down the master the slave would be at the same
point after shutdown - is this incorrect?

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



On Fri, Feb 7, 2014 at 4:58 PM, Michael Paquier
michael.paqu...@gmail.comwrote:




 On Fri, Feb 7, 2014 at 1:57 PM, James Sewell james.sew...@lisasoft.comwrote:

 I've just noticed that on PostgreSQL 9.3 I can do the following with a
 master node A and a slave node B (as long as I have set
 recovery_target_timeline = 'latest'):

1. Stop Node A
2. Promote Node B
3. Attach Node A as slave

 This is sufficient for my needs (I know it doesn't cover a crash), can
 anyone see any potential problems with this approach?

 Yes, node A could get ahead of the point where WAL forked when promoting
 B. In this case you cannot reconnect A to B, and need to actually recreate
 a node from a fresh base backup, or rewind it. pg_rewind targets the
 latter, postgres core is able to do the former, and depending on things
 like your environment and/or the size of your server, you might prefer one
 or the other.
 Regards,
 --
 Michael


-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


[HACKERS] PostgreSQL Failback without rebuild

2014-02-04 Thread James Sewell
Hello All,

I have been reading through some of the recent discussions about failback
when in a streaming replication setup. I define failback as:


   1. Node A is master, Node B is slave
   2. Node A crashes || Node A is stopped || nothing happens
   3. Promote Node B to Master
   4. Attach Node A as slave

My understanding is that currently, to achieve step four, you need to take a
base backup of Node B and deploy it to Node A before starting streaming
replication (or use rsync etc...).

This is very undesirable for many users, especially if they have a very
large database.

From the discussions I can see that the problem is to do with Node A
writing changes to disk that are not streamed to Node B before Node A crashes.

Has there been any consensus on this issue? Are there any solutions which
might make it into 9.4 or 9.5? I've seen some proposals and a tool
(pg_rewind), but all seem to have drawbacks.

I've been looking mainly at these threads:

http://www.postgresql.org/message-id/CAF8Q-Gy7xa60HwXc0MKajjkWFEbFDWTG=ggyu1kmt+s2xcq...@mail.gmail.com

http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com

http://www.postgresql.org/message-id/519df910.4020...@vmware.com

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099

-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread James Bottomley
On Mon, 2014-01-13 at 19:48 -0500, Trond Myklebust wrote:
 On Jan 13, 2014, at 19:03, Hannu Krosing ha...@2ndquadrant.com wrote:
 
  On 01/13/2014 09:53 PM, Trond Myklebust wrote:
  On Jan 13, 2014, at 15:40, Andres Freund and...@2ndquadrant.com wrote:
  
  On 2014-01-13 15:15:16 -0500, Robert Haas wrote:
  On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner kgri...@ymail.com 
  wrote:
  I notice, Josh, that you didn't mention the problems many people
  have run into with Transparent Huge Page defrag and with NUMA
  access.
  Amen to that.  Actually, I think NUMA can be (mostly?) fixed by
  setting zone_reclaim_mode; is there some other problem besides that?
  I think that fixes some of the worst instances, but I've seen machines
  spending horrible amounts of CPU ( BUS) time in page reclaim
  nonetheless. If I analyzed it correctly it's in RAM  working set
  workloads where RAM is pretty large and most of it is used as page
  cache. The kernel ends up spending a huge percentage of time finding and
  potentially defragmenting pages when looking for victim buffers.
  
  On a related note, there's also the problem of double-buffering.  When
  we read a page into shared_buffers, we leave a copy behind in the OS
  buffers, and similarly on write-out.  It's very unclear what to do
  about this, since the kernel and PostgreSQL don't have intimate
  knowledge of what each other are doing, but it would be nice to solve
  somehow.
  I've wondered before if there wouldn't be a chance for postgres to say
  my dear OS, that the file range 0-8192 of file x contains y, no need to
  reread and do that when we evict a page from s_b but I never dared to
  actually propose that to kernel people...
  O_DIRECT was specifically designed to solve the problem of double 
  buffering 
  between applications and the kernel. Why are you not able to use that in 
  these situations?
  What is asked is the opposite of O_DIRECT - the write from a buffer inside
  postgresql to linux *buffercache* and telling linux that it is the same
  as what
  is currently on disk, so don't bother to write it back ever.
 
 I don’t understand. Are we talking about mmap()ed files here? Why
 would the kernel be trying to write back pages that aren’t dirty?

No ... if I have it right, it's pretty awful: they want to do a read of
a file into a user provided buffer, thus obtaining a page cache entry
and a copy in their userspace buffer, then insert the page of the user
buffer back into the page cache as the page cache page ... that's right,
isn't it, postgres people?

Effectively you end up with buffered read/write that's also mapped into
the page cache.  It's a pretty awful way to hack around mmap.

James




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread James Bottomley
On Tue, 2014-01-14 at 15:39 +0100, Hannu Krosing wrote:
 On 01/14/2014 09:39 AM, Claudio Freire wrote:
  On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing ha...@2ndquadrant.com 
  wrote:
  Again, as said above the linux file system is doing fine. What we
  want is a few ways to interact with it to let it do even better when
  working with postgresql by telling it some stuff it otherwise would
  have to second guess and by sometimes giving it back some cache
  pages which were copied away for potential modifying but ended
  up clean in the end.
  You don't need new interfaces. Only a slight modification of what
  fadvise DONTNEED does.
 
  This insistence in injecting pages from postgres to kernel is just a
  bad idea. 
 Do you think it would be possible to map copy-on-write pages
 from linux cache to postgresql cache ?
 
 this would be a step in direction of solving the double-ram-usage
 of pages which have not been read from syscache to postgresql
 cache without sacrificing linux read-ahead (which I assume does
 not happen when reads bypass system cache).

The current mechanism for coherency between a userspace cache and the
in-kernel page cache is mmap ... that's the only way you get the same
page in both currently.

glibc used to have an implementation of read/write in terms of mmap, so
it should be possible to insert it into your current implementation
without a major rewrite.  The problem I think this brings you is
uncontrolled writeback: you don't want dirty pages to go to disk until
you issue a write()  I think we could fix this with another madvise():
something like MADV_WILLUPDATE telling the page cache we expect to alter
the pages again, so don't be aggressive about cleaning them.  Plus all
the other issues with mmap() ... but if you can detail those, we might
be able to fix them.
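
For illustration, a minimal sketch of that shape of interface; mmap() and
msync() are the real POSIX calls, while MADV_WILLUPDATE is only the
hypothetical advice flag proposed above and therefore appears purely in a
comment:

    #include <string.h>
    #include <sys/mman.h>
    #include <sys/types.h>

    /* Map one 8 kB block of an already-open data file into the buffer pool. */
    void *map_block(int fd, off_t offset)
    {
        void *p = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd, offset);
        /* madvise(p, 8192, MADV_WILLUPDATE);
         *   hypothetical: "we expect to dirty this again, please don't be
         *   aggressive about cleaning it" (not a real kernel flag) */
        return (p == MAP_FAILED) ? NULL : p;
    }

    /* The point at which the application decides the block may be written. */
    int write_block(void *p, const void *src)
    {
        memcpy(p, src, 8192);             /* dirty the shared pages       */
        return msync(p, 8192, MS_ASYNC);  /* start controlled writeback   */
    }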

 and we can write back the copy at the point when it is safe (from
 postgresql perspective)  to let the system write them back ?

Using MADV_WILLUPDATE, possibly ... you're still not going to have
absolute control.  The kernel will write back the pages if the dirty
limits are exceeded, for instance, but we could tune it to be useful.

 Do you think it is possible to make it work with good performance
 for a few million 8kb pages ?
 
  At the very least, it still needs postgres to know too much
  of the filesystem (block layout) to properly work. Ie: pg must be
  required to put entire filesystem-level blocks into the page cache,
  since that's how the page cache works. 
 I was more thinking of a simple write() interface with extra
 flags/sysctls to tell kernel that we already have this on disk
  At the very worst, it may
  introduce serious security and reliability implications, when
  applications can destroy the consistency of the page cache (even if
  full access rights are checked, there's still the possibility this
  inconsistency might be exploitable).
 If you allow write() which just writes clean pages, I can not see
 where the extra security concerns are beyond what normal
 write can do.

The problem is we can't give you absolute control of when pages are
written back because that interface can be used to DoS the system: once
we get too many dirty uncleanable pages, we'll thrash looking for memory
and the system will livelock.

James




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread James Bottomley
On Tue, 2014-01-14 at 11:48 -0500, Robert Haas wrote:
 On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
  No, I'm sorry, that's never going to be possible.  No user space
  application has all the facts.  If we give you an interface to force
  unconditional holding of dirty pages in core you'll livelock the system
  eventually because you made a wrong decision to hold too many dirty
  pages.   I don't understand why this has to be absolute: if you advise
  us to hold the pages dirty and we do up until it becomes a choice to
  hold on to the pages or to thrash the system into a livelock, why would
  you ever choose the latter?  And if, as I'm assuming, you never would,
  why don't you want the kernel to make that choice for you?
 
 If you don't understand how write-ahead logging works, this
 conversation is going nowhere.  Suffice it to say that the word
 ahead is not optional.

No, I do ... you mean the order of write out, if we have to do it, is
important.  In the rest of the kernel, we do this with barriers which
causes ordered grouping of I/O chunks.  If we could force a similar
ordering in the writeout code, is that enough?

James




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread James Bottomley
On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote:
 On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas robertmh...@gmail.com wrote:
 
  In terms of avoiding double-buffering, here's my thought after reading
  what's been written so far.  Suppose we read a page into our buffer
  pool.  Until the page is clean, it would be ideal for the mapping to
  be shared between the buffer cache and our pool, sort of like
  copy-on-write.  That way, if we decide to evict the page, it will
  still be in the OS cache if we end up needing it again (remember, the
  OS cache is typically much larger than our buffer pool).  But if the
  page is dirtied, then instead of copying it, just have the buffer pool
  forget about it, because at that point we know we're going to write
  the page back out anyway before evicting it.
 
  This would be pretty similar to copy-on-write, except without the
  copying.  It would just be forget-from-the-buffer-pool-on-write.
 
 
 But... either copy-on-write or forget-on-write needs a page fault, and
 thus a page mapping.
 
 Is a page fault more expensive than copying 8k?
 
 (I really don't know).

A page fault can be expensive, yes ... but perhaps you don't need one. 

What you want is a range of memory that's read from a file but treated
as anonymous for writeout (i.e. written to swap if we need to reclaim
it).  Then at some time later, you want to designate it as written back
to the file instead so you control the writeout order.  I'm not sure we
can do this: the separation between file backed and anonymous pages is
pretty deeply ingrained into the OS, but if it were possible, is that
what you want?

James




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread James Bottomley
On Tue, 2014-01-14 at 10:39 -0500, Tom Lane wrote:
 James Bottomley james.bottom...@hansenpartnership.com writes:
  The current mechanism for coherency between a userspace cache and the
  in-kernel page cache is mmap ... that's the only way you get the same
  page in both currently.
 
 Right.
 
  glibc used to have an implementation of read/write in terms of mmap, so
  it should be possible to insert it into your current implementation
  without a major rewrite.  The problem I think this brings you is
  uncontrolled writeback: you don't want dirty pages to go to disk until
  you issue a write()
 
 Exactly.
 
  I think we could fix this with another madvise():
  something like MADV_WILLUPDATE telling the page cache we expect to alter
  the pages again, so don't be aggressive about cleaning them.
 
 Don't be aggressive isn't good enough.  The prohibition on early write
 has to be absolute, because writing a dirty page before we've done
 whatever else we need to do results in a corrupt database.  It has to
 be treated like a write barrier.
 
  The problem is we can't give you absolute control of when pages are
  written back because that interface can be used to DoS the system: once
  we get too many dirty uncleanable pages, we'll thrash looking for memory
  and the system will livelock.
 
 Understood, but that makes this direction a dead end.  We can't use
 it if the kernel might decide to write anyway.

No, I'm sorry, that's never going to be possible.  No user space
application has all the facts.  If we give you an interface to force
unconditional holding of dirty pages in core you'll livelock the system
eventually because you made a wrong decision to hold too many dirty
pages.   I don't understand why this has to be absolute: if you advise
us to hold the pages dirty and we do up until it becomes a choice to
hold on to the pages or to thrash the system into a livelock, why would
you ever choose the latter?  And if, as I'm assuming, you never would,
why don't you want the kernel to make that choice for you?

James



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread James Bottomley
On Tue, 2014-01-14 at 12:39 -0500, Robert Haas wrote:
 On Tue, Jan 14, 2014 at 12:20 PM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
  On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote:
  On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas robertmh...@gmail.com wrote:
   In terms of avoiding double-buffering, here's my thought after reading
   what's been written so far.  Suppose we read a page into our buffer
   pool.  Until the page is clean, it would be ideal for the mapping to
   be shared between the buffer cache and our pool, sort of like
   copy-on-write.  That way, if we decide to evict the page, it will
   still be in the OS cache if we end up needing it again (remember, the
   OS cache is typically much larger than our buffer pool).  But if the
   page is dirtied, then instead of copying it, just have the buffer pool
   forget about it, because at that point we know we're going to write
   the page back out anyway before evicting it.
  
   This would be pretty similar to copy-on-write, except without the
   copying.  It would just be forget-from-the-buffer-pool-on-write.
 
  But... either copy-on-write or forget-on-write needs a page fault, and
  thus a page mapping.
 
  Is a page fault more expensive than copying 8k?
 
  (I really don't know).
 
  A page fault can be expensive, yes ... but perhaps you don't need one.
 
  What you want is a range of memory that's read from a file but treated
  as anonymous for writeout (i.e. written to swap if we need to reclaim
  it). Then at some time later, you want to designate it as written back
  to the file instead so you control the writeout order.  I'm not sure we
  can do this: the separation between file backed and anonymous pages is
  pretty deeply ingrained into the OS, but if it were possible, is that
  what you want?
 
 Doesn't sound exactly like what I had in mind.  What I was suggesting
 is an analogue of read() that, if it reads full pages of data to a
 page-aligned address, shares the data with the buffer cache until it's
 first written instead of actually copying the data.

The only way to make this happen is mmap the file to the buffer and use
MADV_WILLNEED.

   The pages are
 write-protected so that an attempt to write the address range causes a
 page fault.  In response to such a fault, the pages become anonymous
 memory and the buffer cache no longer holds a reference to the page.

OK, so here I thought of another madvise() call to switch the region to
anonymous memory.  A page fault works too, of course, it's just that one
per page in the mapping will be expensive.

Do you care about handling aliases ... what happens if someone else
reads from the file, or will that never occur?  The reason for asking is
that it's much easier if someone else mmapping the file gets your
anonymous memory than we create an alias in the page cache.

James




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread James Bottomley
On Tue, 2014-01-14 at 15:09 -0500, Robert Haas wrote:
 On Tue, Jan 14, 2014 at 3:00 PM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
  Doesn't sound exactly like what I had in mind.  What I was suggesting
  is an analogue of read() that, if it reads full pages of data to a
  page-aligned address, shares the data with the buffer cache until it's
  first written instead of actually copying the data.
 
  The only way to make this happen is mmap the file to the buffer and use
  MADV_WILLNEED.
 
The pages are
  write-protected so that an attempt to write the address range causes a
  page fault.  In response to such a fault, the pages become anonymous
  memory and the buffer cache no longer holds a reference to the page.
 
  OK, so here I thought of another madvise() call to switch the region to
  anonymous memory.  A page fault works too, of course, it's just that one
  per page in the mapping will be expensive.
 
 I don't think either of these ideas works for us.  We start by
 creating a chunk of shared memory that all processes (we do not use
 threads) will have mapped at a common address, and we read() and
 write() into that chunk.

Yes, that's what I was thinking: it's a cache.  About how many files
comprise this cache?  Are you thinking it's too difficult for every
process to map the files?

  Do you care about handling aliases ... what happens if someone else
  reads from the file, or will that never occur?  The reason for asking is
  that it's much easier if someone else mmapping the file gets your
  anonymous memory than we create an alias in the page cache.
 
 All reads and writes go through the buffer pool stored in shared
 memory, but any of the processes that have that shared memory region
 mapped could be responsible for any individual I/O request.

That seems to be possible with the abstraction.  The initial mapping
gets the file backed pages: you can do madvise to read them (using
readahead), flush them (using wontneed) and flip them to anonymous
(using something TBD).  Since it's a shared mapping API based on the
file, any of the mapping processes can do any operation.  Future mappers
of the file get the mix of real and anon memory, so it's truly shared.

Given that you want to use this as a shared cache, it seems that the API
to flip back from anon to file mapped is wontneed.  That would also
trigger writeback of any dirty pages in the previously anon region ...
which you could force with msync.  As far as I can see, this is
identical to read/write on a shared region with the exception that you
don't need to copy in and out of the page cache.

From our point of view, the implementation is nice because the pages
effectively never leave the page cache.  We just use an extra per page
flag (which I'll get shot for suggesting) to alter the writeout path
(which is where the complexity which may kill the implementation is).

James




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread James Bottomley
On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote:
 On 1/13/14, 2:27 PM, Claudio Freire wrote:
  On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby j...@nasby.net wrote:
  On 1/13/14, 2:19 PM, Claudio Freire wrote:
 
  On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas robertmh...@gmail.com
  wrote:
 
  On a related note, there's also the problem of double-buffering.  When
  we read a page into shared_buffers, we leave a copy behind in the OS
  buffers, and similarly on write-out.  It's very unclear what to do
  about this, since the kernel and PostgreSQL don't have intimate
  knowledge of what each other are doing, but it would be nice to solve
  somehow.
 
 
 
  There you have a much harder algorithmic problem.
 
  You can basically control duplication with fadvise and WONTNEED. The
  problem here is not the kernel and whether or not it allows postgres
  to be smart about it. The problem is... what kind of smarts
  (algorithm) to use.
 
 
  Isn't this a fairly simple matter of when we read a page into shared 
  buffers
  tell the kernel do forget that page? And a corollary to that for when we
  dump a page out of shared_buffers (here kernel, please put this back into
  your cache).
 
 
  That's my point. In terms of kernel-postgres interaction, it's fairly 
  simple.
 
  What's not so simple, is figuring out what policy to use. Remember,
  you cannot tell the kernel to put some page in its page cache without
  reading it or writing it. So, once you make the kernel forget a page,
  evicting it from shared buffers becomes quite expensive.
 
 Well, if we were to collaborate with the kernel community on this then
 presumably we can do better than that for eviction... even to the
 extent of here's some data from this range in this file. It's (clean|
 dirty). Put it in your cache. Just trust me on this.

This should be the madvise() interface (with MADV_WILLNEED and
MADV_DONTNEED); is there something in that interface that is
insufficient?
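
For reference, a minimal sketch of those two calls (standard POSIX/Linux;
the file name and range are made up, and error handling is omitted):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 128 * 8192;
        int    fd  = open("some-relation-file", O_RDONLY);
        char  *p   = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

        madvise(p, len, MADV_WILLNEED);  /* please start reading this range in */
        /* ... work with the data ... */
        madvise(p, len, MADV_DONTNEED);  /* we no longer expect to touch it    */

        munmap(p, len);
        close(fd);
        return 0;
    }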

James




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread James Bottomley
On Mon, 2014-01-13 at 22:12 +0100, Andres Freund wrote:
 On 2014-01-13 12:34:35 -0800, James Bottomley wrote:
  On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote:
   Well, if we were to collaborate with the kernel community on this then
   presumably we can do better than that for eviction... even to the
   extent of here's some data from this range in this file. It's (clean|
   dirty). Put it in your cache. Just trust me on this.
  
  This should be the madvise() interface (with MADV_WILLNEED and
  MADV_DONTNEED) is there something in that interface that is
  insufficient?
 
 For one, postgres doesn't use mmap for files (and can't without major
 new interfaces).

I understand, that's why you get double buffering: because we can't
replace a page in the range you give us on read/write.  However, you
don't have to switch entirely to mmap: you can use mmap/madvise
exclusively for cache control and still use read/write (and still pay
the double buffer penalty, of course).  It's only read/write with
directio that would cause problems here (unless you're planning to
switch to DIO?).

  Frequently mmap()/madvise()/munmap()ing 8kb chunks has
 horrible consequences for performance/scalability - very quickly you
 contend on locks in the kernel.

Is this because of problems in the mmap_sem?

 Also, that will mark that page dirty, which isn't what we want in this
 case.

You mean madvise (page_addr)?  It shouldn't ... the state of the dirty
bit should only be updated by actual writes.  Which MADV_ primitive is
causing the dirty marking, because we might be able to fix it (unless
there's some weird corner case I don't know about).

  One major use case is transplanting a page coming from postgres'
 buffers into the kernel's buffercache because the latter has a much
 better chance of properly allocating system resources across independent
 applications running.

If you want to share pages between the application and the page cache,
the only known interface is mmap ... perhaps we can discuss how better
to improve mmap for you?

We also do have a way of transplanting pages: it's called splice.  How
do the semantics of splice differ from what you need?
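
For reference, a minimal sketch of the splice() interface being referred to
(Linux-specific; one end of a splice() must be a pipe, and the file name
here is made up):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        int    filefd = open("some-data-file", O_RDONLY);
        int    pipefd[2];
        loff_t offset = 0;

        pipe(pipefd);
        /* Move up to 64 kB of page-cache pages into the pipe without copying
         * them through user space. */
        splice(filefd, &offset, pipefd[1], NULL, 65536, SPLICE_F_MOVE);
        /* ... the pipe contents can then be spliced onward to a socket or
         * another file ... */
        close(pipefd[0]);
        close(pipefd[1]);
        close(filefd);
        return 0;
    }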

 Oh, and the kernel's page-cache management while far from perfect,
 actually scales much better than postgres'.

Well, then, it sounds like the best way forward would be to get
postgres to use the kernel page cache more efficiently.

James




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Lsf-pc] Linux kernel impact on PostgreSQL performance

2014-01-13 Thread James Bottomley
On Mon, 2014-01-13 at 21:29 +, Greg Stark wrote:
 On Mon, Jan 13, 2014 at 9:12 PM, Andres Freund and...@2ndquadrant.com wrote:
  For one, postgres doesn't use mmap for files (and can't without major
  new interfaces). Frequently mmap()/madvise()/munmap()ing 8kb chunks has
  horrible consequences for performance/scalability - very quickly you
  contend on locks in the kernel.
 
 
 I may as well dump this in this thread. We've discussed this in person
 a few times, including at least once with Ted T'so when he visited
 Dublin last year.
 
 The fundamental conflict is that the kernel understands better the
 hardware and other software using the same resources, Postgres
 understands better its own access patterns. We need to either add
 interfaces so Postgres can teach the kernel what it needs about its
 access patterns or add interfaces so Postgres can find out what it
 needs to know about the hardware context.
 
 The more ambitious and interesting direction is to let Postgres tell
 the kernel what it needs to know to manage everything. To do that we
 would need the ability to control when pages are flushed out. This is
 absolutely necessary to maintain consistency. Postgres would need to
 be able to mark pages as unflushable until some point in time in the
 future when the journal is flushed. We discussed various ways that
 interface could work but it would be tricky to keep it low enough
 overhead to be workable.

So in this case, the question would be what additional information do we
need to exchange that's not covered by the existing interfaces.  Between
madvise and splice, we seem to have most of what you want; what's
missing?

 The less exciting, more conservative option would be to add kernel
 interfaces to teach Postgres about things like raid geometries. Then
 Postgres could use directio and decide to do prefetching based on the
 raid geometry, how much available i/o bandwidth and iops is available,
 etc.
 
 Reimplementing i/o schedulers and all the rest of the work that the
 kernel provides inside Postgres just seems like something outside our
 competency and that none of us is really excited about doing.

This would also be a well trodden path ... I believe that some large
database company introduced Direct IO for roughly this purpose.
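
For reference, a minimal sketch of that direction (Linux O_DIRECT with an
aligned buffer; the 4096-byte alignment and the file name are assumptions,
since the real requirement is filesystem- and device-dependent):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf = NULL;
        int   fd  = open("some-data-file", O_RDONLY | O_DIRECT);

        posix_memalign(&buf, 4096, 8192);  /* O_DIRECT needs aligned buffers */
        pread(fd, buf, 8192, 0);           /* bypasses the kernel page cache */
        /* ... the application now owns all caching, read-ahead and write
         * scheduling decisions itself ... */
        free(buf);
        close(fd);
        return 0;
    }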

James




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-06 Thread james

On 06/01/2014 03:14, Robert Haas wrote:

That's up to the application.  After calling dsm_create(), you call
dsm_segment_handle() to get the 32-bit integer handle for that
segment.  Then you have to get that to the other process(es) somehow.
If you're trying to share a handle with a background worker, you can
stuff it in bgw_main_arg.  Otherwise, you'll probably need to store it
in the main shared memory segment, or a file, or whatever.
Well, that works for sysv shm, sure.  But I was interested (possibly 
from Konstantin) in how the handle transfer takes place at the moment, 
particularly if it is possible to create additional segments dynamically.  
I haven't looked at the extension at all.
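
For anyone following the API being discussed, the flow Robert describes is
roughly the following (a sketch from memory of the 9.4-era dynamic shared
memory interface; exact signatures may differ between versions, and error
handling is omitted):

    #include "postgres.h"
    #include "storage/dsm.h"

    /* In the creating backend */
    void create_segment(void)
    {
        dsm_segment *seg    = dsm_create(1024 * 1024);
        dsm_handle   handle = dsm_segment_handle(seg);   /* 32-bit handle */

        /* ... hand 'handle' to other processes via bgw_main_arg, the main
         * shared memory segment, a file, etc. ... */
    }

    /* In a process that received the handle */
    void attach_segment(dsm_handle handle)
    {
        dsm_segment *seg = dsm_attach(handle);
        void        *ptr = dsm_segment_address(seg);

        /* ... use the mapping at 'ptr' ... */
    }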




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-06 Thread james

On 06/01/2014 04:20, Amit Kapila wrote:

Duplicate handle should work, but we need to communicate the handle
to other process using IPC.
Only if the other process needs to use it.  The IPC is not to transfer 
the handle to the other process, just to tell it which slot in its handle 
table contains the handle.  If you just want to ensure that its use-count 
never goes to zero, the receiver does not need to know what the handle is.

However ...

The point remains that you need to duplicate it into every process that 
might want to use it subsequently, so it makes sense to DuplicateHandle 
into the parent, and then to advertise that handle value publicly so that 
other child processes can DuplicateHandle it back into their own process.

The handle value can change, so you also need to refer to the handle in 
the parent and map it in each child to the local equivalent.
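
For illustration, a rough sketch of that dance with the Win32 API (error
handling omitted; the function and variable names are made up):

    #include <windows.h>

    /* In the child that owns the segment: push a duplicate of hSeg up into
     * the parent's handle table, then advertise the returned value (e.g. in
     * the main shared memory segment).  It is only meaningful to the parent. */
    HANDLE publish_to_parent(HANDLE hSeg, DWORD parentPid)
    {
        HANDLE hParent   = OpenProcess(PROCESS_DUP_HANDLE, FALSE, parentPid);
        HANDLE hInParent = NULL;

        DuplicateHandle(GetCurrentProcess(), hSeg, hParent, &hInParent,
                        0, FALSE, DUPLICATE_SAME_ACCESS);
        CloseHandle(hParent);
        return hInParent;
    }

    /* In any other child: map the advertised parent-relative value back into
     * a handle that is valid locally before using it. */
    HANDLE import_from_parent(HANDLE hInParent, DWORD parentPid)
    {
        HANDLE hParent = OpenProcess(PROCESS_DUP_HANDLE, FALSE, parentPid);
        HANDLE hLocal  = NULL;

        DuplicateHandle(hParent, hInParent, GetCurrentProcess(), &hLocal,
                        0, FALSE, DUPLICATE_SAME_ACCESS);
        CloseHandle(hParent);
        return hLocal;
    }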



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-05 Thread james

On 05/01/2014 16:50, Robert Haas wrote:

  But on Windows, segments are *automatically*
destroyed *by the operating system* when the last process unmaps them,
so it's not quite so clear to me how we can allow it there.  The main
shared memory segment is no problem because the postmaster always has
it mapped, even if no one else does, but that doesn't help for dynamic
shared memory segments.

Surely you just need to DuplicateHandle into the parent process?  If you
want to (tidily) dispose of it at some time, then you'll need to tell the
postmaster that you have done so and what the handle is in its process,
but if you just want it to stick around, then you can just pass it up.



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-05 Thread james

On 05/01/2014 18:02, Robert Haas wrote:

On Sun, Jan 5, 2014 at 12:34 PM, james ja...@mansionfamily.plus.com wrote:

On 05/01/2014 16:50, Robert Haas wrote:

  But on Windows, segments are *automatically*
destroyed *by the operating system* when the last process unmaps them,
so it's not quite so clear to me how we can allow it there.  The main
shared memory segment is no problem because the postmaster always has
it mapped, even if no one else does, but that doesn't help for dynamic
shared memory segments.

Surely you just need to DuplicateHandle into the parent process?  If you
want to (tidily) dispose of it at some time, then you'll need to tell the
postmaster that you have done so and what the handle is in its process,
but if you just want it to stick around, then you can just pass it up.

Uh, I don't know, maybe?  Does the postmaster have to do something to
receive the duplicated handle


In principle, no, so long as the child has a handle to the parent process 
that has the appropriate permissions.  Given that these processes have a 
parent/child relationship, that shouldn't be too hard to arrange.

, or can the child just throw it over the
wall to the parent and let it rot until the postmaster finally exits?

Yes.  Though it might be a good idea to record the handle somewhere 
(perhaps in a table) so that any potential issues from an insane system 
spamming the postmaster with handles are apparent.

I'm intrigued - how are the handles shared between children that are peers
in the current scheme?  Some handle transfer must already be in place.

Could you share the handles to an immortal worker if you want to reduce any
potential impact on the postmaster?

The latter would be nicer for our purposes, perhaps, as running more
code from within the postmaster is risky for us.  If a regular backend
process dies, the postmaster will restart everything and the database
will come back on line, but if the postmaster itself dies, we're hard
down.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The 
Enterprise PostgreSQL Company




Re: [HACKERS] SSL: better default ciphersuite

2013-12-15 Thread James Cloos
 MK == Marko Kreen mark...@gmail.com writes:
 PE == Peter Eisentraut pete...@gmx.net writes:

MK Well, we should - the DEFAULT is clearly a client-side default
MK for compatibility only.  No server should ever run with it.

PE Any other opinions on this out there?

For reference, see:

  https://wiki.mozilla.org/Security/Server_Side_TLS

for the currently suggested suite for TLS servers.

That is:

ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:
ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:
DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:
ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:
ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:
ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:
DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:
DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:
AES256-GCM-SHA384:ECDHE-RSA-RC4-SHA:ECDHE-ECDSA-RC4-SHA:
AES128:AES256:RC4-SHA:HIGH:
!aNULL:!eNULL:!EXPORT:!DES:!3DES:!MD5:!PSK

The page explains why.

But for pgsql, I'd leave off the !PSK; pre-shared keys may prove useful
for some.  And RC4, perhaps, also should be !ed.

And if anyone wants Kerberos tls-authentication, one could add
KRB5-DES-CBC3-SHA, but that is ssl3-only.

Once salsa20-poly1305 lands in openssl, that should be added to the
start of the list.

-JimC
--
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add an ldapoption to disable chasing LDAP referrals

2013-10-16 Thread James Sewell
Hey All,

I had missed these emails, sorry.

The search+bind mode issue is one of documentation location; I have fixed
it by moving the section to the applied-to-both list. As the patch deals
with the post-auth response, this is correct.

As for the issue of something other than 0 or 1 being set, I am happy to
throw an error (although this doesn't seem to be how options such as LDAPTLS
work: 1 if 1, else 0). I assume I would use the ereport() function to do
this (using the second example from this page
http://www.postgresql.org/docs/9.2/static/error-message-reporting.html)?
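
For illustration, the call would be shaped roughly like this (a sketch only:
the error level, error code, message wording and the variable name val are
assumptions, not what any committed patch does):

    ereport(LOG,
            (errcode(ERRCODE_CONFIG_FILE_ERROR),
             errmsg("invalid value for ldapreferrals: \"%s\"", val)));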

Cheers,
James


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000 * **W* www.lisasoft.com  *F *(+61) 3 8370 8099



On Thu, Sep 19, 2013 at 12:56 AM, Peter Eisentraut pete...@gmx.net wrote:

 On 7/8/13 9:33 PM, James Sewell wrote:
  New patch attached. I've moved from using a boolean to an enum trivalue.

 When ldapreferrals is set to something other than 0 or 1 exactly, it
 just ignores the option.  That's not good, I think.  It should probably
 be an error.



-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: [HACKERS] PSQL return coder

2013-10-15 Thread James Sewell
I was avoiding ON_ERROR_STOP because I was using ON_ERROR_ROLLBACK, but
have just realised that if I encase my SQL in a transaction then rollback
will still happen.

Perfect!


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000 * **W* www.lisasoft.com  *F *(+61) 3 8370 8099



On Fri, Oct 11, 2013 at 12:25 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, Oct 10, 2013 at 1:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  James Sewell james.sew...@lisasoft.com writes:
  My question is in a rollback scenario is it possible to get PSQL to
 return
  a non 0 exit status?
 
  Maybe you could use -c instead of -f?
 
  $ psql -c 'select 1; select 1/0' regression
  ERROR:  division by zero
  $ echo $?
  1
 
  You won't need explicit BEGIN/END because this is already a single
  transaction.

 According to the man page,
 EXIT STATUS
    psql returns 0 to the shell if it finished normally, 1 if a fatal
    error of its own (out of memory, file not found) occurs, 2 if the
    connection to the server went bad and the session was not interactive,
    and 3 if an error occurred in a script and the variable ON_ERROR_STOP
    was set.

 So for a longer script ON_ERROR_STOP might be the ticket (which is
 usually a good idea anyways).

 merlin


-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


[HACKERS] PSQL return coder

2013-10-09 Thread James Sewell
Hello,

I am using PSQL to run SQL from a file with the -f flag as follows:

BEGIN
SQL
SQL
...
END

This gives me rollback on error and a nicer output than -1. This works fine.

My question is: in a rollback scenario, is it possible to get PSQL to return
a non-zero exit status?

Cheers,
James



-- 

James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000 * **W* www.lisasoft.com  *F *(+61) 3 8370 8099

-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: [HACKERS] pg_dump and schema names

2013-08-08 Thread James Sewell
I was under the impression that every command that references a relation
makes use of the search path, regardless of what it is *doing*. Maybe this
is different in older versions though?

I actually ran into this recently and had to remove all the xx. schema
components using vi before I could run the dump to move to a different
schema.

Just to be sure:

postgres=# create schema test;
CREATE SCHEMA
postgres=# set search_path to test;
SET
postgres=# create table test1(id serial);
NOTICE:  CREATE TABLE will create implicit sequence test1_id_seq for
serial column test1.id
CREATE TABLE
postgres=# alter table test1 owner to postgres;
ALTER TABLE


Cheers,

James Sewell
Solutions Architect
_

[image:
http://www.lisasoft.com/sites/lisasoft/files/u1/2013hieghtslogan_0.png]

Level 2, 50 Queen St,
Melbourne, VIC, 3000

P: 03 8370 8000   F: 03 8370 8099  W: www.lisasoft.com



On Fri, Aug 9, 2013 at 2:04 PM, Bruce Momjian br...@momjian.us wrote:

 pg_dump goes to great lengths not to hard-code the schema name into
 commands like CREATE TABLE, instead setting the search_path before
 creating the table;  these commands:

 CREATE SCHEMA xx;
 CREATE TABLE xx.test(x int);

 generates this output:

 SET search_path = xx, pg_catalog;
 CREATE TABLE test (
 x integer
 );

 If you dump a schema and want to reload it into another schema, you
 should only need to update that one search_path line.  However, later in
 the dump file, we hardcode the schema name for setting the object owner:

 ALTER TABLE xx.test OWNER TO postgres;

 Could we use search_path here to avoid the schema designation?  I am not
 sure this possible because while CREATE defaults to the first existing
 schema in the search_path, I am not sure ALTER has the same behavior
 because you are not _creating_ something with ALTER.

 --
   Bruce Momjian  br...@momjian.us  http://momjian.us
   EnterpriseDB http://enterprisedb.com

   + It's impossible for everything to be true. +


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.

Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses

2013-07-14 Thread james

On 14/07/2013 20:13, Greg Smith wrote:
The most efficient way to write things out is to delay those writes as 
long as possible.


That doesn't smell right to me.  It might be that delaying allows more 
combining and allows the kernel to see more at once and optimise it, but 
I think the counter-argument is that it is an efficiency loss to have 
either CPU or disk idle waiting on the other.  It cannot make sense from 
a throughput point of view to have disks doing nothing and then become 
overloaded so they are a bottleneck (primarily seeking) and the CPU does 
nothing.


Now I have NOT measured behaviour but I'd observe that we see disks that 
can stream 100MB/s but do only 5% of that if they are doing random IO.  
Some random seeks during sync can't be helped, but if they are done when 
we aren't waiting for sync completion then they are in effect free.  The 
flip side is that we can't really know whether they will get merged with 
adjacent writes later, so it's hard to schedule them early.  But we can 
observe that if we have a bunch of writes to adjacent data then a seek 
to do the write is effectively amortised across them.


So it occurs to me that perhaps we can watch for patterns where we have 
groups of adjacent writes that might stream, and when they form we might 
schedule them to be pushed out early (if not immediately), ideally out 
as far as the drive (but not flushed from its cache) and without forcing 
all other data to be flushed too.  And perhaps we should always look to 
be getting drives dedicated to dbms to do something, even if it turns 
out to have been redundant in the end.
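
For what it's worth, one Linux primitive that already gets part of the way
there is sync_file_range(), which starts writeback of a single range of a
file without waiting and without forcing out anything else; a minimal
sketch, with a made-up file name and range:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("some-relation-file", O_WRONLY);

        /* ... a run of adjacent writes to the first 1 MB has just been
         * issued through write() here ... */

        /* Kick off writeback for that range only; returns without waiting. */
        sync_file_range(fd, 0, 1024 * 1024, SYNC_FILE_RANGE_WRITE);

        close(fd);
        return 0;
    }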


That's not necessarily easy on Linux without using direct unbuffered 
IO, but to me that is Linux's problem.  For a start it's not the only 
target system, and having 'we need' feedback from db and mail system 
groups to the NT kernel devs hasn't hurt, and it never hurt Solaris to 
hear what Oracle and Sybase devs felt they needed either.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add an ldapoption to disable chasing LDAP referrals

2013-07-08 Thread James Sewell
Hey,

New patch attached. I've moved from using a boolean to an enum trivalue.

Let me know what you think.

Cheers,
James

James Sewell
PostgreSQL Team Lead / Solutions Architect
_

[image:
http://www.lisasoft.com/sites/lisasoft/files/u1/2013hieghtslogan_0.png]

Level 2, 50 Queen St,
Melbourne, VIC, 3000

P: 03 8370 8000   F: 03 8370 8099  W: www.lisasoft.com



On Thu, Jul 4, 2013 at 8:23 PM, Magnus Hagander mag...@hagander.net wrote:


 On Thu, Jul 4, 2013 at 2:30 AM, James Sewell james.sew...@lisasoft.comwrote:

 Heya,

 I see what you are saying; the problem as I see it is that the action we
 are taking here is to disable chasing LDAP referrals. If the name is
 ldapreferrals and we use a boolean, then setting it to 1 reads in a
 counter-intuitive manner:


 That assumes that the default in the ldap library is always going to be to
 chase them. Does the standard somehow mandate that it should be?


   set ldapreferrals=true to disable chasing LDAP referrals.


 You'd obviously reverse the meaning as well: set ldapreferrals=false to
 disable chasing LDAP referrals.


 Perhaps you are fine with this though if it's documented? It does work in
 the inverse way to pam_ldap, where setting to true enables referral
 chasing. pam_ldap works like so:

   not set  : library default
   set to 0 : disable referral chasing
   set to 1 : enable referral chasing


 That is exactly what I'm suggesting it should do, and I'm pretty sure
 that's what Peter suggested as well.



 --
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/


-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.

pgsql_ldapnochaseref_v1.2.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add an ldapoption to disable chasing LDAP referrals

2013-07-03 Thread James Sewell
Heya,

I see what you are saying; the problem as I see it is that the action we
are taking here is to disable chasing LDAP referrals. If the name is
ldapreferrals and we use a boolean, then setting it to 1 reads in a
counter-intuitive manner:

  set ldapreferrals=true to disable chasing LDAP referrals.

Perhaps you are fine with this though if it's documented? It does work in
the inverse way to pam_ldap, where setting to true enables referral
chasing. pam_ldap works like so:

  not set  : library default
  set to 0 : disable referral chasing
  set to 1 : enable referral chasing

The other option would be to have the default value (of the parameter) be
true and set the boolean to false to disable it. I can't find any other
examples of this though - I assume having a one-off like this in the code
is a bad thing also?

I'm happy to let you guys decide.

Cheers,
James

James Sewell
PostgreSQL Team Lead / Solutions Architect
_

[image:
http://www.lisasoft.com/sites/lisasoft/files/u1/2013hieghtslogan_0.png]

Level 2, 50 Queen St,
Melbourne, VIC, 3000

P: 03 8370 8000   F: 03 8370 8099  W: www.lisasoft.com



On Wed, Jul 3, 2013 at 6:12 PM, Magnus Hagander mag...@hagander.net wrote:


 On Wed, Jul 3, 2013 at 3:04 AM, James Sewell james.sew...@lisasoft.comwrote:

 Hey Peter,

 You are correct, it is the same  as the referrals option in pam_ldap.
 It's also the -C (sometimes -R - it seems ldapsearch options are pretty
 non-standard) option in ldapsearch.

 As far as I'm aware you can't pass this in an LDAP URL, primarily because
 this never gets sent to the LDAP server. The server always returns an LDIF
 with inline references, this just determines if you chase them client side
 or just list them as is.

 I could be missing something here, but using:

  ldapreferrals={0|1}

 Would require a three state type, as we need a way of not interfering
 with the library defaults? To 'enable' the new behavior here using a
 boolean you would need to set ldapreferrals=false - which with the normal
 way of dealing with config booleans would alter the default behavior if the
 option was not specified.

 How do you feel about:

   ldapdisablereferrals=(0|1)


 I agree with Peter that the negative thing is bad. I don't see the
 problem, really. If you don't specify it, you rely on library defaults. If
 you do specify it, we lock it to that setting. I don't see the need to
 specifically have a setting to rely on library defaults - just remove it
 from the line and you get that.

 --
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/


-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.

Re: [HACKERS] [PATCH] Add an ldapoption to disable chasing LDAP referrals

2013-07-02 Thread James Sewell
Hey Peter,

You are correct, it is the same  as the referrals option in pam_ldap. It's
also the -C (sometimes -R - it seems ldapsearch options are pretty
non-standard) option in ldapsearch.

As far as I'm aware you can't pass this in an LDAP URL, primarily because
this never gets sent to the LDAP server. The server always returns an LDIF
with inline references; this option just determines whether you chase them
client side or just list them as is.

I could be missing something here, but using:

 ldapreferrals={0|1}

Would require a three state type, as we need a way of not interfering with
the library defaults? To 'enable' the new behavior here using a boolean you
would need to set ldapreferrals=false - which with the normal way of
dealing with config booleans would alter the default behavior if the option
was not specified.

How do you feel about:

  ldapdisablereferrals=(0|1)

Cheers,
James Sewell


James Sewell
PostgreSQL Team Lead / Solutions Architect
_

[image:
http://www.lisasoft.com/sites/lisasoft/files/u1/2013hieghtslogan_0.png]

Level 2, 50 Queen St,
Melbourne, VIC, 3000

P: 03 8370 8000   F: 03 8370 8099  W: www.lisasoft.com



On Tue, Jul 2, 2013 at 10:46 PM, Peter Eisentraut pete...@gmx.net wrote:

 On 7/2/13 12:20 AM, James Sewell wrote:
  Hey All,
 
  This patch request grew from this post (of mine) to pgsql-general:
 
 
 http://www.postgresql.org/message-id/cabuevezouae-g1_oejagujjmem675dnystwybp4d_wz6om+...@mail.gmail.com
 
  The patch adds another available LDAP option (ldapnochaseref) for
  search+bind mode in the pg_hba.conf file. If set to 1 (0 is default) then
  it performs a ldap_set_option which disables chasing of any LDAP
  references which are returned as part of the search LDIF.

 This appears to be the same as the referrals option in pam_ldap
 (http://linux.die.net/man/5/pam_ldap).  So it seems legitimate.

 For consistency, I would name the option ldapreferrals={0|1}.  I prefer
 avoiding double negatives.

 Do you know of a standard way to represent this option in an LDAP URL,
 perhaps as an extension?



-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.

[HACKERS] [PATCH] Add an ldapoption to disable chasing LDAP referrals

2013-07-01 Thread James Sewell
Hey All,

This patch request grew from this post (of mine) to pgsql-general:

http://www.postgresql.org/message-id/cabuevezouae-g1_oejagujjmem675dnystwybp4d_wz6om+...@mail.gmail.com

The patch adds another available LDAP option (ldapnochaseref) for
search+bind mode in the pg_hba.conf file. If set to 1 (0 is the default) then
it performs an ldap_set_option() call which disables chasing of any LDAP
references returned as part of the search LDIF.
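
For reference, the call in question is essentially the following (standard
OpenLDAP client API, shown out of context; ld stands in for the
already-initialised connection handle):

    #include <ldap.h>

    /* Ask the client library not to chase any referrals/references returned
     * by subsequent operations on this connection. */
    ldap_set_option(ld, LDAP_OPT_REFERRALS, LDAP_OPT_OFF);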

I can think of two use cases for this:

   1. (the case which spawned my email) A valid search is performed, but
   for some reason a ref: with a non-responsive LDAP server is returned as
   well, which causes the authentication to time out (this could be intermittent
   if DNS round-robin or similar is used and some of the LDAP servers are not
   functioning, or a packet-dropping firewall is in the way).
   2. (a case I found when testing with AD) A valid search is performed and
   6 ref: entries are returned, which all must be chased before
   authentication can succeed. Setting ldapnochaseref speeds up authentication
   with no negative cost (assuming you understand your LDAP schema).

I think it's worth noting that this setting seems to be the default for
ldapsearch on Linux these days.

Hopefully I found all the documentation that I was meant to update; let me
know if not, though.

Cheers,

James Sewell
PostgreSQL Team Lead / Solutions Architect
_

[image:
http://www.lisasoft.com/sites/lisasoft/files/u1/2013hieghtslogan_0.png]

Level 2, 50 Queen St,
Melbourne, VIC, 3000

P: 03 8370 8000   F: 03 8370 8099  W: www.lisasoft.com

-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.

pgsql_ldapnochaseref_v1.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpython implementation

2013-06-30 Thread james

On 01/07/2013 02:43, Claudio Freire wrote:

In essence, you'd have to use another implementation. CPython guys
have left it very clear they don't intend to fix that, as they don't
consider it a bug. It's just how it is.

Given how useful it is to have a scripting language that can be used outside
of the database as well as inside it, would it be reasonable to consider
'promoting' pllua?

My understanding is that it (lua) is much cleaner under the hood (than 
CPython).

Although I do recognise that Python as a whole has always had more traction.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] C++ compiler

2013-06-24 Thread james

On 25/06/2013 05:16, Tom Lane wrote:
It might be time to reconsider whether we should move the baseline 
portability requirement up to C99.


My understanding was that you picked up a lot of users when the Win32 
port became useful.  While you can build with msys, I would think that 
leaving Microsoft's tooling behind would be a mistake, and as far as I 
am aware they have said that they are supporting C++11 but not bothering 
with C99.


I'm really not in favor of moving to C++ though, as the 
portability-vs-usefulness tradeoffs seem pretty unattractive there.


As a long-time C++ programmer I don't see what the problem would be 
beyond (some) existing contributors being wary of the unknown.  Its not 
as if any platform developed enough to be a sane db server has not got a 
decent C++ compiler or two.  Portability is only really a problem with a 
subset of new C++11 features.




Re: [HACKERS] Parallel Sort

2013-05-24 Thread james
 Have you considered GPU-based sorting? I know there's been discussion 
in the past.


If you use OpenCL, then you can use a CPU driver if there is no GPU, and 
that can allow you to leverage all the CPU cores without having to do 
the multi-thread stuff in the backend.


While the compilation of a specific kernel can be quite expensive, it 
also has the effect of a JIT compiler in terms of system independence.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Floating point error

2013-03-05 Thread James Cloos
>>>>> "HL" == Heikki Linnakangas <hlinnakan...@vmware.com> writes:

HL> It would be nice to have a base-2 text format to represent floats.
HL> It wouldn't be as human-friendly as base-10, but it could be used
HL> when you don't want to lose precision. pg_dump in particular.

hexidecimal notation for floats exists.  The printf format flag is %a
for miniscule and %A for majuscule.

The result of 1./3. is 0xa.aabp-5.
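
(Decoding that: 0xa.aab is 10 + 10/16 + 10/256 + 11/4096, about 10.6667, and
the p-5 suffix means a scale of 2^-5, so the value is roughly 10.6667/32,
i.e. about 0.33334.)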

This site has some info and a conversion demo:

http://gregstoll.dyndns.org/~gregstoll/floattohex/

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visual Studio 2012 RC

2013-01-27 Thread james

Anyway, this is getting way off track. The point is that the MS SDKs and
compilers are a bit of a mess and that MinGW support is useful because
we can't rely on them continuing to offer free SDKs and compilers in future.


Well, more compilers are always useful, but complaining that Microsoft 
might withdraw their working compilers smacks of 'what if?' paranoia. 
What if mingw support for Win64 was (sometimes/often/always/still) a bit 
rubbish?  Oh wait ...






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visual Studio 2012 RC

2013-01-27 Thread james

On the contrary, only a few months ago there was a far from groundless fear 
that Microsoft would do just that. Following considerable outcry they changed 
their mind. But this is definitely not just paranoia. As for w64 support, the 
mingw-64 project exists more or less explicitly to produce 64 bit compilers, 
including those hosted on mingw/msys.



Huh.  The only reason we have to use mingw64 or one of the assorted 
personal builds is because 'mingw support' doesn't deliver on its own, 
and last I looked there was a confusing variety of personal builds with 
various strengths and weaknesses.  I managed to make some progress but 
we seem to be a ways off having a reference download (and ideally one 
with clang too I guess).


I'd very much like there to be a good reference implementation, but the 
whole mingw/mingw64 thing is indicative of some problems, and reminds me 
of egcs.


You have references to back up your statements, and demonstrate that it 
wasn't primarily FUD?  FWIW I think the higher entry prices of pay-for 
VStudio almost guarantees continued availability of a free compiler, 
though it might end up slightly crippled, but I'm not a product planner 
for MS any more than you are.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange Windows problem, lock_timeout test request

2013-01-26 Thread james

So, while no native 64-bit compilers are available for free as part of
Visual Studio Express 2012 (11), it doesn't matter much. The issue is
more that it's very much Microsoft's whim whether they release compilers
at all and if so, which ones, when and how.


I think I have a pretty vanilla Visual Studio Express 2012 for Desktop and:

C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64.\cl
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.51106.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64

Am I misunderstanding the discussion here?

Isn't that the 64-bit tool suite?

Does anyone care if the compiler is a 64 bit binary itself?



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] json api WIP patch

2013-01-08 Thread james


I had been wondering how to do such an insertion efficiently in the context of 
SPI, but it seems that there is no SPI_copy equiv that would allow a query 
parse and plan to be avoided.


Your query above would need to be planned too, although the plan will be 
trivial.


Ah yes, I meant that I had not found a way to avoid it (for multi-row 
inserts etc) from a stored proc context where I have SPI functions 
available.



You should not try to use it as a general bulk load facility. And it will not 
be as fast as COPY for several reasons, including that the Json parsing 
routines are necessarily much heavier than the COPY parse routines, which have 
in any case been optimized over quite a long period. Also, a single json datum 
is limited to no more than 1Gb. If you have such a datum, parsing it involves 
having it in memory and then taking a copy (I wonder if we could avoid that 
step - will take a look). Then each object is decomposed into a hash table of 
key value pairs, which it then used to construct the record datum. Each field 
name  in the result record is used to look up the value in the hash table - 
this happens once in the case of populate_record() and once per object in the 
array in the case of populate_recordset(). In the latter case the resulting 
records are put into a tuplestore structure (which spills to disk if necessary) 
which is then returned to the caller when all the objects in the json array
are processed. COPY doesn't have these sorts of issues. It knows 
without having to look things up where each datum is in each record, and it 
stashes the result straight into the target table. It can read and insert huge 
numbers of rows without significant memory implications.

Yes - but I don't think I can use COPY from a stored proc context can I? 
 If I could use binary COPY from a stored proc that has received a 
binary param and unpacked to the data, it would be handy.


If SPI provided a way to perform a copy to a temp table and then some 
callback on an iterator that yields rows to it, that would do the trick 
I guess.



Perhaps if you give us a higher level view of what you're trying to achieve we 
can help you better.


I had been trying to identify a way to work with record sets where the 
records might be used for insert, or for updates or deletion statements, 
preferably without forming a large custom SQL statement that must then 
be parsed and planned (and which would be a PITA if I wanted to use the 
SQL-C preprocessor or some language bindings that like to prepare a 
statement and execute with params).


The data I work with has a master-detail structure and insertion 
performance matters, so I'm trying to limit manipulations to one 
statement per table per logical operation even where there are multiple 
detail rows.


Sometimes the network latency can be a pain too and that also suggests 
an RPC with unpack and insert locally.


Cheers
James



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] json api WIP patch

2013-01-08 Thread james

You can use COPY from a stored procedure, but only to and from files.


I think that's in the chocolate fireguard realm though as far as 
efficiency for this sort of scenario goes, even if its handled by 
retaining an mmap'd file as workspace.





If SPI provided a way to perform a copy to a temp table and then some callback 
on an iterator that yields rows to it, that would do the trick I guess.


SPI is useful, but it's certainly possible to avoid its use. After all, that's 
what almost the whole backend does, including the COPY code. Of course, it's a 
lot harder to write that way, which is part of why SPI exists. Efficiency has 
its price.


So it is possible to use a lower level interface from a C stored proc? 
SPI is the (only) documented direct function extension API isn't it?


Is the issue with using the JSON data-to-record set that the parsing can 
be costly?  Perhaps it can be achieved with B64 of compressed protobuf, 
or such.  I don't mind if it seems a bit messy - the code can be 
generated from the table easily enough, especially if I can use C++.  I 
guess an allocator that uses SPI_palloc would solve issues with memory 
management on error?




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] json api WIP patch

2013-01-07 Thread james

The processing functions have been extended to provide populate_record() and 
populate_recordset() functions.The latter in particular could be useful in 
decomposing a piece of json representing an array of flat objects (a fairly 
common pattern) into a set of Postgres records in a single pass.


So this would allow an 'insert into ... select ... from 
unpack-the-JSON(...)'?
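
Concretely, something along these lines (a sketch only; the table and sample
JSON are invented, and the function name and signature are assumed from the
patch description):

    CREATE TABLE detail (id int, qty int);
    INSERT INTO detail (id, qty)
    SELECT id, qty
      FROM json_populate_recordset(null::detail,
                                   '[{"id":1,"qty":10},{"id":2,"qty":20}]');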


I had been wondering how to do such an insertion efficiently in the 
context of SPI, but it seems that there is no SPI_copy equiv that would 
allow a query parse and plan to be avoided.


Is this mechanism likely to be as fast as we can get at the moment in 
contexts where copy is not feasible?




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ILIKE vs indices

2012-12-29 Thread James Cloos
>>>>> "TL" == Tom Lane <t...@sss.pgh.pa.us> writes:

JC> Is there any contraindication to recasting:
JC> foo ILIKE 'bar'
JC> into:
JC> LOWER(foo) LIKE LOWER('bar')

TL> In some locales those are not equivalent, I believe, or at least
TL> shouldn't be.  (What the current code actually does is a separate
TL> question.)

I see.  After determining indexing based on the existence of an initial
fixed string, excluding anything matching isalpha(), it uses tolower(3)
and friends to do the actual match.

So my proposal wouldn't change what matches, but might make fixing any
bugs in what *should* match more difficult?

TL> In any case it's not obvious why LOWER rather than UPPER.

I suggested lower() because that matched all of the suggestions I found.
And as it turns out matches the current behaviour, too.

The footnote about adding explicit mention to the docs was expressly
because it is not otherwise obvious whether indices should use lower()
or upper().

I'll ask on one of the unicode lists whether there are any locales where
a case-insensitive match should be different than a case-preserving match
of tolower() vs tolower().

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ILIKE vs indices

2012-12-28 Thread James Cloos
While tuning an application, I found the posts from 2003 recommending the
use of LOWER() and LIKE in place of ILIKE to take advantage of indices.

For this app, given the limitations of the upper-layer protocol it must
support, that change replaced about 30 minutes of repeated seq scans with
about 1 minute of repeated index scans!  On a query-set often repeated
several times per day.  (Probably more times per day now.)

Is there any contraindication to recasting:

  foo ILIKE 'bar'

into:

  LOWER(foo) LIKE LOWER('bar')

and documenting that an index has to be on LOWER(column) to benefit ILIKE?

Perhaps the parser could read the former as the latter?
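
For concreteness, the pattern being proposed, in index-definition form (the
table and index names are examples only; text_pattern_ops matters when the
database collation is not C):

    CREATE TABLE t (foo text);
    CREATE INDEX t_foo_lower_idx ON t (lower(foo) text_pattern_ops);
    -- left-anchored patterns on lower(foo) can now use the index:
    SELECT * FROM t WHERE lower(foo) LIKE lower('bar');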

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-20 Thread James Cloos
Updating pg_database to set datctype='C' did solve the speed issues with
the two large dbs.
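
(For concreteness, that update amounts to something like the following, with
'dbm' standing in for each affected database name, and with the usual caveat
that editing the catalogs directly is at your own risk:

    UPDATE pg_database SET datctype = 'C' WHERE datname = 'dbm';
)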

Presumably, since LC_CTYPE=en_US.UTF-8 was in the env when I ran pg_restore,
it overrode the ctype setting in the dump files.

Some of the slow selects do use ilike; even w/ datctype='C' the indices
are skipped for at least this query:

# explain analyze SELECT mb_id FROM mb WHERE name ILIKE 'foo@bar' AND ownerid=7;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on mb  (cost=0.00..570.96 rows=3 width=4) (actual time=9.443..25.039 rows=1 loops=1)
   Filter: ((name ~~* 'foo@bar'::text) AND (ownerid = 7))
   Rows Removed by Filter: 34827
 Total runtime: 25.071 ms
(4 rows)

The mb table has several indices, including separate ones on name and ownerid.

(not my design, btw.  And I really do need to re-write the middleware)

Whether it is strcoll(3) (even though LC_COLLATE is explicitly C) or
LIKE, it does make a significant difference for those two apps.

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-20 Thread James Cloos
As a followup, I find that I can avoid the seq scan by adding an index
to that table as:

   create index mb_name_own_idx on mb ( lower(name), ownerid );

and changing the query from using the idiom:

  WHERE name ILIKE 'foo@bar' AND ownerid=7;

to using:

  WHERE lower(name) = lower('foo@bar') AND ownerid=7;

which saves 20+ ms on each of the 30+ k such selects in a full run.

I haven't tested how fast it would be with that change and a utf8 ctype.

Because of how the middleware achieves its portability between pg, my et al,
changing it to use lower and = will require significant surgery.

Is there any way to specify the index such that the ILIKE query will use
said index?
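
(For what it's worth, one way to let ILIKE itself use an index, without
touching the middleware, is a trigram index; a sketch, assuming the pg_trgm
contrib extension is available and using an invented index name:

    CREATE EXTENSION pg_trgm;
    CREATE INDEX mb_name_trgm_idx ON mb USING gin (name gin_trgm_ops);

after which the  name ILIKE 'foo@bar'  predicate can be answered with a
bitmap index scan.)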

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-18 Thread James Cloos
>>>>> "JB" == Josh Berkus <j...@agliodbs.com> writes:

JB> Can you check the collations of the two databases?  I'm wondering if 9.1
JB> is in C collation and 9.2 is something else.

Thanks!

pg_dump -C tells me these two differences:

 -SET client_encoding = 'SQL_ASCII';
 +SET client_encoding = 'UTF8';

 -CREATE DATABASE dbm WITH TEMPLATE = template0 ENCODING = 'SQL_ASCII' 
LC_COLLATE = 'C' LC_CTYPE = 'C';
 +CREATE DATABASE dbm WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 
'C' LC_CTYPE = 'en_US.UTF-8';

for every db in the clusters.

I presume that lc_ctype is the significant difference?
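
For reference, the per-database settings can also be checked directly with a
query along these lines:

    SELECT datname, datcollate, datctype FROM pg_database;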

LC_CTYPE *is* specified as 'C' in the dump from which I created the 9.2
cluster, so it must have been overridden by pg_restore.  I see that my
dist's /etc rc script now sets LC_CTYPE.  Would that explain why lc_ctype
changed between the two clusters?

Is there any way to alter a db's lc_ctype w/o dumping and restoring?  I
want to preserve some of the changes made since I copied the 9.1 cluster.
Alter database reports that lc_ctype cannot be changed.

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Testing 9.2 in ~production environment

2012-06-17 Thread James Cloos
I'm giving 9.2-beta2 a test simulating a production workflow.

Everything looks OK except the speed.  Most (all?) queries take about
five to six times as long as they do with 9.1.

The configurations are essentially the same, the query plans are the same.

A (hot) example, pulled semi-randomly from a run, with the names mangled
to protect the innocent:

= 9.1 =
 Nested Loop  (cost=0.00..26.92 rows=1 width=28) (actual time=0.114..0.514 rows=19 loops=1)
   ->  Index Scan using ms_pkey on ms msg  (cost=0.00..26.03 rows=1 width=20) (actual time=0.026..0.207 rows=19 loops=1)
         Index Cond: ((ms_id >= 407) AND (ms_id <= 435) AND (mb_id = 50222))
         Filter: (status = ANY ('{0,1,2}'::integer[]))
   ->  Index Scan using ph_pkey on ph pm  (cost=0.00..0.87 rows=1 width=16) (actual time=0.010..0.010 rows=1 loops=19)
         Index Cond: (id = msg.ph_id)
 Total runtime: 0.605 ms

= 9.2 =
 Nested Loop  (cost=0.00..30.12 rows=1 width=28) (actual time=0.439..2.540 rows=19 loops=1)
   ->  Index Scan using ms_pkey on ms msg  (cost=0.00..29.18 rows=1 width=20) (actual time=0.155..1.157 rows=19 loops=1)
         Index Cond: ((ms_id >= 407) AND (ms_id <= 435) AND (mb_id = 50222))
         Filter: (status = ANY ('{0,1,2}'::integer[]))
   ->  Index Scan using ph_pkey on ph pm  (cost=0.00..0.93 rows=1 width=16) (actual time=0.053..0.054 rows=1 loops=19)
         Index Cond: (id = msg.ph_id)
 Total runtime: 2.752 ms

All of the tables and indices for the run in question fit into ram.  The
effective cache, work mem, costs, etc were optimized in 9.0, and kept
for 9.1 and the beta.  That the plans are the same suggests that isn't
the problem, yes?

I think I recall mention from a previous beta (but goog isn't helping me
confirm) that there is some extra debugging or such enabled in the betas.

If so, and if turning that off would provide a better comparison, where
in the src should I look?

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-17 Thread James Cloos
>>>>> "AF" == Andres Freund <and...@2ndquadrant.com> writes:

AF> Is it possible that you compiled with assertions enabled? That would roughly
AF> fit that magnitude. SHOW debug_assertions; Should show you whether it was
AF> enabled.

Thanks, but SHOW debug_assertions reports off.

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-17 Thread James Cloos
>>>>> "PE" == Peter Eisentraut <pete...@gmx.net> writes:

PE> That depends on how you built it.  Just being a beta by itself doesn't
PE> turn on any extra debugging.

OK.  So either I misremembered or it was something no longer done.

PE> That depends on how you built it.

Its a Gentoo box; both were build from their ebuilds, with the same gcc,
flags, etc.

PE> Compare the output of pg_config --configure from both installations.

The only differences are 9.1 vs 9.2 in the paths.

Thanks,

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fake async rep target

2012-05-30 Thread james
Well, I was assuming that there was some intelligence in the receiver 
that could effectively parse this for the application; are you 
suggesting that it is effectively binary deltas to apply to raw pages?


Certainly, Sybase rep server works by creating function calls or SQL 
updates (depending on how you set it all up), and a fairly simple parser 
can be used to process the received stream.


I was hoping that the receiver code could be used to at least identify 
which tuples are affected and their before/after primary keys.


Ideally the result would be a packet by transaction, listing the 
impacted tables and for each table a list of affected primary keys and 
whether they were inserted, deleted or updated.


James

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fake async rep target

2012-05-29 Thread james

How easy would it be to implement a fake async rep target?

Perhaps even as something that a server could allow a connection to 
request?  (ie a suitably permissioned connection could convert itself to 
receive an async replication stream, rather than being statically 
configured?)


I know that it sounds a bit bonkers, but a while back I worked on a 
system where we configured a rep target (using OpenServer) so that we could 
observe changes to tables and enqueue secondary processing. Rather 
painful in that case because of the way that repserver is configured,
and I'm not sure it was worth the pain when configuring test and dev 
environments.


However, in principle, it seems that this is quite an elegant stand-in 
for a whole raft of trigger functions - and probably a lot cheaper to 
execute. The key, I think, is to be able to allow dynamic attachment of 
such a 'change feed' by an account that has god-like read access.


Is the existing async rep code amenable to this sort of abuse?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs in our Windows socket code

2012-05-13 Thread james

That is, if you request FD_WRITE events for a pre-existing socket with
WSAEventSelect, you will not get one until the outbound network buffer has
been filled and then has partially emptied.  (This is incredibly broken,
but Microsoft evidently has no intention of fixing it.)


I think you should distinguish between 'broken' and 'not what I am used 
to'.  Microsoft have always had an edge triggered delivery here, rather 
than the level triggered delivery on BSD.  But WinSock is not BSD 
sockets.  If you want to remember what the state is, associate a flag 
with the socket.  Microsoft are very good at retaining compatibility, 
'fixing it' in your terms would be crass breakage.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-03 Thread james
I haven't tried quex, but I have tried lemon (which can be broken out of 
SQLite) and re2c and ragel.


I like ragel and lemon; from memory, the combination supports a push-parser 
style, whereas many tools are inconvenient unless you are prepared to 
suck in a whole message before parsing, or let the parser drive a pull 
loop, or use a coroutine structure.


Could go all trendy and use a PEG tool like, er, peg 
(http://piumarta.com/software/peg/).  (I haven't tried them tho')


James

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-03 Thread james

Doesn't that imply that a plan cache might be worthwhile?

But no matter: didn't the OP really have an issue with packaging and 
Windows support?  There are a lot of Windows users, and in general 
there are many Windows devs: making it easier for them to contribute has 
to be good, doesn't it?


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Have we out-grown Flex?

2012-05-03 Thread james
I believe there are tools that are significantly faster than flex.  I 
believe re2c generates code that is faster.  But the key thing is to 
test, probably, or perhaps ask around.  I'm out of touch, but from 
memory flex wasn't the be-all and end-all.


Lemon is definitely easy to maintain/port and the result is pretty nice, 
too (I know Bison/Yacc wasn't the focus here).


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] swapcache-style cache?

2012-02-22 Thread james
Has anyone considered managing a system like the DragonFLY swapcache for 
a DBMS like PostgreSQL?


ie where the admin can assign drives with good random read behaviour 
(but perhaps also-ran random write) such as SSDs to provide a cache for 
blocks that were dirtied, with async write that hopefully writes them 
out before they are forcibly discarded.


And where a cache fail (whether by timeout, hard fail, or CRC fail) just 
means having to go back to the real transactional storage.


I'd been thinking that swapcache would help where the working set won't 
fit in RAM, also L2ARC on Solaris - but it seems to me that there is no 
reason not to allow the DBMS to manage the set-aside area itself where 
it is given either access to the raw device or to a pre-sized file on 
the device it can map in segments.


While L2ARC is obviously very heavyweight and entwined in ZFS, 
Dragonfly's swapcache seems to me remarkably elegant and, it would seem, 
very effective.


James

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres / plpgsql equivalent to python's getattr() ?

2011-08-04 Thread James Robinson


On Aug 4, 2011, at 4:55 AM, Florian Pflug wrote:





@OP: Here's my implementation of the feature you desire as a set of  
C-language
functions: https://github.com/fgp/pg_record_inspect. Other people  
did code up
similar things in the past, but I currently cannot find any links to  
their work.
But a little bit of digging in the mailing list archives should turn  
them up.



Many thanks, Florian, we'll be checking that out.

James

James Robinson
Socialserve.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Postgres / plpgsql equivalent to python's getattr() ?

2011-08-03 Thread James Robinson

Hackers,

Python's getattr() allows for dynamic lookup of attributes on an  
object, as in:


inst = MyClass(x=12, y=24)
v = getattr(inst, 'x')
assert v == 12

Oftentimes in writing data-validating trigger functions, it'd be real  
handy to be able to do a similar thing in plpgsql against column  
values in a row or record type, such as making use of a trigger  
argument as a hint for which column to consider in this table's case. Oh,  
to be able to do something like (toy example known to be equivalent to  
a check):


CREATE OR REPLACE FUNCTION must_be_positive() RETURNS TRIGGER AS
$$
begin
    if getattr(NEW, TG_ARGV[0]) <= 0
    then
        raise exception(TG_ARGV[0] || ' must be positive');
    end if;

    -- after trigger
    return null;
end;
$$ LANGUAGE PLPGSQL;


A function which takes a row + a text column name, and / or a peer  
function taking a row + an index within the row, would really open up plpgsql's  
expressivity in cases where you're writing mainly SQL stuff, not  
really wanting to go over to plpythonu or whatnot (which describes  
rows as dicts).


Is there something in the internals which inherently prevents this? Or  
am I a fool and it already exists?


Not having to defer to EXECUTE would be attractive.
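
For contrast, the EXECUTE-based workaround being referred to looks roughly
like this (a sketch only; it assumes the named column is, or casts to,
numeric):

CREATE OR REPLACE FUNCTION must_be_positive() RETURNS TRIGGER AS
$$
declare
    v numeric;
begin
    -- dynamic field access via EXECUTE; quote_ident() guards the column name
    EXECUTE 'SELECT ($1).' || quote_ident(TG_ARGV[0]) INTO v USING NEW;
    if v <= 0
    then
        raise exception '% must be positive', TG_ARGV[0];
    end if;

    -- after trigger
    return null;
end;
$$ LANGUAGE PLPGSQL;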



James Robinson
Socialserve.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/python tracebacks

2011-03-01 Thread James William Pye
On Mar 1, 2011, at 12:10 PM, Jan Urbański wrote:
 So you end up with a context message saying PL/Python function %s and
 a detail message with the saved detail (if it's present) *and* the
 traceback. The problem is that the name of the function is already in
 the traceback, so there's no need for the context *if* there's a
 traceback present.
 
 The problem I'm having is technical: since the callback is already set
 when the code reaches the traceback-printing stage, you can't really
 unset it. AFAICS the elog code calls *all* callbacks from
 error_context_stack. So I can't prevent the context message from
 appearing. If I make the traceback part of the context as well, it's
 just going to appear together with the message from the callback.

I remember going through a lot of pain getting this done right
in pg-python[pl/py].

SELECT it_blows_up();
ERROR:  function's main raised a Python exception
CONTEXT:  [exception from Python]
Traceback (most recent call last):
   File public.it_blows_up(), line 13, in main
three()
   File public.it_blows_up(), line 10, in three
return two()
   File public.it_blows_up(), line 7, in two
return one()
   File public.it_blows_up(), line 4, in one
raise OverflowError(there's water everywhere)
 OverflowError: there's water everywhere

[public.it_blows_up()]


IIRC, I unconditionally print the [public.it_blows_up()] part iff it's
not an ERROR. If it is an ERROR, I let the traceback rendering part of the code
handle it on the PL's entry point exit. It was really tricky to do this because 
I
was rendering the traceback *after* the error_context_stack had been called.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/python improvements

2010-12-24 Thread James William Pye
On Dec 23, 2010, at 3:38 AM, Jan Urbański wrote:
 Oh, didn't know that. I see that it does some more fancy things, like
 defining a inheritance hierarchy for these exceptions and adding some
 more into the mix.

Right, there were some cases that appeared to benefit from larger
buckets than what the existing code classes provided. Also, some of the
exceptions in there are strictly for py-postgresql/client-side things.

 The names I used are not really invented, they're just plpgsql condition
 names from
 http://www.postgresql.org/docs/current/static/errcodes-appendix.html
 with underscores changed to camel case. Also, since they're
 autogenerated from utils/errcodes.h they don't have any hierarchy, they
 just all inherit from SPIError.

For the backend setting, I think this is quite appropriate.

However, for pg-python, I had mixed feelings about this as I wanted to
be able to leverage py-postgresql's hierarchy, but still have the projects
independent. I ended up punting on this one by using a single error class,
and forcing the user to compare the codes. =(

 Sticking Error to every one of them will result in things like
 SubstringErrorError, so I'm not really sold on that.

There was some creativity applied to the names in postgresql.exceptions
to accommodate for things like that. (Like no redundant Error)

 Basically I think
 more PL/Python users will be familiar with condition names as you use
 them in pl/pgsql than with the names from py-postgresql.

I think that's fair assumption. In fact, I think that might make a good
TODO for py-postgresql/pg-python. Provide a plpgsql-code-name to
exception class mapping.

cheers, jwp
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hstores in pl/python

2010-12-14 Thread James William Pye
On Dec 13, 2010, at 6:16 PM, Tom Lane wrote:
 how do you identify which type OID is really hstore?


How about an identification field on pg_type?

CREATE TYPE hstore ..., IDENTIFIER 'org.postgresql.hstore';
-- Where the identifier is an arbitrary string.

Type information can be looked up by the PL, and the I/O functions can
be dynamically resolved using the identifier.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: XLog vs SSD [Was: Re: [HACKERS] random write in xlog?]

2010-12-09 Thread James Cloos
>>>>> "JJ" == Jeff Janes <jeff.ja...@gmail.com> writes:

JJ> So PG always writing 8K at a time is unlikely to make a difference
JJ> than if it wrote a smaller amount.

Ah.

Somehow I was thinking of the xlog files' 16M filesize rather than the
internal 8k block size

If it is only writing 8k blocks then there is probably little hope of
making efficient use of flash.

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


XLog vs SSD [Was: Re: [HACKERS] random write in xlog?]

2010-12-08 Thread James Cloos
>>>>> "JJ" == Jeff Janes <jeff.ja...@gmail.com> writes:

JJ> Anyway, the writes are logically sequentially, but not physically.
JJ> If I remember correctly, it always writes out full blocks, even if
JJ> the last part of the block has not yet been filled with new data.
JJ> When the remainder gets filled, it then writes out the full block
JJ> again, both the already written and the new part.

What does that mean for use of a flash SSD for the xlog dir?

Does the block writing mesh up well with the usage pattern a flash
SSD needs to maximize lifespan?

I'd love a dram ssd for pg_xlog and the journals for the other
filesystems, but they cost too much.

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-25 Thread James Cloos
>>>>> "JD" == Jeff Davis <pg...@j-davis.com> writes:

JD> 2. Fix the input/output functions in a special mode for dump/reload,
JD>    to make them true inverses.

That can be done by supporting the %A printf(3)/scanf(3) format.

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-25 Thread James Cloos
>>>>> "JD" == Jeff Davis <pg...@j-davis.com> writes:

JD> 2. Fix the input/output functions in a special mode for dump/reload,
JD> to make them true inverses.

JC> That can be done by supporting the %A printf(3)/scanf(3) format.

JD> I don't happen to see a %A format in the man page, but I doubt the
JD> output would look like a timestamp.

Modern printf(1) also supports it, so an easy example:

:; printf '%A\n' 3.1415926535897932384626433832795029
0XC.90FDAA22168C235P-2

%a is the same, but with miniscule letters.

It is the hex format of the float types, and defaults to exactly enough
precision.  The length modifier L makes %a expect a long double.

JD> And if it doesn't look like a timestamp, it violates the spirit of a
JD> logical dump of the data.

Point taken.  Had I read the whole thread before replying I would have
been reminded that the float timestamps were archaic; that avoids any
need of %A for timestamps.

That said, the possibility of hex i/o format for the float datatypes
would be welcome.

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating-point timestamps versus Range Types

2010-10-25 Thread James Cloos
>>>>> "TL" == Tom Lane <t...@sss.pgh.pa.us> writes:

JC> That said, the possibility of hex i/o format for the float datatypes
JC> would be welcome.

TL> It's unportable, for two different reasons:

TL> 2. The printf specifiers you want us to rely on are not standard.

They are in C99.

TL> 1. pg_dump output would become platform-specific.  This is highly
TL> undesirable.

It is true that pg would have to test for them in configure and supply
alternative code wherever libc fails to support them.

I can readily accept that there are many more pressing needs.

But would such a patch for master be rejected?

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] .gitignore files, take two

2010-10-09 Thread James Cloos
I'm reading this a bit late, but...

We (Xorg) found that ignoring:

   *~
   *.bak
   *.patch

in addition to the files generated by building is very helpful.

We do use git tag and git describe in the make dist process, as
well as git log > ChangeLog.  That may be relevant; avoiding git
describe's dirty designation is important when using it in that
fashion.  But it helps to be able to run git add --all cleanly.

I understand that other git users have similar experience.

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Postgres vs. intel ccNUMA on Linux

2010-09-29 Thread James Robinson

Hackers,

	Any tips / conventional wisdom regarding running postgres on large- 
ish memory ccNUMA intel machines, such as a 32G dual-quad-core,  
showing two NUMA nodes of 16G each? I expect each postgres backend's  
non-shared memory usage to remain nice and reasonably sized, hopefully  
staying within the confines of its processor's local memory region,  
but how will accesses to shared memory and / or buffer cache play out?  
Do people tune their backends via 'numactl' ?


	Furthermore, if one had more than one database being served by the  
machine, would it be advisable to do this via multiple clusters  
instead of a single cluster, tweaking the processor affinity of each  
postmaster accordingly, trying to ensure each cluster's shared memory  
segments and buffer cache pools remain local for the resulting backends?


Thanks!

James Robinson
Socialserve.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-08-28 Thread James William Pye
On Aug 9, 2010, at 11:49 AM, Kris Jurka wrote:
 Oh, duh.  It's a server side copy not going through the client at all. Here's 
 a hopefully final patch.

Trying it out... Works for me.

I understand the resistance to the patch, but it would be
quite nice to see this wart in the rear view. =\
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Python 2.7 deprecated the PyCObject API?

2010-08-14 Thread James William Pye
On Aug 14, 2010, at 9:08 AM, Tom Lane wrote:
 Just to clarify, you're recommending something like
 
   proc->me = PyCObject_FromVoidPtr(proc, NULL);
 + if (proc->me == NULL)
 +     elog(ERROR, "could not create PyCObject for function");
   PyDict_SetItemString(PLy_procedure_cache, key, proc->me);
 
 correct?  (Hm, and it looks like we'd better move the pfree just above 
 that...)

Almost, there's still a Python exception to report and/or clear.
I only glanced at this and didn't recall what the plpython mechanisms were for 
that, thus the ambiguous complain().

 Yeah, and since we'll have to back-patch it, a fairly noninvasive patch
 would be nice.  Will you work on that?

I was hoping that Peter would pop in with a patch, but I think a few lines of 
CPP may suffice..
(warning: untested =)

#ifdef Py_CAPSULE_H
/*
 * Python.h (2.7 and up) includes pycapsule.h, so rely on the header
 * define to detect the API's existence.
 */
#define PyCObject_FromVoidPtr(POINTER, IGNORED) PyCapsule_New(POINTER, NULL, NULL)
#undef PyCObject_Check
#define PyCObject_Check(OBJ) PyCapsule_CheckExact(OBJ)
#define PyCObject_AsVoidPtr(OBJ) PyCapsule_GetPointer(OBJ, NULL)
#endif /* Py_CAPSULE_H */

http://svn.python.org/view/python/branches/release27-maint/Include/pycapsule.h?view=markup

yay? nay?
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Python 2.7 deprecated the PyCObject API?

2010-08-13 Thread James William Pye
On Aug 13, 2010, at 5:20 PM, Tom Lane wrote:
 According to a discussion over in Fedora-land, $subject is true:
 http://lists.fedoraproject.org/pipermail/devel/2010-August/140995.html
 
 I see several calls in plpython.c that seem to refer to PyCObject stuff.
 Anybody have any idea if we need to do something about this?


Well, we should at least be checking for an exception here anyways:

proc->me = PyCObject_FromVoidPtr(proc, NULL);
PyDict_SetItemString(PLy_procedure_cache, key, proc->me);

if (proc->me == NULL) complain();

That is, with those warnings adjustments, proc->me will be NULL and then 
explode in PyDict_SetItemString:

[David Malcolm]
However, if someone overrides the process-wide warnings settings, then
the API can fail altogether, raising a PendingDeprecationWarning
exception (which in CPython terms means setting a thread-specific error
state and returning NULL).
./


AFA a better fix is concerned, the shortest route would seem to be to use the 
new capsule stuff iff Python >= 2.7.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-08-06 Thread James William Pye
On Aug 6, 2010, at 4:31 PM, Kris Jurka wrote:
 binary-copy-end-v2.patch

I think there's a snag in the patch:

postgres=# COPY data FROM '/Users/jwp/DATA.bcopy' WITH BINARY;
ERROR:  row field count is -1, expected 1
CONTEXT:  COPY data, line 4

Probably a quick/small fix away, I imagine.


But, I was able to trigger the new ERROR with py-postgresql:

>>> import postgresql as pg
>>> db=pg.open('localhost/postgres')
>>> q=db.prepare('copy data FROM STDIN WITH BINARY')
>>> from itertools import chain
>>> import sys
>>> db.pq.tracer = sys.stderr.write
>>> q.load_rows(chain(open('/Users/jwp/DATA.bcopy', 'rb'), (b'EXTRA',)))
↑ B(25): b'B\x00\x00\x00\x18\x00py:0x1268b30\x00\x00\x00\x00\x00\x00\x00'
↑ E(10): b'E\x00\x00\x00\t\x00\x00\x00\x00\x01'
↑ S(5): b'S\x00\x00\x00\x04'
↓ b'2'(0): b''
↓ b'G'(5): b'\x01\x00\x01\x00\x01'
↑__(7): b'PGCOPY\n'
↑__(3): b'\xff\r\n'
↑__(41): 
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x00\x01\x00\x01\x00\x00\x00\x04\x00\x00\x00\x02\x00\x01\x00\x00\x00\x04\x00\x00\x00\x03\xff\xff'
↑__(5): b'EXTRA'
↑ c(5): b'c\x00\x00\x00\x04'
↑ S(5): b'S\x00\x00\x00\x04'
↓ b'E'(95): b'SERROR\x00C22P04\x00Mreceived copy data after EOF marker\x00WCOPY 
data, line 4\x00Fcopy.c\x00L2081\x00RCopyFrom\x00\x00'
↓ b'Z'(1): b'I'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  snip
  File 
/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/postgresql/driver/pq3.py,
 line 462, in raise_server_error
raise server_error
postgresql.exceptions.BadCopyError: received copy data after EOF marker
  CODE: 22P04
  LOCATION: File 'copy.c', line 2081, in CopyFrom from SERVER
  CONTEXT: COPY data, line 4
STATEMENT: [prepared]
  sql_parameter_types: []
  statement_id: py:0x1268b30
  string: copy data FROM STDIN WITH BINARY
CONNECTION: [idle]
  client_address: ::1/128
  client_port: 63922
  version:
PostgreSQL 9.1devel on x86_64-apple-darwin10.4.0, compiled by GCC 
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664), 64-bit
CONNECTOR: [Host] pq://jwp:*...@localhost:5432/postgres
  category: None
DRIVER: postgresql.driver.pq3.Driver
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-28 Thread James William Pye
On Jul 25, 2010, at 8:01 AM, Kris Jurka wrote:
 The JDBC driver reads server messages for multiple reasons.

 One of them is indeed to do early failure detection.

That's high quality. =)

 Another is to pickup NoticeResponse messages to avoid a network buffer 
 deadlock.

That's a good catch. I don't think psql/restore would often run into this as 
when COPY IN is in play, it's normally restoring a database. However, with 
JDBC, I imagine COPY would more often be used to do bulk loading into live 
tables that may very well cause a NOTICE. [Well, I reference psql/libpq because 
I don't recall it recognizing failure during COPY IN in the past, so I assume 
it's not receiving any data in that state.]

hrm, I suppose a lazy way around that problem would be to suspend all client 
messages(client_min_messages) during COPY IN. Tho, I guess one would still have 
to contend with NotificationResponse, and ParameterStatus..

 So this is possible to work around driver side by peeking into the network 
 stream and delaying processing of the end of copy until the driver agrees 
 that the copy is done, but

I don't think you would have to peek in. If the interface were to always hold 
onto the last message or last n-bytes submitted to be sent, it would be able to 
send the possible CopyData(EOF) and CopyDone once the COPY operation (at the 
interface level) is closed/shutdown/terminated. Granted, this is dependent on 
CopyData(EOF) not being in the middle of regular CopyData, but I gather that 
that would end in an ErrorResponse anyways.

 I still maintain that this is a server bug. It is not OK for the server to 
 assume that the client is done and move on, the client must tell the server 
 what it wants done.


I'm a bit torn here. While it would seem to be either a bug in the spec or a 
bug in the server, I'm inclined to call it a wart in the server's 
implementation of the spec.

I don't see the fix as being dangerous, but I imagine an implementor would want 
to have the workaround in place regardless. I certainly would.

I'd be in favor of seeing this fixed in 9.x, and the documentation updated to 
warn implementors about the wart in the older versions.. That is, I don't see 
any reason why we can't get rid of this unsightly thing considering the 
workarounds would still work with a wart-free server.

cheers, jwp
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-28 Thread James William Pye
On Jul 28, 2010, at 9:53 AM, Kris Jurka wrote:
 Technically you won't get NotificationResponse until transaction end, so you 
 don't need to worry about that mid copy.

Ah, thanks for noting that. It would appear my original reading of the async 
section didn't get far enough beyond Frontends must be prepared to deal with 
these messages at any time, even when not engaged in a query.. I see the note 
below clarifying NotificationResponse.

 One of the key points of confusion is that CopyData(EOF) does not result in 
 an error.
 It results in ignoring any futher data.
 The problem I have is that for text mode it waits for CopyDone, but in binary 
 mode it ends the copy sequence immediately.

That is bothersome. :\

 Additionally the interface exposed by the JDBC driver lets the user write 
 arbitrary CopyData bytes to the server, so without parsing all of that we 
 don't know whether they've issued CopyData(EOF) or not.

Okay, so you can't know with absolute certainty without parsing the data, but 
the usual case would be handled by holding onto the last-N bytes or so. Enough 
to fit the EOF and perhaps a little more for paranoia's sake.

That's not to say that I'm missing the problem. When (not if, when) the 
user feeds data past a CopyData(EOF), it's going to get interesting.

[Thinking about the logic necessary to handle such a case and avoid network 
buffer deadlock...]
I would think the least invasive way to handle it would be to set the 
CommandComplete and ReadyForQuery messages aside when they are received if 
CopyDone hasn't been sent, continue the COPY operation as usual until it is 
shutdown, send CopyDone and, finally, reinstate CommandComplete and RFQ as if 
they were just received.. I don't think that really accommodates for CopyFail 
as the status in RFQ will need to be adjusted to match what was actually 
done..? Well, I'm not sure you would need to worry about NoticeResponse after a 
premature CommandComplete as INSERTs are no longer happening. ugh.


+1 for a fix.


Not directly regarding your patch, but while the discussion is in the general 
area.
I think it would be wise to throw an error when non-empty CopyData messages are 
received after CopyData(EOF). Chances are that the user is making a mistake and 
should be notified of it.

cheers, jwp
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

