date:20140710

My table currently uses up 62 GB of storage, and it has 450 M rows. This
narrow table has a PK on (ParentID, ChildNumber), and it has between 20K and
50K of child rows per parent.

The data is inserted daily, rarely modified, never deleted. The performance
of modifications is not an issue. The only select from it is as follows:

SELECT column_lis FROM MyChildTable WHERE ParentID=?
ORDER BY ChildNumber;

The selects are frequent, and their performance is essential.

Would you advice me to partition this table?

TIA



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Should-I-partition-this-table-tp5811130.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Should I partition this table?

On Thu, 10 Jul 2014 07:59:20 -0700 (PDT) AlexK alk...@gmail.com wrote:

 My table currently uses up 62 GB of storage, and it has 450 M rows. This
 narrow table has a PK on (ParentID, ChildNumber), and it has between 20K and
 50K of child rows per parent.
 
 The data is inserted daily, rarely modified, never deleted. The performance
 of modifications is not an issue. The only select from it is as follows:
 
 SELECT column_lis FROM MyChildTable WHERE ParentID=?
 ORDER BY ChildNumber;
 
 The selects are frequent, and their performance is essential.
 
 Would you advise me to partition this table?

In general, yes, given the information you provided.  A parition on
ParentID % $something should improve performance.  Exactly what
$something is will take some experimenting on your part to determine.
In my experience, the scenario you describe is likely to see SELECT
performance improve nearly linerally to the number of partitions, up
to some point that will take exerpimenting to determine.

However, there may other methods of partition and/or rearranging the
data that would be even better, depending on a lot of information you
did not provide.

As an example, how are your selects distributed?  Are they fairly even
across the entire data set?  Or do ParentID become less accessed the
older they get?  If the latter, you'll probably be better served by
lazily archiving infrequently accessed ParentID rows to an archive
table and adjusting the application to search that table only if the
rows weren't found in the primary table.  This is an improvement over
modulous partitioning becuase it's more likely that the frequently
accessed data will be in memory and stay there.

Going even further, since the data is infrequently modified, you might be
better served by putting some sort of cache (memcache, or a custom in-
app cache) in front of the DB and checking it first.  The in-app cache
is always the best because it incurs no network traffic to access, but
the feasibility of doing that depends on the exact nature of the
application.  Say it with me: An RDBMS is not RAM, and trying to use
it like RAM will probably lead to disappointing performance.

-- 
Bill Moran wmo...@potentialtech.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Should I partition this table?

2014-07-10 Thread Kevin Grittner

AlexK alk...@gmail.com wrote:

 My table currently uses up 62 GB of storage, and it has 450 M
 rows. This narrow table has a PK on (ParentID, ChildNumber), and
 it has between 20K and 50K of child rows per parent.

 The data is inserted daily, rarely modified, never deleted. The
 performance of modifications is not an issue. The only select
 from it is as follows:

 SELECT column_lis FROM MyChildTable WHERE ParentID=?
 ORDER BY ChildNumber;

 The selects are frequent, and their performance is essential.

 Would you advice me to partition this table?

You didn't actually tell us about the most salient facts for
whether partitioning will improve or degrade performance.

If data does not all fit in cache and parents are added over time
with increasing ID values and the vast majority of queries only
reference recent parents, then partitioning by ranges of parentID
will improve your cache hit ratio and thereby improve performance.

Even if all data fits in cache, if children are only added to
recently added parents you could partition by parentID and CLUSTER
partitions when they reach the point where there are few if any new
children or updates.  This will reduce the number of pages
referenced per scan, and may allow partitioning to be a win.

Otherwise I would expect partitioning to hurt performance.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Joining on a view containing a UNION ALL produces a suboptimal plan on 9.3.4

2014-07-10 Thread Igor Neyman

 -Original Message-
 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
 ow...@postgresql.org] On Behalf Of Chris Hanks
 Sent: Thursday, July 10, 2014 5:02 AM
 To: PostgreSQL General
 Subject: [GENERAL] Joining on a view containing a UNION ALL produces a
 suboptimal plan on 9.3.4

 Hi everyone -

 I have a slow query issue in an app I'm working on. I'm unfortunately not at
 liberty to share the query/schema details, but I've put together a very 
 similar
 reproduction of the issue:

 -

 CREATE TABLE a (id integer primary key, col integer); CREATE TABLE b (id
 integer primary key, col integer); CREATE TABLE c (id integer primary key, col
 integer); CREATE TABLE d (id integer primary key, col integer); CREATE TABLE
 e (id integer primary key, col integer);

 INSERT INTO a (id, col) SELECT i, floor(random() * 10) FROM
 generate_series(1, 10, 2) i; INSERT INTO b (id, col) SELECT i,
 floor(random() * 10) FROM generate_series(1, 10, 2) i; INSERT INTO
 c (id, col) SELECT i, floor(random() * 10) FROM generate_series(2,
 10, 2) i; INSERT INTO d (id, col) SELECT i, floor(random() * 10) FROM
 generate_series(2, 10, 2) i; INSERT INTO e (id, col) SELECT i,
 floor(random() * 10) FROM generate_series(1, 10, 1) i;

 ANALYZE;

 CREATE VIEW tables AS
   SELECT a.*, b.col AS other_col
   FROM a
   LEFT JOIN b ON a.id = b.id
   UNION ALL
   SELECT c.*, d.col AS other_col
   FROM c
   LEFT JOIN d ON c.id = d.id;

 EXPLAIN ANALYZE
 SELECT *
 FROM tables
 WHERE id = 89; -- Index scans, as expected.

 EXPLAIN ANALYZE
 SELECT *
 FROM e
 JOIN tables ON e.col = tables.id
 WHERE e.id = 568; -- Big merge joins, when simple index scans should be
 possible?

 -

 Would this be considered a deficiency in the optimizer? Is there a simple fix?

 Thanks!
 Chris

Chris,

 JOIN tables ON e.col = tables.id  - is this a typo?

Shouldn't it be  JOIN tables ON e.id = tables.id ?

Or, you need it the way it is?

Regards,
Igor Neyman

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Joining on a view containing a UNION ALL produces a suboptimal plan on 9.3.4

2014-07-10 Thread Chris Hanks

I need it the way it is. It's a foreign key in the actual query.

Thanks!

On Thu, Jul 10, 2014 at 8:31 AM, Igor Neyman iney...@perceptron.com wrote:
 -Original Message-
 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
 ow...@postgresql.org] On Behalf Of Chris Hanks
 Sent: Thursday, July 10, 2014 5:02 AM
 To: PostgreSQL General
 Subject: [GENERAL] Joining on a view containing a UNION ALL produces a
 suboptimal plan on 9.3.4

 Hi everyone -

 I have a slow query issue in an app I'm working on. I'm unfortunately not at
 liberty to share the query/schema details, but I've put together a very 
 similar
 reproduction of the issue:

 -

 CREATE TABLE a (id integer primary key, col integer); CREATE TABLE b (id
 integer primary key, col integer); CREATE TABLE c (id integer primary key, 
 col
 integer); CREATE TABLE d (id integer primary key, col integer); CREATE TABLE
 e (id integer primary key, col integer);

 INSERT INTO a (id, col) SELECT i, floor(random() * 10) FROM
 generate_series(1, 10, 2) i; INSERT INTO b (id, col) SELECT i,
 floor(random() * 10) FROM generate_series(1, 10, 2) i; INSERT INTO
 c (id, col) SELECT i, floor(random() * 10) FROM generate_series(2,
 10, 2) i; INSERT INTO d (id, col) SELECT i, floor(random() * 10) FROM
 generate_series(2, 10, 2) i; INSERT INTO e (id, col) SELECT i,
 floor(random() * 10) FROM generate_series(1, 10, 1) i;

 ANALYZE;

 CREATE VIEW tables AS
   SELECT a.*, b.col AS other_col
   FROM a
   LEFT JOIN b ON a.id = b.id
   UNION ALL
   SELECT c.*, d.col AS other_col
   FROM c
   LEFT JOIN d ON c.id = d.id;

 EXPLAIN ANALYZE
 SELECT *
 FROM tables
 WHERE id = 89; -- Index scans, as expected.

 EXPLAIN ANALYZE
 SELECT *
 FROM e
 JOIN tables ON e.col = tables.id
 WHERE e.id = 568; -- Big merge joins, when simple index scans should be
 possible?

 -

 Would this be considered a deficiency in the optimizer? Is there a simple 
 fix?

 Thanks!
 Chris

 Chris,

  JOIN tables ON e.col = tables.id  - is this a typo?

 Shouldn't it be  JOIN tables ON e.id = tables.id ?

 Or, you need it the way it is?

 Regards,
 Igor Neyman

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Should I partition this table?

Bill,

Regarding SELECT performance improve nearly linerally to the number of
partitions, - can you elaborate why? If I split my table into several
partitions, even the index depth may stay the same, because the PK is
narrow, it only consists of 2 4-byte integers.

My selects are distributed more or less evenly, so we really like your
suggestion to split on ParentID%SomeNumber, thank you!

At this time we would rather not introduce bugs related to cache
invalidation. Although we do need to read data fast, we do not want stale
data, and we cannot afford any bugs in this application.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Should-I-partition-this-table-tp5811130p5811141.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Joining on a view containing a UNION ALL produces a suboptimal plan on 9.3.4

2014-07-10 Thread Tom Lane

Chris Hanks christopher.m.ha...@gmail.com writes:
 CREATE VIEW tables AS
   SELECT a.*, b.col AS other_col
   FROM a
   LEFT JOIN b ON a.id = b.id
   UNION ALL
   SELECT c.*, d.col AS other_col
   FROM c
   LEFT JOIN d ON c.id = d.id;

 EXPLAIN ANALYZE
 SELECT *
 FROM tables
 WHERE id = 89; -- Index scans, as expected.

 EXPLAIN ANALYZE
 SELECT *
 FROM e
 JOIN tables ON e.col = tables.id
 WHERE e.id = 568; -- Big merge joins, when simple index scans should
 be possible?

 Would this be considered a deficiency in the optimizer? Is there a simple fix?

Don't hold your breath.  To arrive at the
union-on-the-inside-of-a-nestloop plan you're hoping for, the planner
would have to create a parameterized path for the UNION ALL structure.
But when you have joins in the arms of the UNION ALL, they are considered
to be independent subqueries, and we currently have a policy decision not
to try to generate parameterized paths for subqueries.  It'd be quite
expensive and I think the planner is probably lacking some necessary
mechanisms anyway.

Given that e.id is unique, you could possibly fake it with something like

select * from tables where id = (select e.col from e where e.id = 568);

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Should I partition this table?

Kevin,

For now, all the data fits in the cache: the box has 384GB of RAM. But I
want to be ready for later, when we have more data. It is easier to refactor
my table now, when it is still smallish.

Children are only added to recently added parents, and they are all
added/updated/deleted at once. These child rows represent an object which
changes as a whole.

Parents are added over time at a steady pace, with increasing ID values. But
we frequently read history as well as recent rows. Also we sometimes remove,
always the parent and all its child rows.

I hope this is enough information. Thank you!



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Should-I-partition-this-table-tp5811130p5811142.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Standby Server and Barman Backup on production system

2014-07-10 Thread Quinlan Pfiffer

On Thu, Jul 10, 2014 at 2:24 AM, basti mailingl...@unix-solution.de wrote:

Hello,
I had followed this discuss
(http://www.postgresql.org/message-id/CABRT9RAXzUa=_zT_M4Z1vyDuFkpgNCZLUnRTUO5gvK2kKkNu=a...@mail.gmail.com).

I have a similar problem now:

I use one Postgres Server as Master an an other one as Standby (WAL
archives).
I do also a daily backup of the Master Server using pg_dump.
Now there is a situation where a possible restore via cat dumpfile |
psql takes to long and the server load is too high.

So my idea is to use barman for backup.
Is it possible to use wal replication and barman backupin one config file?
Is there someone how has experience with this?

The relevant barman (test)config looks like:

wal_level = archive
archive_mode = on
archive_command = 'rsync -a %p */var/lib/barman/btest/incoming/*%f'

The relevant wal replication config on production system (master) looks
like:
wal_level = hot_standby
archive_mode = on
archive_command = 'rsync -a %p -e ssh
-i*/var/lib/postgresql/*.ssh/id_rsapostg...@standby.srv:/var/lib/postgresql/9.1/wals/master_main/%f
/dev/null'

I believe that since this is just a bash command you could feasibly either:
* Chain together two rsync commands with
* Write a script that takes the from (%p) and to (%f) locations as
arguments to rsync them to the appropriate places.

Can I use a 2'nd rsync command here? How should I do?
What are differences between wal_level = archive and wal_level =
archive or doesn't matter here?

hot_standby has more information. From the documentation for 9.1:

In hot_standby level, the same information is logged as with archive, plus
information needed to reconstruct the status of running transactions from
the WAL. To enable read-only queries on a standby server, wal_level must be
set to hot_standby on the primary, and hot_standby must be enabled in the
standby.

So you'll get more information and bigger files being transferred if you
set your wal_level to hot_standby.

I've personally never tried rsyncing to two locations at once, so I'm not
going to comment on that. Good luck.

Re: [GENERAL] Should I partition this table?

On Thu, 10 Jul 2014 08:40:59 -0700 (PDT) AlexK alk...@gmail.com wrote:

 Bill,
 
 Regarding SELECT performance improve nearly linerally to the number of
 partitions, - can you elaborate why? If I split my table into several
 partitions, even the index depth may stay the same, because the PK is
 narrow, it only consists of 2 4-byte integers.

That statement is based on experimentation.  About a year ago, I did a
research project for a former employer to determine the best way to
store a large amount of data.  Like you, we were talking about narrow
rows, but we were looking at the total # of rows exceeding 10 billion.
Also different, we didn't expect the number of unique ParentIDs to
ever exceed 100,000.

I managed to borrow some beefy hardware from another project that
wouldn't need it for a few weeks and do some experimenting with
different partition configurations, all compared to a baseline of
an unpartitioned table.  I don't remember exactly, but I believe I
was populating the test databases with 1 billion rows.

That's where I came up with the linear determination.  A query that
averaged 100ms on a single table averaged 10ms on 10 partitions and
1ms on 100 partitions (etc).  Every test I concocted seemed to support
that the improvement was linear.

Another point that I expermimented with, and is worth noting: not all
queries are able to benefit from the partitioning, but the test queries
that I ran that could not, saw only a few percent of performance hit.
Since our app design would take advantage of the partitioning for 99%
of its queries, it was a no-brainer.

As a result, I can only theorize, but my theory is that since each
partition is actually a table, and has indexes independently of the
other table/partitions, that both indexes searches and table scans
required less work with smaller tables.  That theory doesn't 100%
make sense, since index performance should not degrade linerally with
index size, but that was the behavior I observed.  It's likely that
because of the significant size of the hardware, that the performance
curve was simply so close to flat that it looked linear at the sizes
I was working with.

 At this time we would rather not introduce bugs related to cache
 invalidation. Although we do need to read data fast, we do not want stale
 data, and we cannot afford any bugs in this application.

I can't make your decisions for you.  But I'll make an attempt here
to try to help you avoid the same mistake I made.

On that same project, we made a similar decision: caches are sources
of bugs so we'll just make sure the DB is tuned well enough that we
don't need a cache.

That decision led to a lot of late night work under duress to get
caching reliably implemented at the last minute.  There are a lot of
things that partitioning doesn't speed up: parsing and planning the
queries, the time it takes the network to move data back and forth,
and the time it takes the application to demarshall the results
provided by the SQL server.  Caching data in RAM completely removes
all of that overhead.

Of course, only you and your developers can make the determination as
to whether you need it.  I'm just recommending that you don't fail
to consider it simply because it's difficult to do reliably.  If
you're use the can't risk bugs argument, you might benefit more
by improving your QA process than anything else.

-- 
Bill Moran wmo...@potentialtech.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Joining on a view containing a UNION ALL produces a suboptimal plan on 9.3.4

2014-07-10 Thread Chris Hanks

That did the trick! Thanks, Tom!

On Thu, Jul 10, 2014 at 8:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Chris Hanks christopher.m.ha...@gmail.com writes:
 CREATE VIEW tables AS
   SELECT a.*, b.col AS other_col
   FROM a
   LEFT JOIN b ON a.id = b.id
   UNION ALL
   SELECT c.*, d.col AS other_col
   FROM c
   LEFT JOIN d ON c.id = d.id;

 EXPLAIN ANALYZE
 SELECT *
 FROM tables
 WHERE id = 89; -- Index scans, as expected.

 EXPLAIN ANALYZE
 SELECT *
 FROM e
 JOIN tables ON e.col = tables.id
 WHERE e.id = 568; -- Big merge joins, when simple index scans should
 be possible?

 Would this be considered a deficiency in the optimizer? Is there a simple 
 fix?

 Don't hold your breath.  To arrive at the
 union-on-the-inside-of-a-nestloop plan you're hoping for, the planner
 would have to create a parameterized path for the UNION ALL structure.
 But when you have joins in the arms of the UNION ALL, they are considered
 to be independent subqueries, and we currently have a policy decision not
 to try to generate parameterized paths for subqueries.  It'd be quite
 expensive and I think the planner is probably lacking some necessary
 mechanisms anyway.

 Given that e.id is unique, you could possibly fake it with something like

 select * from tables where id = (select e.col from e where e.id = 568);

 regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Should I partition this table?

2014-07-10 Thread Kevin Grittner

AlexK alk...@gmail.com wrote:

 For now, all the data fits in the cache: the box has 384GB of
 RAM. But I want to be ready for later, when we have more data. It
 is easier to refactor my table now, when it is still smallish.

Makes sense.

 Children are only added to recently added parents, and they are
 all added/updated/deleted at once. These child rows represent an
 object which changes as a whole.

 Parents are added over time at a steady pace, with increasing ID
 values. But we frequently read history as well as recent rows.
 Also we sometimes remove, always the parent and all its child
 rows.

That suggests to me that a partition based on ranges of parent IDs
would be optimal, with a CLUSTER of each partition as it reaches a
fairly stable state.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Should I partition this table?

Kevin,

What would be the advantages of partitioning on ranges of ParentID? Each
query will touch at most one partition. I might or might not get PK indexes
one level of depth less. 

I understand that I will CLUSTER these smaller tables and benefit from that.
Other than clustering, what are other advantages?



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Should-I-partition-this-table-tp5811130p5811157.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] checkpoint

2014-07-10 Thread Yves Dorfsman


Hi,

If I run checkpoint from psql, is it applied to all the databases?

What if I do it though an API? When connecting with psycopg2, I'm forced to
specify a database name, if I use dbname=postgres, and execute
checkpoint;, is it applied to all the databases?

Thanks.

-- 
Yves.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Should I partition this table?

On Thu, 10 Jul 2014 11:36:27 -0700 (PDT) AlexK alk...@gmail.com wrote:

 What would be the advantages of partitioning on ranges of ParentID? Each
 query will touch at most one partition. I might or might not get PK indexes
 one level of depth less. 

You need to partition by ParentID in order for the example queries you
provided to benefit from the partitioning.  You should abstain from removing
the previous comments from each email reply ... I'm not going to dig back
through this thread to find specific examples.

 I understand that I will CLUSTER these smaller tables and benefit from that.
 Other than clustering, what are other advantages?

Personally, I find the benefits of CLUSTER to be debatable.  I haven't done
a lot of investigation, but my experience has been that keeping things
CLUSTERed is more work than it's worth.  Certainly, if you have a table
that _never_ changes, and he access patterns dictate it, there's no reason
not to CLUSTER, but I'm not convinced that you'll benefit (again, there's
a lot of information about your application use that hasn't been provided
that's necessary to make such a determination)

-- 
Bill Moran wmo...@potentialtech.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] checkpoint

2014-07-10 Thread Guillaume Lelarge

2014-07-10 20:56 GMT+02:00 Yves Dorfsman y...@zioup.com:


 Hi,

 If I run checkpoint from psql, is it applied to all the databases?

 What if I do it though an API? When connecting with psycopg2, I'm forced to
 specify a database name, if I use dbname=postgres, and execute
 checkpoint;, is it applied to all the databases?


Yes. CHECKPOINT is more a cluster-wide operation, rather than a database
one.


-- 
Guillaume.
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com

Re: [GENERAL] checkpoint

2014-07-10 Thread Yves Dorfsman

On 2014-07-10 13:02, Guillaume Lelarge wrote:
 2014-07-10 20:56 GMT+02:00 Yves Dorfsman y...@zioup.com 
 mailto:y...@zioup.com:
 
 
 Hi,
 
 If I run checkpoint from psql, is it applied to all the databases?
 
 What if I do it though an API? When connecting with psycopg2, I'm forced 
 to
 specify a database name, if I use dbname=postgres, and execute
 checkpoint;, is it applied to all the databases?
 
 
 Yes. CHECKPOINT is more a cluster-wide operation, rather than a database one.
 

This is what I had observed, but wanted to be 100% sure. Thanks.

-- 
Yves.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Should I partition this table?

2014-07-10 Thread Jeff Janes

On Thu, Jul 10, 2014 at 8:20 AM, Bill Moran wmo...@potentialtech.com
wrote:

 On Thu, 10 Jul 2014 07:59:20 -0700 (PDT) AlexK alk...@gmail.com wrote:

  My table currently uses up 62 GB of storage, and it has 450 M rows. This
  narrow table has a PK on (ParentID, ChildNumber), and it has between 20K
 and
  50K of child rows per parent.
 
  The data is inserted daily, rarely modified, never deleted. The
 performance
  of modifications is not an issue. The only select from it is as follows:
 
  SELECT column_lis FROM MyChildTable WHERE ParentID=?
  ORDER BY ChildNumber;
 
  The selects are frequent, and their performance is essential.


How is their performance currently?


 
  Would you advise me to partition this table?



No, not based on the current info.  There is no reason to think
partitioning would improve the performance that matters to you.  You said
that child rows for the same parent are all inserted at the same time, so
they should naturally be well-clustered.  That will be important for
performance once the data exceeds what can be cached.  If that clustering
did not occur naturally then you might benefit from imposing it, and
partitioning might be an import part of doing that. But it sounds like you
will not need to worry about that.



 In general, yes, given the information you provided.  A parition on
 ParentID % $something should improve performance.


PostgresSQL's constraint exclusion logic is not smart enough to turn a
simple equality into a mod equality.  Which means every select query would
have to include AND ParentID % something = (:1 % somthing) in addition to
the primary clause ParentID=:1, in order to benefit from constraint
exclusion.  That would be very unnatural, annoying, and error prone.  Range
partitioning would be better, if any partitioning is needed at all.

Cheers,

Jeff

Re: [GENERAL] Should I partition this table?

2014-07-10 Thread Jeff Janes

On Thu, Jul 10, 2014 at 11:36 AM, AlexK alk...@gmail.com wrote:

 Kevin,

 What would be the advantages of partitioning on ranges of ParentID? Each
 query will touch at most one partition. I might or might not get PK indexes
 one level of depth less.

 I understand that I will CLUSTER these smaller tables and benefit from
 that.
 Other than clustering, what are other advantages?


If you don't partition, it will take an unacceptably long time to run
CLUSTER on the entire table.  If you do partition, you can CLUSTER one
partition at a time, and only need to CLUSTER the fast-changing partitions
more than once.  But based on your description, you probably don't need to
run explicit CLUSTERs anyway as your data would end up naturally clustered.

Cheers,

Jeff

[GENERAL] invalid connection type listen_addresses='*'

2014-07-10 Thread Aram Fingal

I just tried to set up a PostgreSQL server on an existing instillation of 
Ubuntu 13.10 server but I am getting an error trying to start the server and I 
am not finding anything relevant to the error searching the web.

Here’s what I did to install:

$ sudo apt-get install postgresql
$ sudo apt-get install postgresql-contrib

I set a password for the postgres user and edited the pg_hba.conf file as 
follows:

skipping a bunch of comments

# Put your actual configuration here
# --
#
# If you want to allow non-local connections, you need to add more
# host records.  In that case you will also need to make PostgreSQL
# listen on a non-local interface via the listen_addresses
# configuration parameter, or via the -i or -h command line switches.

listen_addresses='*'

# DO NOT DISABLE!
# If you change this first entry you will need to make sure that the
# database superuser can access the database using some other method.
# Noninteractive access to all databases is required during automatic
# maintenance (custom daily cronjobs, replication, and similar tasks).
#
# Database administrative login by Unix domain socket
local   all postgrespeer

# TYPE  DATABASEUSERADDRESS METHOD

# local is for Unix domain socket connections only
local   all all peer
# IPv4 local connections:
hostall all 127.0.0.1/32md5
hostall all all md5

# IPv6 local connections:
hostall all ::1/128 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication postgrespeer
#hostreplication postgres127.0.0.1/32md5
#hostreplication postgres::1/128 md5


Then I try to restart the server:

$ sudo /etc/init.d/postgresql restart
 * Restarting PostgreSQL 9.1 database server

 * The PostgreSQL server failed to start. Please 
check the log output:
2014-07-10 16:34:39 EDT LOG:  invalid connection type listen_addresses='*'
2014-07-10 16:34:39 EDT CONTEXT:  line 75 of configuration file 
/etc/postgresql/9.1/main/pg_hba.conf
2014-07-10 16:34:39 EDT FATAL:  could not load pg_hba.conf

Re: [GENERAL] Should I partition this table?

On Thu, 10 Jul 2014 13:16:05 -0700 Jeff Janes jeff.ja...@gmail.com wrote:
 
  In general, yes, given the information you provided.  A parition on
  ParentID % $something should improve performance.
 
 PostgresSQL's constraint exclusion logic is not smart enough to turn a
 simple equality into a mod equality.  Which means every select query would
 have to include AND ParentID % something = (:1 % somthing) in addition to
 the primary clause ParentID=:1, in order to benefit from constraint
 exclusion.  That would be very unnatural, annoying, and error prone.  Range
 partitioning would be better, if any partitioning is needed at all.

I was remiss in pointing out the additional WHERE requirement -- I tend to
assume that people are already aware of that, but it's likely that not
everyone is.  Thank you for pointing it out.

As far as your comments against adding them: I'm not going to speculate as
to what kind of queries people do or do not find annoying.  The term
unnatural is an odd choice of words, and the only thing I can think to
respond with is platypus.  As far as error-prone is concerned, it's going
to have to be the OPs decision on whether the additional work is worth the
improvement.  In the end, software isn't error-prone, programmers are error-
prone.  If you have a good QA process in place, then you don't worry about
error-prone programmers, as the QA process catches their mistakes.  If you
don't have such a process in place, or you don't trust it; then things get
harder and you make tradeoff decisions like, I don't want to write complex
code, even if it's better, because we don't have the ability to ensure it's
error free.  And I can't make those kinds of judgments because I don't know
what your environment is like.

There's no silver bullet.  The OP doesn't seem to have any information about
what he's planning for: How big is the data predicted to get?  What is
an acceptable level of performance?  Has he even tested to see if the existing
layout will scale acceptably to the expected data volume?  (perhaps nothing
needs to be changed at all)  We don't know, so we can only speculate.

-- 
Bill Moran wmo...@potentialtech.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] invalid connection type listen_addresses='*'

2014-07-10 Thread Paul Jungwirth

 listen_addresses='*'

I'm pretty sure that listen_addresses belongs in postgresql.conf, not
pg_hba.conf.

Paul




On Thu, Jul 10, 2014 at 1:40 PM, Aram Fingal fin...@multifactorial.com wrote:

 I just tried to set up a PostgreSQL server on an existing instillation of 
 Ubuntu 13.10 server but I am getting an error trying to start the server and 
 I am not finding anything relevant to the error searching the web.

 Here’s what I did to install:

 $ sudo apt-get install postgresql
 $ sudo apt-get install postgresql-contrib

 I set a password for the postgres user and edited the pg_hba.conf file as 
 follows:

 skipping a bunch of comments

 # Put your actual configuration here
 # --
 #
 # If you want to allow non-local connections, you need to add more
 # host records.  In that case you will also need to make PostgreSQL
 # listen on a non-local interface via the listen_addresses
 # configuration parameter, or via the -i or -h command line switches.

 listen_addresses='*'

 # DO NOT DISABLE!
 # If you change this first entry you will need to make sure that the
 # database superuser can access the database using some other method.
 # Noninteractive access to all databases is required during automatic
 # maintenance (custom daily cronjobs, replication, and similar tasks).
 #
 # Database administrative login by Unix domain socket
 local   all postgrespeer

 # TYPE  DATABASEUSERADDRESS METHOD

 # local is for Unix domain socket connections only
 local   all all peer
 # IPv4 local connections:
 hostall all 127.0.0.1/32md5
 hostall all all md5

 # IPv6 local connections:
 hostall all ::1/128 md5
 # Allow replication connections from localhost, by a user with the
 # replication privilege.
 #local   replication postgrespeer
 #hostreplication postgres127.0.0.1/32md5
 #hostreplication postgres::1/128 md5


 Then I try to restart the server:

 $ sudo /etc/init.d/postgresql restart
  * Restarting PostgreSQL 9.1 database server  
   
  * The PostgreSQL server failed to start. 
 Please check the log output:
 2014-07-10 16:34:39 EDT LOG:  invalid connection type listen_addresses='*'
 2014-07-10 16:34:39 EDT CONTEXT:  line 75 of configuration file 
 /etc/postgresql/9.1/main/pg_hba.conf
 2014-07-10 16:34:39 EDT FATAL:  could not load pg_hba.conf




-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] invalid connection type listen_addresses='*'

2014-07-10 Thread Igor Neyman

From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Aram Fingal
Sent: Thursday, July 10, 2014 4:40 PM
To: Postgres-General General
Subject: [GENERAL] invalid connection type listen_addresses='*'

I just tried to set up a PostgreSQL server on an existing instillation of 
Ubuntu 13.10 server but I am getting an error trying to start the server and I 
am not finding anything relevant to the error searching the web.

Here's what I did to install:

$ sudo apt-get install postgresql
$ sudo apt-get install postgresql-contrib

I set a password for the postgres user and edited the pg_hba.conf file as 
follows:

skipping a bunch of comments

# Put your actual configuration here
# --
#
# If you want to allow non-local connections, you need to add more
# host records.  In that case you will also need to make PostgreSQL
# listen on a non-local interface via the listen_addresses
# configuration parameter, or via the -i or -h command line switches.

listen_addresses='*'

# DO NOT DISABLE!
# If you change this first entry you will need to make sure that the
# database superuser can access the database using some other method.
# Noninteractive access to all databases is required during automatic
# maintenance (custom daily cronjobs, replication, and similar tasks).
#
# Database administrative login by Unix domain socket
local   all             postgres                                peer

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# local is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
host    all             all             all             md5

# IPv6 local connections:
host    all             all             ::1/128                 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication     postgres                                peer
#host    replication     postgres        127.0.0.1/32            md5
#host    replication     postgres        ::1/128                 md5

Then I try to restart the server:

$ sudo /etc/init.d/postgresql restart
 * Restarting PostgreSQL 9.1 database server                                    

                             * The PostgreSQL server failed to start. Please 
check the log output:
2014-07-10 16:34:39 EDT LOG:  invalid connection type listen_addresses='*'
2014-07-10 16:34:39 EDT CONTEXT:  line 75 of configuration file 
/etc/postgresql/9.1/main/pg_hba.conf
2014-07-10 16:34:39 EDT FATAL:  could not load pg_hba.conf

Adam,

listen_addresses='*'  parameter doesn't belong in pg_hba.conf

This parameter should be in postgresql.conf

Regards,
Igor Neyman

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] invalid connection type listen_addresses='*'

2014-07-10 Thread Aram Fingal

 
 listen_addresses='*'  parameter doesn't belong in pg_hba.conf
 
 This parameter should be in postgresql.conf


Thanks.  That was really unclear, at least the way I followed the online 
documentation:

http://www.postgresql.org/docs/9.1/static/auth-pg-hba-conf.html

…even after following the link to the listen_addresses parameter.


-Aram

Re: [GENERAL] invalid connection type listen_addresses='*'

Aram Fingal wrote

listen_addresses='*' parameter doesn't belong in pg_hba.conf

This parameter should be in postgresql.conf

Thanks. That was really unclear, at least the way I followed the online
documentation:

http://www.postgresql.org/docs/9.1/static/auth-pg-hba-conf.html

…even after following the link to the listen_addresses parameter.

It is non-specific since it is assumed at this point in the documentation
that you realize ALL configuration parameters are defined in postgres.conf
or its includes. The fact the comment is in a note, and links elsewhere,
implicitly reinforces that fact - if it belonged in the pg_hba.conf file its
description would be part of the normal document and not a sidebar.

Admittedly this is all perfectly clear when you actually understand
everything already. While the documentation may be unclear to you
unfortunately you are the only person in a long while to actually post the
complaint to the lists and so its hard to justify figuring out how to make
the documentation clearer. It is equally important not to be redundant,
verbose and/or repetitive. The occasional confusion making its way to the
list it preferred.

David J.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/invalid-connection-type-listen-addresses-tp5811177p5811192.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] invalid connection type listen_addresses='*'

David G Johnston wrote

Aram Fingal wrote

listen_addresses='*' parameter doesn't belong in pg_hba.conf

This parameter should be in postgresql.conf

Thanks. That was really unclear, at least the way I followed the online
documentation:

http://www.postgresql.org/docs/9.1/static/auth-pg-hba-conf.html

…even after following the link to the listen_addresses parameter.
It is non-specific since it is assumed at this point in the documentation
that you realize ALL configuration parameters are defined in
postgres.conf or its includes. The fact the comment is in a note, and
links elsewhere, implicitly reinforces that fact - if it belonged in the
pg_hba.conf file its description would be part of the normal document and
not a sidebar.

All that said I would not be opposed to adding a parethetical to the note:

[...] value for the listen_addresses configuration parameter (in
postgres.conf), since the default [...]

since indeed the reader's mind is on the pg_hba.conf file and so easing the
context switch is an easy improvement.

David J.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/invalid-connection-type-listen-addresses-tp5811177p5811194.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] invalid connection type listen_addresses='*'

2014-07-10 Thread Paul Jungwirth

It is non-specific since it is assumed at this point in the documentation
that you realize ALL configuration parameters are defined in
postgres.conf or its includes.

I think the comments in pg_hba.conf are a lot more misleading than the
online documentation, and are more likely to be read. They say:

# Put your actual configuration here
# --
#
# If you want to allow non-local connections, you need to add more
# host records. In that case you will also need to make PostgreSQL
# listen on a non-local interface via the listen_addresses
# configuration parameter, or via the -i or -h command line switches.

Followed by four blank lines, then some more comments and the default
settings. It really invites the user to fill in that blank space with
a listen_addresses line.

Paul

On Thu, Jul 10, 2014 at 3:25 PM, David G Johnston
david.g.johns...@gmail.com wrote:
David G Johnston wrote

Aram Fingal wrote

listen_addresses='*' parameter doesn't belong in pg_hba.conf

This parameter should be in postgresql.conf

Thanks. That was really unclear, at least the way I followed the online
documentation:

http://www.postgresql.org/docs/9.1/static/auth-pg-hba-conf.html

All that said I would not be opposed to adding a parethetical to the note:

[...] value for the listen_addresses configuration parameter (in
postgres.conf), since the default [...]

since indeed the reader's mind is on the pg_hba.conf file and so easing the
context switch is an easy improvement.

David J.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
_
Pulchritudo splendor veritatis.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Standby Server and Barman Backup on production system

2014-07-10 Thread Sebastian Fiedler - Nexst4 GmbH

Hello,
I had followed this discuss
(http://www.postgresql.org/message-id/CABRT9RAXzUa=_zT_M4Z1vyDuFkpgNCZLUnRTUO5gvK2kKkNu=a...@mail.gmail.com).

I have a similar problem now:

I use one Postgres Server as Master an an other one as Standby (WAL
archives).
I do also a daily backup of the Master Server using pg_dump.
Now there is a situation where a possible restore via cat dumpfile |
psql  takes to long and the server load is too high.

So my idea is to use barman for backup.
Is it possible to use wal replication and barman backupin one config file?
Is there someone how has experience with this?

The relevant barman (test)config looks like:

wal_level = archive
archive_mode = on
archive_command = 'rsync -a %p /var/lib/barman/btest/incoming/%f'

The relevant wal replication config on production system (master) looks
like:
wal_level = hot_standby
archive_mode = on   
archive_command = 'rsync -a %p -e ssh -i
/var/lib/postgresql/.ssh/id_rsa
postg...@standby.srv:/var/lib/postgresql/9.1/wals/master_main/%f /dev/null'

Can I use a 2'nd rsync command here? How should I do?
What are differences between wal_level = archive and wal_level =
archive or doesn't matter here?

-- 
Nexst4 GmbH
Riesaer Straße 7
01129 Dresden

Tel.: +49 (351) 655 76 64
Fax: +49 (351) 655 76 66
Mail: sebastian.fied...@nexst4.de 

 

Geschäftsführer: Matthias Schmidt, Alf Thiele
Sitz der Gesellschaft: Dresden
HRB 27274



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Windows Installation User account - Correct database for us

2014-07-10 Thread Don Brown

Hello

We are writing a small application and we are trying to determine if 
PostgreSQL is the right database for us. 

The application at this stage is only for a single user and commonly for 
persons with little computer expertise. 

When the database is installed a postgreSQL user account is created which 
in most cases will be the second user account on the PC.  The result of 
this is the user now has to select the user account when ever the computer 
is restarted. 

The programmer working on this application has suggested looking at an 
imbedded database, something like H2 as an alternative. 

I was hoping the members of this group may have some comments or 
suggestions as to the direction we should look at. 

Thank you and appreciate any comments/suggestions 

Don

[GENERAL] how does full text searching tokenize words ? can it be altered?

2014-07-10 Thread Jonathan Vanasco


I'm getting a handful of 'can not index words longer than 2047 characters' on 
my `gin` indexes.

1. does this 2047 character count correspond to tokens / indexed words?  
2. if so, is there a way to lower this number ?
3. is there a way to profile the index for the frequency of tokens ?


( apologies in advance if this looks familiar, i posted this as part of a 
larger question last month; everything but this was answered by the list and I 
can't find answers to this online )




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Windows Installation User account - Correct database for us

2014-07-10 Thread Steve Atkins


On Jul 10, 2014, at 5:01 PM, Don Brown dbr...@msd.net.au wrote:

 Hello
 
 We are writing a small application and we are trying to determine if 
 PostgreSQL is the right database for us. 
 
 The application at this stage is only for a single user and commonly for 
 persons with little computer expertise. 
 
 When the database is installed a postgreSQL user account is created which in 
 most cases will be the second user account on the PC.  The result of this is 
 the user now has to select the user account when ever the computer is 
 restarted.

I'd be surprised if that was required behaviour, but I'm not really familiar 
with current PostgreSQL packaging for Windows.

  The programmer working on this application has suggested looking at an 
 imbedded database, something like H2 as an alternative. 

Installation and management of PostgreSQL on Windows hits occasional minor 
roadbumps - not a problem for someone deploying and using PostgreSQL, but 
potentially a cause of support overhead if you're invisibly installing the 
database along with your app and not expecting your user to be aware of it.

If you need the power and flexibility of PostgreSQL, or want to allow your 
users direct database access and want to give them a good experience there, 
then the advantages probably outweigh the possible issues. If you don't need 
that then an embedded database might be a better match.

I'd look at SQLite as an embedded alternative, myself. It's a solid embedded 
SQL database. If you're entirely a Java shop then H2 might well be a good 
choice too.

 
 I was hoping the members of this group may have some comments or suggestions 
 as to the direction we should look at. 
 
 Thank you and appreciate any comments/suggestions 

Cheers,
  Steve



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Windows Installation User account - Correct database for us

2014-07-10 Thread John R Pierce


On 7/10/2014 5:01 PM, Don Brown wrote:
When the database is installed a postgreSQL user account is created 
which in most cases will be the second user account on the PC.  The 
result of this is the user now has to select the user account when 
ever the computer is restarted.


I thought I saw that the latest versions of the eDB PostgreSQL installer 
for Windows are using the Network Service built-in account rather than 
creating a special user ?


anyways even if they are creating a service account, it should be 
flagged such that its not valid for regular login, hence doesn't show up 
on the desktop login, much the same as quite a few built in accounts, 
like the IUSR_machinename account that Microsoft's IIS webserver uses by 
default.


--
john r pierce  37N 122W
somewhere on the middle of the left coast

[GENERAL] Use of tsvector in array

2014-07-10 Thread Huang, Suya

Hi,

We have the requirement of using the data type tsvector [], however, I didn't 
find out how to:

* Use array operator together with tsquery operator

o   I have to unnest the array and then do query like ts@@ to_tsquery('ipod')

* Create GIN index on tsvector[]

o   ERROR:  data type tsvector[] has no default operator class for access 
method gin

Not sure if anyone ever has this experience could share?

Thanks,
Suya

Re: [GENERAL] Windows Installation User account - Correct database for us

Don Brown wrote
 Thank you and appreciate any comments/suggestions 

Host the database in a shared-tenent arrangement and have your application
remotely connect to it or to an intermediary application that will then
perform the work and simply deal with input/output with the client.

Dave




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Windows-Installation-User-account-Correct-database-for-us-tp5811204p5811211.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Use of tsvector in array