Re: [GENERAL] unexpected data beyond EOF in block 260 of relation pg_tblspc

2015-07-14 Thread Mitu Verma
Hi team,

Kindly help me with the issue below.

The following alarm was raised on a system where a PostgreSQL database is used:

---
BGWPOL22-00:/var/opt/mediation/MMStorage6/Server8/CXC1734739_R8J/storage/logging/ACTIVE#
 cat AA::80165730a!
1434566795-Server8eBGWPOL22-00!! iThere is a problem with the system 
environment (memory, disk, etc).

Failed to store log information in the database in AUDIT
ERROR:  unexpected data beyond EOF in block 260 of relation 
pg_tblspc/16388/PG_9.1_201105231/16384/16418

HINT:  This has been seen to occur with buggy kernels; consider updating your 
system.
Failed to transfer a row to the database table auditTrailLogEntry.
---

Inserts into the table are also failing because of this.
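A hedged side note (the numbers are taken from the error text above): the path
pg_tblspc/16388/PG_9.1_201105231/16384/16418 names tablespace OID 16388,
database OID 16384 and relfilenode 16418, so the affected relation can usually
be looked up by connecting to that database and running:

SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relfilenode = 16418;  -- relfilenode from the error message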

My question is:
1. How is alarm 801 generated, and how is the description/information in the
alarm populated, given that it talks about a BUGGY kernel?


Regards
Mitu



Re: [GENERAL] Creating table with data from a join

2015-07-14 Thread Julien Rouhaud
On 14/07/2015 18:50, Igor Stassiy wrote:
> Julien, I would gladly provide more information, I am just not sure what
> to add.
> 

Well, was your concern about why option #2 is the quickest, or is the
runtime of option #2 still too slow for you?

> I would be willing to accept risks such as corruption or data loss
> during the time of this import, but the server has to be up and running
> before and after the import, if it is successful (so I can't take it
> down, change some parameters, and start it up again).

See http://www.postgresql.org/docs/current/static/sql-createtable.html,
in particular the "UNLOGGED" part, to check whether an unlogged table is
suitable for you.

> On Tue, Jul 14, 2015 at 6:37 PM Julien Rouhaud
> <julien.rouh...@dalibo.com> wrote:
>
> On 14/07/2015 18:21, Igor Stassiy wrote:
> > Julien, I have the following setting for WAL level: #wal_level = minimal
> > (which defaults to minimal anyway)
> >
>
> Sorry, I sent my mail too early :/
>
> So, option #2 is winner by design. You didn't say anything about your
> needs, so it's hard to help you much more.
>
> If you don't care about losing data on this table if your server
> crashes, you can try option #3 with an unlogged table.
>
>
> > On Tue, Jul 14, 2015 at 6:19 PM Julien Rouhaud
> > <julien.rouh...@dalibo.com> wrote:
> >
> > On 14/07/2015 11:12, Igor Stassiy wrote:
> > > Hello,
> > >
> > > I am benchmarking different ways of putting data into table on table
> > > creation:
> > >
> > > 1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
> > > 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
> > > 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
> > > parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
> > >
> > > (the parallel command is available as part of the parallel deb package
> > > in Ubuntu for example, it splits the stdin by newline character and
> > > feeds it to the corresponding command)
> > >
> > > Both tables a and b have ~16M records and one of the columns in a is
> > > geometry (ranging from several KB in size to several MB). Columns in b
> > > are mostly integers.
> > >
> > > The machine that I am running these commands on has the following
> > > parameters:
> > >
> > > default_statistics_target = 50 # pgtune wizard 2012-06-06
> > > maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
> > > constraint_exclusion = on # pgtune wizard 2012-06-06
> > > checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
> > > effective_cache_size = 48GB # pgtune wizard 2012-06-06
> > > work_mem = 80MB # pgtune wizard 2012-06-06
> > > wal_buffers = 8MB # pgtune wizard 2012-06-06
> > > checkpoint_segments = 16 # pgtune wizard 2012-06-06
> > > shared_buffers = 16GB # pgtune wizard 2012-06-06
> > > max_connections = 400 # pgtune wizard 2012-06-06
> > >
> > > One would expect the 3rd option to be faster than 1 and 2, however 2
> > > outperforms both by a large margin (sometimes x2). This is especially
> > > surprising taking into account that COPY doesn't acquire a global lock
> > > on the table, only a RowExclusiveLock
> > > (according to
> > > http://www.postgresql.org/message-id/10611.1014867...@sss.pgh.pa.us)
> > >
> >
> > What is wal_level value? I think this is because of an optimisation
> > happening with wal_level = minimal:
> >
> > "In minimal level, WAL-logging of some bulk operations can be safely
> > skipped, which can make those operations much faster"
> >
> > see
> > http://www.postgresql.org/docs/current/static/runtime-config-wal.html
> >
> > > So is option 2 a winner by design? Could you please suggest other
> > > alternatives to try (if there are any)? And what might be the reason
> > > that 3 is not outperforming the other 2?
> > >
> > > Thank you,
> > > Igor
> >
> > --
> > Julien Rouhaud
> > http://dalibo.com - http://dalibo.org

Re: [GENERAL] Disconnected but query still running

2015-07-14 Thread Kevin Grittner
Eduardo Piombino  wrote:

> (the timer had stopped almost exactly at 2 hours, 7.210.123 ms to
> be exact, which makes me think of a 2 hours timeout somewhere).

The most likely cause of this would be that you were accessing the
server through a firewall with neither the client nor the server
configured for TCP keepalive messages, or configured to an initial
delay longer than the firewall is willing to tolerate.  The default
for initial delay in most environments is two hours, with a few
retries at much shorter intervals after that.  Or it could be that
the client was using TCP keepalive, and the network was down when it
hit the two hour mark and tried to confirm that the connection was
still good.

http://www.postgresql.org/docs/current/static/runtime-config-connection.html

http://www.postgresql.org/docs/current/static/libpq-connect.html

http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html

http://superuser.com/questions/729034/any-way-to-keep-connection-alive-in-pgadmin-without-setting-it-on-the-server
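As a hedged illustration (values are examples only, not recommendations),
keepalive can be tuned on the server side in postgresql.conf:

tcp_keepalives_idle = 300       # seconds of idle before the first probe
tcp_keepalives_interval = 30    # seconds between unanswered probes
tcp_keepalives_count = 3        # unanswered probes before dropping the link

or from the client side through the corresponding libpq connection
parameters (host and dbname below are placeholders):

psql "host=dbhost dbname=mydb keepalives=1 keepalives_idle=300"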

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index Only Scan vs Cache

2015-07-14 Thread William Dunn
On Tue, Jul 14, 2015 at 2:19 PM, Marc Mamin  wrote:

>
> Is there a rule of thumb for the number of pages per relation at which it
> is worth starting to index?


The code for the monitoring tool check_postgres uses a table size larger than
5.12kb as a rule of thumb, expecting that for tables smaller than that the
planner may choose a sequential scan instead because the table is so small.
check_postgres was written a long time ago, though, so someone more familiar
with the optimizer may be able to provide a better estimate.
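A hedged aside on checking where a given table stands relative to such a
threshold (the table name is a placeholder):

SELECT pg_relation_size('some_table') AS bytes,
       pg_size_pretty(pg_relation_size('some_table')) AS pretty;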

*Will J. Dunn*
*willjdunn.com*

On Tue, Jul 14, 2015 at 2:19 PM, Marc Mamin wrote:

> >On 7/9/2015 12:41 PM, Tom Lane wrote:
> >> Andy Colson writes:
> >>> My question is:  Will PG cache only the index (assuming it can always
> >>> do an Index Only Scan), or will it cache the table as well?
>
> I'm not sure that indexes on tiny tables are useful.
> They increase the number of options the query planner has to consider,
> which has its own small cost.
> I'd be interested in other opinions on this.
> Is there a rule of thumb for the number of pages per relation at which it
> is worth starting to index?
>
> And still another question: I have tiny static tables too, that never
> got analyzed.
> Can this fool the query planner in a negative way?
>
> regards,
>
> Marc Mamin
>
> >> The table blocks would fall out of cache if they're never touched.
> >>
> >>       regards, tom lane
>
> >Sweet!  Thanks Tom.


Re: [GENERAL] Index Only Scan vs Cache

2015-07-14 Thread Andy Colson

On 7/14/2015 1:19 PM, Marc Mamin wrote:
>
> On 7/9/2015 12:41 PM, Tom Lane wrote:
>> Andy Colson writes:
>>> My question is:  Will PG cache only the index (assuming it can
>>> always do an Index Only Scan), or will it cache the table as
>>> well?
>
> I'm not sure that indexes on tiny tables are useful. They increase
> the number of options the query planner has to consider, which has
> its own small cost. I'd be interested in other opinions on this. Is
> there a rule of thumb for the number of pages per relation at which
> it is worth starting to index?
>
> And still another question: I have tiny static tables too, that
> never got analyzed. Can this fool the query planner in a negative
> way?
>
> regards,
>
> Marc Mamin

They can be.  A unique constraint to ensure correctness, for example.  In
my case some of my "tiny" tables can be different sizes, from 100 to 500
rows.  I want indexes on those in case they get bigger and start to get
slow.  PG can figure out when to use and when not to use the index.  I'd
rather have the safety net.




> And still another question: I have tiny static tables too, that never
> got analyzed. Can this fool the query planner in a negative way?


I would say yes.  A tiny table is quickest when it is table scanned, but
it's only going to be a few milliseconds more if it uses the index (also
depending on how much of the table and index are cached).  For a small
table I can't imagine the speed difference would even be noticeable.


In my testing, with tables of 100 rows the speed was almost the same 
with an index, a covering index, and no index.
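A hedged way to see the planner's choice for yourself (a throwaway example):

CREATE TABLE tiny (id int PRIMARY KEY, val text);
INSERT INTO tiny SELECT g, 'x' FROM generate_series(1, 100) g;
ANALYZE tiny;
EXPLAIN SELECT * FROM tiny WHERE id = 42;
-- typically shows a Seq Scan: reading the table's single page is
-- cheaper than descending the index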


-Andy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] timestamp check

2015-07-14 Thread Adrian Klaver

On 07/14/2015 07:13 AM, Ramesh T wrote:

I added .MS and am getting values. But the problem is that the query keeps
running and does not display results; when I add something like LIMIT 5 it
does return values.

What is the problem with the query?


As has been explained several times already, subtracting one timestamp
from another is going to give you an interval, not a timestamp:


postgres@production=# select current_timestamp - '01/01/2015'::timestamp;
         ?column?
---------------------------
 194 days 10:37:33.709606
(1 row)

That cannot be turned into a date:

postgres@production=# select to_char(current_timestamp -
'01/01/2015'::timestamp, 'DDD HH:MI:SS.MS');

      to_char
-------------------
 194 10:39:06.994
(1 row)

That is not going to change.

So the question remains:

What are you trying to do, get an interval or get a timestamp?
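A hedged aside, in case a single number rather than a formatted string is
wanted: an interval can also be collapsed to seconds with extract:

select extract(epoch from current_timestamp - '01/01/2015'::timestamp);
-- returns the total length of the interval, in seconds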




changed_date and changed_dttimezone are parameters:


select to_char((current_timestamp -
TO_TIMESTAMP(to_char(changed_date,'YYYY-MM-DD HH24'|| ' '||'MI'||'
'||'SS')||' '||(SELECT utc_offset  FROM pg_catalog.pg_timezone_names
WHERE name=changed_dttimezone), 'YYYY-MM-DD HH24'||' '||'MI'||'
'||'SS')::timestamptz),'DDD HH:MI:SS.MS')





On Tue, Jul 14, 2015 at 4:23 PM, Ramesh T
<rameshparnandit...@gmail.com> wrote:

    Yes, but I need to display the last digits also

    [image: Inline image 1]

    like 1500 08-09-10.738901

    On Mon, Jul 13, 2015 at 8:18 PM, Francisco Olarte
    <fola...@peoplecall.com> wrote:

        Hi Ramesh:

        On Sun, Jul 12, 2015 at 8:21 AM, Ramesh T
        <rameshparnandit...@gmail.com> wrote:

            postgres query
            select current_timestamp-
            TO_TIMESTAMP(to_char(DATE1, 'YYYY-MM-DD HH24'|| ' '||'MI'||'
            '||'SS')||' '||(SELECT utc_offset  FROM
            pg_catalog.pg_timezone_names
            WHERE name=DATETIMEZOZE1)   , 'YYYY-MM-DD HH24'||'
            '||'MI'||' '||'SS')::timestamptz

            getting result..

            [image: Inline image 1]

            But in Oracle, using systimestamp and to_timestamp_tz works;
            the SS TZH format is not supported by to_timestamp in
            postgres.

        I do not know about Oracle, but in postgres you are subtracting
        two timestamps ( current_timestamp - to_timestamp(whatever) ).
        This gives you an interval.

            result..

            [image: Inline image 2]

            the difference is displayed in days in the postgres query..
            I think something is wrong. Is it..?

        Days are displayed in the postgres query because that is the
        default format for intervals ( it's a little more complicated,
        but related ).

        $ select '1500 days 8 hours 9 minutes 10 seconds'::interval;
              interval
        ---------------------
         1500 days 08:09:10
        (1 row)

        If you want a particular format you should use the appropriate
        formatting functions, like to_char

        $ select to_char('1500 days 8 hours 9 minutes 10
        seconds'::interval,'DDD HH-MI-SS');
            to_char
        ---------------
         1500 08-09-10
        (1 row)

        Or, you could try to change the default formatting, but this is
        generally a bad idea.

        Regards.
        Francisco Olarte.







--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] timestamp check

2015-07-14 Thread Ramesh T
I added .MS and am getting values. But the problem is that the query keeps
running and does not display results; when I add something like LIMIT 5 it
does return values.

What is the problem with the query?
changed_date and changed_dttimezone are parameters:


select to_char((current_timestamp -
TO_TIMESTAMP(to_char(changed_date,'YYYY-MM-DD HH24'|| ' '||'MI'||'
'||'SS')||' '||(SELECT utc_offset  FROM pg_catalog.pg_timezone_names
WHERE name=changed_dttimezone), 'YYYY-MM-DD HH24'||' '||'MI'||'
'||'SS')::timestamptz),'DDD HH:MI:SS.MS')
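A hedged side note on the "display last digits" part: the US (microseconds)
template pattern can replace MS in the format, e.g.:

select to_char('1500 days 8 hours 9 minutes 10.738901 seconds'::interval,
               'DDD HH-MI-SS.US');
-- => 1500 08-09-10.738901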





On Tue, Jul 14, 2015 at 4:23 PM, Ramesh T 
wrote:

> Yes, but I need to display the last digits also
>
> [image: Inline image 1]
>
> like 1500 08-09-10.738901
>
> On Mon, Jul 13, 2015 at 8:18 PM, Francisco Olarte <fola...@peoplecall.com>
> wrote:
>
>> Hi Ramesh:
>>
>> On Sun, Jul 12, 2015 at 8:21 AM, Ramesh T <rameshparnandit...@gmail.com>
>> wrote:
>>
>>> postgres query
>>> select current_timestamp-
>>> TO_TIMESTAMP(to_char(DATE1, 'YYYY-MM-DD HH24'|| ' '||'MI'||' '||'SS')||'
>>> '||(SELECT utc_offset  FROM pg_catalog.pg_timezone_names
>>> WHERE name=DATETIMEZOZE1)   , 'YYYY-MM-DD HH24'||' '||'MI'||'
>>> '||'SS')::timestamptz
>>>
>>> getting result..
>>>
>>> [image: Inline image 1]
>>>
>>> But in Oracle, using systimestamp and to_timestamp_tz works; the SS TZH
>>> format is not supported by to_timestamp in postgres.
>>>
>>
>> I do not know about Oracle, but in postgres you are subtracting two
>> timestamps ( current_timestamp - to_timestamp(whatever) ). This gives you
>> an interval.
>>
>>
>>
>>> result..
>>>
>>> [image: Inline image 2]
>>>
>>> the difference is displayed in days in the postgres query.. I think
>>> something is wrong. Is it..?
>>>
>>
>> Days are displayed in the postgres query because that is the default
>> format for intervals ( it's a little more complicated, but related ).
>>
>> $ select '1500 days 8 hours 9 minutes 10 seconds'::interval;
>>       interval
>> ---------------------
>>  1500 days 08:09:10
>> (1 row)
>>
>> If you want a particular format you should use the appropriate formatting
>> functions, like to_char
>>
>> $ select to_char('1500 days 8 hours 9 minutes 10 seconds'::interval,'DDD
>> HH-MI-SS');
>>     to_char
>> ---------------
>>  1500 08-09-10
>> (1 row)
>>
>> Or, you could try to change the default formatting, but this is generally
>> a bad idea.
>>
>> Regards.
>>    Francisco Olarte.
>>
>>
>>
>


Re: [GENERAL] timestamp check

2015-07-14 Thread Ramesh T
Yes, but I need to display the last digits also

[image: Inline image 1]

like 1500 08-09-10.738901

On Mon, Jul 13, 2015 at 8:18 PM, Francisco Olarte <fola...@peoplecall.com>
wrote:

> Hi Ramesh:
>
> On Sun, Jul 12, 2015 at 8:21 AM, Ramesh T <rameshparnandit...@gmail.com>
> wrote:
>
>> postgres query
>> select current_timestamp-
>> TO_TIMESTAMP(to_char(DATE1, 'YYYY-MM-DD HH24'|| ' '||'MI'||' '||'SS')||'
>> '||(SELECT utc_offset  FROM pg_catalog.pg_timezone_names
>> WHERE name=DATETIMEZOZE1)   , 'YYYY-MM-DD HH24'||' '||'MI'||'
>> '||'SS')::timestamptz
>>
>> getting result..
>>
>> [image: Inline image 1]
>>
>> But in Oracle, using systimestamp and to_timestamp_tz works; the SS TZH
>> format is not supported by to_timestamp in postgres.
>>
>
> I do not know about Oracle, but in postgres you are subtracting two
> timestamps ( current_timestamp - to_timestamp(whatever) ). This gives you
> an interval.
>
>
>
>> result..
>>
>> [image: Inline image 2]
>>
>> the difference is displayed in days in the postgres query.. I think
>> something is wrong. Is it..?
>>
>
> Days are displayed in the postgres query because that is the default
> format for intervals ( it's a little more complicated, but related ).
>
> $ select '1500 days 8 hours 9 minutes 10 seconds'::interval;
>       interval
> ---------------------
>  1500 days 08:09:10
> (1 row)
>
> If you want a particular format you should use the appropriate formatting
> functions, like to_char
>
> $ select to_char('1500 days 8 hours 9 minutes 10 seconds'::interval,'DDD
> HH-MI-SS');
>     to_char
> ---------------
>  1500 08-09-10
> (1 row)
>
> Or, you could try to change the default formatting, but this is generally
> a bad idea.
>
> Regards.
>    Francisco Olarte.
>
>
>


Re: [GENERAL] Index Only Scan vs Cache

2015-07-14 Thread Marc Mamin

>On 7/9/2015 12:41 PM, Tom Lane wrote:
>> Andy Colson  writes:
>>> My question is:  Will PG cache only the index (assuming it can always do
>>> an Index Only Scan), or will it cache the table as well?

I'm not sure that indexes on tiny tables are useful.
They increase the number of options the query planner has to consider, which
has its own small cost.
I'd be interested in other opinions on this.
Is there a rule of thumb for the number of pages per relation at which it is
worth starting to index?

And still another question: I have tiny static tables too, that never got
analyzed.
Can this fool the query planner in a negative way?

regards,

Marc Mamin

>> The table blocks would fall out of cache if they're never touched.
>>
>>regards, tom lane
>>
>>
>
>Sweet!  Thanks Tom.
>
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Creating table with data from a join

2015-07-14 Thread Marc Mamin

>Hello,
>
>I am benchmarking different ways of putting data into table on table creation:
>
>1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
>2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
>3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
>parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
>
>(the parallel command is available as part of parallel deb package in Ubuntu 
>for example, it splits the stdin by newline character and feeds it to the 
>corresponding command)
>
>Both tables a and b have ~16M records and one of the columns in a is geometry 
>(ranging from several KB in size to several MB). Columns in b are mostly 
>integers.
>
>The machine that I am running these commands on has the following parameters:
>
>default_statistics_target = 50 # pgtune wizard 2012-06-06
>maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
>constraint_exclusion = on # pgtune wizard 2012-06-06
>checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
>effective_cache_size = 48GB # pgtune wizard 2012-06-06
>work_mem = 80MB # pgtune wizard 2012-06-06
>wal_buffers = 8MB # pgtune wizard 2012-06-06
>checkpoint_segments = 16 # pgtune wizard 2012-06-06
>shared_buffers = 16GB # pgtune wizard 2012-06-06
>max_connections = 400 # pgtune wizard 2012-06-06
>
>One would expect the 3rd option to be faster than 1 and 2, however 2 
>outperforms both by a large margin (sometimes x2). This is especially 
>surprising taking into account that COPY doesn't acquire a global lock on the 
>table, only a RowExclusiveLock

Which PG version?

I find it interesting that 2 outperforms 1.
The only explanation I can imagine is that "CREATE TABLE AS" freezes the data
on the fly, as is possible with "COPY FROM"
(http://www.postgresql.org/docs/9.4/interactive/sql-copy.html)

You may try parallel INSERTs without STDIN, using modulo. Just start these
4 queries simultaneously (one per remainder):

INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id
WHERE a.id % 4 = 0;  -- and likewise with remainders 1, 2 and 3
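A hedged sketch of starting the four of them at once from a shell (assumes
psql can connect without prompting):

for i in 0 1 2 3; do
  psql -c "INSERT INTO c SELECT * FROM a JOIN b ON a.id = b.id WHERE a.id % 4 = $i" &
done
wait   # block until all four INSERTs have finished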

I usually avoid parallel INSERTs because of I/O contention and the random
distribution they cause within the target tables.
Are you monitoring the I/O activity in your tests?
Have you tried using only 2 parallel processes?

regards,

Marc Mamin


>(according to 
>http://www.postgresql.org/message-id/10611.1014867...@sss.pgh.pa.us)
>
>So is option 2 a winner by design? Could you please suggest other alternatives 
>to try (if there are any)? And what might be the reason that 3 is not 
>outperforming the other 2?
>
>Thank you,
>Igor
>
>
>


Re: [GENERAL] A table of magic constants

2015-07-14 Thread Alvaro Herrera
Dane Foster wrote:
> Hi Michael,
> 
> You nailed it. I am reading the documentation cover to cover. I started
> chapter 9 two weeks ago but haven't found the time to read beyond 9.1 yet.
> But for day to day usage on the MySQL to PostgreSQL migration project that
> I'm working on I jump around in the docs as needed to find specific things
> that are further ahead in the docs than chapter 9.

FWIW if you find things that would benefit from having an entry in the
alphabetical index and are not listed there, by all means report that to
pgsql-docs or this list.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Creating table with data from a join

2015-07-14 Thread Igor Stassiy
Julien, I would gladly provide more information, I am just not sure what to
add.

I would be willing to accept risks such as corruption or data loss during
the time of this import, but the server has to be up and running before and
after the import, if it is successful (so I can't take it down, change some
parameters, and start it up again).

On Tue, Jul 14, 2015 at 6:37 PM Julien Rouhaud 
wrote:

> On 14/07/2015 18:21, Igor Stassiy wrote:
> > Julien, I have the following setting for WAL level: #wal_level = minimal
> > (which defaults to minimal anyway)
> >
>
> Sorry, I sent my mail too early :/
>
> So, option #2 is winner by design. You didn't say anything about your
> needs, so it's hard to help you much more.
>
> If you don't care about losing data on this table if your server
> crashes, you can try option #3 with an unlogged table.
>
>
> > On Tue, Jul 14, 2015 at 6:19 PM Julien Rouhaud
> > <julien.rouh...@dalibo.com> wrote:
> >
> > On 14/07/2015 11:12, Igor Stassiy wrote:
> > > Hello,
> > >
> > > I am benchmarking different ways of putting data into table on
> table
> > > creation:
> > >
> > > 1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
> > > 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
> > > 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
> > > parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
> > >
> > > (the parallel command is available as part of parallel deb package
> in
> > > Ubuntu for example, it splits the stdin by newline character and
> feeds
> > > it to the corresponding command)
> > >
> > > Both tables a and b have ~16M records and one of the columns in a
> is
> > > geometry (ranging from several KB in size to several MB). Columns
> in b
> > > are mostly integers.
> > >
> > > The machine that I am running these commands on has the following
> > > parameters:
> > >
> > > default_statistics_target = 50 # pgtune wizard 2012-06-06
> > > maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
> > > constraint_exclusion = on # pgtune wizard 2012-06-06
> > > checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
> > > effective_cache_size = 48GB # pgtune wizard 2012-06-06
> > > work_mem = 80MB # pgtune wizard 2012-06-06
> > > wal_buffers = 8MB # pgtune wizard 2012-06-06
> > > checkpoint_segments = 16 # pgtune wizard 2012-06-06
> > > shared_buffers = 16GB # pgtune wizard 2012-06-06
> > > max_connections = 400 # pgtune wizard 2012-06-06
> > >
> > > One would expect the 3rd option to be faster than 1 and 2, however
> 2
> > > outperforms both by a large margin (sometimes x2). This is
> especially
> > > surprising taking into account that COPY doesn't acquire a global
> lock
> > > on the table, only a RowExclusiveLock
> > > (according
> > > to
> > http://www.postgresql.org/message-id/10611.1014867...@sss.pgh.pa.us)
> > >
> >
> > What is wal_level value? I think this is because of an optimisation
> > happening with wal_level = minimal:
> >
> > "In minimal level, WAL-logging of some bulk operations can be safely
> > skipped, which can make those operations much faster"
> >
> > see
> >
> http://www.postgresql.org/docs/current/static/runtime-config-wal.html
> >
> > > So is option 2 a winner by design? Could you please suggest other
> > > alternatives to try (if there are any)? And what might be the
> reason
> > > that 3 is not outperforming the other 2?
> > >
> > > Thank you,
> > > Igor
> > >
> > >
> >
> >
> > --
> > Julien Rouhaud
> > http://dalibo.com - http://dalibo.org
> >
>
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
>


Re: [GENERAL] Creating table with data from a join

2015-07-14 Thread Julien Rouhaud
On 14/07/2015 18:21, Igor Stassiy wrote:
> Julien, I have the following setting for WAL level: #wal_level = minimal
> (which defaults to minimal anyway)
> 

Sorry, I sent my mail too early :/

So, option #2 is winner by design. You didn't say anything about your
needs, so it's hard to help you much more.

If you don't care about losing data on this table if your server
crashes, you can try option #3 with an unlogged table.


> On Tue, Jul 14, 2015 at 6:19 PM Julien Rouhaud
> <julien.rouh...@dalibo.com> wrote:
> 
> On 14/07/2015 11:12, Igor Stassiy wrote:
> > Hello,
> >
> > I am benchmarking different ways of putting data into table on table
> > creation:
> >
> > 1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
> > 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
> > 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
> > parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
> >
> > (the parallel command is available as part of parallel deb package in
> > Ubuntu for example, it splits the stdin by newline character and feeds
> > it to the corresponding command)
> >
> > Both tables a and b have ~16M records and one of the columns in a is
> > geometry (ranging from several KB in size to several MB). Columns in b
> > are mostly integers.
> >
> > The machine that I am running these commands on has the following
> > parameters:
> >
> > default_statistics_target = 50 # pgtune wizard 2012-06-06
> > maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
> > constraint_exclusion = on # pgtune wizard 2012-06-06
> > checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
> > effective_cache_size = 48GB # pgtune wizard 2012-06-06
> > work_mem = 80MB # pgtune wizard 2012-06-06
> > wal_buffers = 8MB # pgtune wizard 2012-06-06
> > checkpoint_segments = 16 # pgtune wizard 2012-06-06
> > shared_buffers = 16GB # pgtune wizard 2012-06-06
> > max_connections = 400 # pgtune wizard 2012-06-06
> >
> > One would expect the 3rd option to be faster than 1 and 2, however 2
> > outperforms both by a large margin (sometimes x2). This is especially
> > surprising taking into account that COPY doesn't acquire a global lock
> > on the table, only a RowExclusiveLock
> > (according
> > to
> http://www.postgresql.org/message-id/10611.1014867...@sss.pgh.pa.us)
> >
> 
> What is wal_level value? I think this is because of an optimisation
> happening with wal_level = minimal:
> 
> "In minimal level, WAL-logging of some bulk operations can be safely
> skipped, which can make those operations much faster"
> 
> see
> http://www.postgresql.org/docs/current/static/runtime-config-wal.html
> 
> > So is option 2 a winner by design? Could you please suggest other
> > alternatives to try (if there are any)? And what might be the reason
> > that 3 is not outperforming the other 2?
> >
> > Thank you,
> > Igor
> >
> >
> 
> 
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
> 


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Creating table with data from a join

2015-07-14 Thread Julien Rouhaud

On 14/07/2015 18:21, Igor Stassiy wrote:
> Julien, I have the following setting for WAL level: #wal_level =
> minimal (which defaults to minimal anyway)
> 
> On Tue, Jul 14, 2015 at 6:19 PM Julien Rouhaud 
> <julien.rouh...@dalibo.com>
> wrote:
> 
> On 14/07/2015 11:12, Igor Stassiy wrote:
>> Hello,
>> 
>> I am benchmarking different ways of putting data into table on
>> table creation:
>> 
>> 1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
>> 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
>> 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
>> parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
>> 
>> (the parallel command is available as part of parallel deb
>> package in Ubuntu for example, it splits the stdin by newline
>> character and feeds it to the corresponding command)
>> 
>> Both tables a and b have ~16M records and one of the columns in a
>> is geometry (ranging from several KB in size to several MB).
>> Columns in b are mostly integers.
>> 
>> The machine that I am running these commands on has the
>> following parameters:
>> 
>> default_statistics_target = 50 # pgtune wizard 2012-06-06
>> maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
>> constraint_exclusion = on # pgtune wizard 2012-06-06
>> checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
>> effective_cache_size = 48GB # pgtune wizard 2012-06-06
>> work_mem = 80MB # pgtune wizard 2012-06-06
>> wal_buffers = 8MB # pgtune wizard 2012-06-06
>> checkpoint_segments = 16 # pgtune wizard 2012-06-06
>> shared_buffers = 16GB # pgtune wizard 2012-06-06
>> max_connections = 400 # pgtune wizard 2012-06-06
>> 
>> One would expect the 3rd option to be faster than 1 and 2,
>> however 2 outperforms both by a large margin (sometimes x2). This
>> is especially surprising taking into account that COPY doesn't
>> acquire a global lock on the table, only a RowExclusiveLock 
>> (according to
> http://www.postgresql.org/message-id/10611.1014867...@sss.pgh.pa.us)
>
> 
> 
> 
> What is wal_level value? I think this is because of an
> optimisation happening with wal_level = minimal:
> 
> "In minimal level, WAL-logging of some bulk operations can be
> safely skipped, which can make those operations much faster"
> 
> see 
> http://www.postgresql.org/docs/current/static/runtime-config-wal.html
>
> 
>> So is option 2 a winner by design? Could you please suggest
>> other alternatives to try (if there are any)? And what might be
>> the reason that 3 is not outperforming the other 2?
>> 
>> Thank you, Igor
>> 
>> 
> 
> 
> -- Julien Rouhaud http://dalibo.com - http://dalibo.org
> 


--
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Creating table with data from a join

2015-07-14 Thread Igor Stassiy
Julien, I have the following setting for WAL level: #wal_level = minimal
(which defaults to minimal anyway)

On Tue, Jul 14, 2015 at 6:19 PM Julien Rouhaud 
wrote:

> On 14/07/2015 11:12, Igor Stassiy wrote:
> > Hello,
> >
> > I am benchmarking different ways of putting data into table on table
> > creation:
> >
> > 1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
> > 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
> > 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
> > parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
> >
> > (the parallel command is available as part of parallel deb package in
> > Ubuntu for example, it splits the stdin by newline character and feeds
> > it to the corresponding command)
> >
> > Both tables a and b have ~16M records and one of the columns in a is
> > geometry (ranging from several KB in size to several MB). Columns in b
> > are mostly integers.
> >
> > The machine that I am running these commands on has the following
> > parameters:
> >
> > default_statistics_target = 50 # pgtune wizard 2012-06-06
> > maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
> > constraint_exclusion = on # pgtune wizard 2012-06-06
> > checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
> > effective_cache_size = 48GB # pgtune wizard 2012-06-06
> > work_mem = 80MB # pgtune wizard 2012-06-06
> > wal_buffers = 8MB # pgtune wizard 2012-06-06
> > checkpoint_segments = 16 # pgtune wizard 2012-06-06
> > shared_buffers = 16GB # pgtune wizard 2012-06-06
> > max_connections = 400 # pgtune wizard 2012-06-06
> >
> > One would expect the 3rd option to be faster than 1 and 2, however 2
> > outperforms both by a large margin (sometimes x2). This is especially
> > surprising taking into account that COPY doesn't acquire a global lock
> > on the table, only a RowExclusiveLock
> > (according
> > to http://www.postgresql.org/message-id/10611.1014867...@sss.pgh.pa.us)
> >
>
> What is wal_level value? I think this is because of an optimisation
> happening with wal_level = minimal:
>
> "In minimal level, WAL-logging of some bulk operations can be safely
> skipped, which can make those operations much faster"
>
> see http://www.postgresql.org/docs/current/static/runtime-config-wal.html
>
> > So is option 2 a winner by design? Could you please suggest other
> > alternatives to try (if there are any)? And what might be the reason
> > that 3 is not outperforming the other 2?
> >
> > Thank you,
> > Igor
> >
> >
>
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
>


Re: [GENERAL] Creating table with data from a join

2015-07-14 Thread Julien Rouhaud
On 14/07/2015 11:12, Igor Stassiy wrote:
> Hello,
> 
> I am benchmarking different ways of putting data into table on table
> creation:
> 
> 1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
> 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
> 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
> parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
> 
> (the parallel command is available as part of parallel deb package in
> Ubuntu for example, it splits the stdin by newline character and feeds
> it to the corresponding command)
> 
> Both tables a and b have ~16M records and one of the columns in a is
> geometry (ranging from several KB in size to several MB). Columns in b
> are mostly integers.
> 
> The machine that I am running these commands on has the following
> parameters:
> 
> default_statistics_target = 50 # pgtune wizard 2012-06-06
> maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
> constraint_exclusion = on # pgtune wizard 2012-06-06
> checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
> effective_cache_size = 48GB # pgtune wizard 2012-06-06
> work_mem = 80MB # pgtune wizard 2012-06-06 
> wal_buffers = 8MB # pgtune wizard 2012-06-06 
> checkpoint_segments = 16 # pgtune wizard 2012-06-06 
> shared_buffers = 16GB # pgtune wizard 2012-06-06 
> max_connections = 400 # pgtune wizard 2012-06-06
> 
> One would expect the 3rd option to be faster than 1 and 2, however 2
> outperforms both by a large margin (sometimes x2). This is especially
> surprising taking into account that COPY doesn't acquire a global lock
> on the table, only a RowExclusiveLock 
> (according
> to http://www.postgresql.org/message-id/10611.1014867...@sss.pgh.pa.us)
> 

What is wal_level value? I think this is because of an optimisation
happening with wal_level = minimal:

"In minimal level, WAL-logging of some bulk operations can be safely
skipped, which can make those operations much faster"

see http://www.postgresql.org/docs/current/static/runtime-config-wal.html

> So is option 2 a winner by design? Could you please suggest other
> alternatives to try (if there are any)? And what might be the reason
> that 3 is not outperforming the other 2?
> 
> Thank you,
> Igor
> 
> 


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] could not create shared memory segment: Invalid argument

2015-07-14 Thread Andy Colson

On 7/13/2015 7:08 PM, Ryan King - NOAA Affiliate wrote:
> Apologies ahead of time for not knowing which group to send to, but I
> wanted to see if anyone has encountered and resolved this type of error.
> I'm setting up postgresql 9.2 streaming replication on RH and after
> copying the master data directory over to the slave, the psql service
> refuses to start and gives the following errors.
>
> 2015-07-13 23:55:41.224 UTC FATAL:  could not create shared memory
> segment: Invalid argument
> 2015-07-13 23:55:41.224 UTC DETAIL:  Failed system call was
> shmget(key=5432001, size=1146945536, 03600).
> 2015-07-13 23:55:41.224 UTC HINT:  This error usually means that
> PostgreSQL's request for a shared memory segment exceeded your kernel's
> SHMMAX parameter.  You can either reduce the request size or reconfigure
> the kernel with larger SHMMAX.  To reduce the request size (currently
> 1146945536 bytes), reduce PostgreSQL's shared memory usage, perhaps by
> reducing shared_buffers or max_connections.
>         If the request size is already small, it's possible that it is
> less than your kernel's SHMMIN parameter, in which case raising the
> request size or reconfiguring SHMMIN is called for.
>         The PostgreSQL documentation contains more information about
> shared memory configuration.
> 2015-07-13 23:56:21.344 UTC FATAL:  could not create shared memory
> segment: Invalid argument
> 2015-07-13 23:56:21.344 UTC DETAIL:  Failed system call was
> shmget(key=5432001, size=58302464, 03600).
> 2015-07-13 23:56:21.344 UTC HINT:  This error usually means that
> PostgreSQL's request for a shared memory segment exceeded your kernel's
> SHMMAX parameter.  You can either reduce the request size or reconfigure
> the kernel with larger SHMMAX.  To reduce the request size (currently
> 58302464 bytes), reduce PostgreSQL's shared memory usage, perhaps by
> reducing shared_buffers or max_connections.
>         If the request size is already small, it's possible that it is
> less than your kernel's SHMMIN parameter, in which case raising the
> request size or reconfiguring SHMMIN is called for.
>         The PostgreSQL documentation contains more information about
> shared memory configuration.
>
> I've set shared_buffers way down to next to nothing along with
> kernel.shmmax and kernel.shmall per some blogs. However, the same error
> persists, and I'm getting nowhere. I think ultimately the solution is
> to upgrade, but the devs may not be ready for an upgrade at this point.
> Any help would be greatly appreciated. Thanks!


You don't want to decrease kernel.shmmax; you want to set it to the
request size:


sysctl -w kernel.shmmax=1146945536

shmmax is the only thing you really need to play with.
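A hedged follow-up, reusing the request size from the log above (adjust to
your own): to make the setting survive a reboot, persist it as well:

sysctl -w kernel.shmmax=1146945536
echo "kernel.shmmax = 1146945536" >> /etc/sysctl.conf
sysctl -p   # reload and verify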

-Andy



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Where to place suggestions for documentation improvements

2015-07-14 Thread Charles Clavadetscher
Thank you Michael

I will post my suggestions on pgsql-docs to make sure ;-)

Bye
Charles

> -Original Message-
> From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
> ow...@postgresql.org] On Behalf Of Michael Paquier
> Sent: Dienstag, 14. Juli 2015 14:21
> To: Charles Clavadetscher
> Cc: PostgreSQL mailing lists
> Subject: Re: [GENERAL] Where to place suggestions for documentation
> improvements
> 
> On Tue, Jul 14, 2015 at 8:17 PM, Charles Clavadetscher
>  wrote:
> > I have a generic question. Where should I/we place suggestions on
> > possible improvements of the documentation? Is it here or better on
> pgsql-docs?
> 
> pgsql-docs may be a better place than this list, which is dedicated to
> general questions; still, most of the people picking up patches from -docs
> also usually have a look here.
> --
> Michael
> 
> 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Where to place suggestions for documentation improvements

2015-07-14 Thread Michael Paquier
On Tue, Jul 14, 2015 at 8:17 PM, Charles Clavadetscher
 wrote:
> I have a generic question. Where should I/we place suggestions on possible
> improvements of the documentation? Is it here or better on pgsql-docs?

pgsql-docs may be a better place than this list, which is dedicated to
general questions; still, most of the people picking up patches from -docs
also usually have a look here.
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Creating table with data from a join

2015-07-14 Thread Igor Stassiy
Thank you David. I tried to measure the time of COPY (SELECT * FROM a JOIN
b ON a.id = b.id) TO '/tmp/dump.sql' alone, and it took an order of
magnitude less time (~10x) than the complete command (together with the
INSERT), so conversion is probably not the main factor in the slowdown
(unless conversion from text to internal is significantly slower than from
internal to text).

I will also try your suggestion with limiting the ids range.

On Tue, Jul 14, 2015 at 1:42 PM David Rowley 
wrote:

> On 14 July 2015 at 21:12, Igor Stassiy  wrote:
>
>> Hello,
>>
>> I am benchmarking different ways of putting data into table on table
>> creation:
>>
>> 1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
>> 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
>> 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
>> parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
>>
>> (the parallel command is available as part of parallel deb package in
>> Ubuntu for example, it splits the stdin by newline character and feeds it
>> to the corresponding command)
>>
>> Both tables a and b have ~16M records and one of the columns in a is
>> geometry (ranging from several KB in size to several MB). Columns in b are
>> mostly integers.
>>
>> The machine that I am running these commands on has the following
>> parameters:
>>
>> default_statistics_target = 50 # pgtune wizard 2012-06-06
>> maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
>> constraint_exclusion = on # pgtune wizard 2012-06-06
>> checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
>> effective_cache_size = 48GB # pgtune wizard 2012-06-06
>> work_mem = 80MB # pgtune wizard 2012-06-06
>> wal_buffers = 8MB # pgtune wizard 2012-06-06
>> checkpoint_segments = 16 # pgtune wizard 2012-06-06
>> shared_buffers = 16GB # pgtune wizard 2012-06-06
>> max_connections = 400 # pgtune wizard 2012-06-06
>>
>> One would expect the 3rd option to be faster than 1 and 2, however 2
>> outperforms both by a large margin (sometimes x2). This is especially
>> surprising taking into account that COPY doesn't acquire a global lock on
>> the table, only a RowExclusiveLock
>> (according to
>> http://www.postgresql.org/message-id/10611.1014867...@sss.pgh.pa.us)
>>
>> So is option 2 a winner by design? Could you please suggest other
>> alternatives to try (if there are any)? And what might be the reason that 3
>> is not outperforming the other 2?
>>
>
> I would imagine that the calls to the output functions that translate
> each value's internal representation to its user-visible/external
> representation, plus all the overhead of sending results to the client,
> would be likely candidates for the slowdown. In any case, 3 can only be
> as fast as the query generating the output. With 1 and 2, the tuple
> representations of each record stay in the internal format.
>
> If you have some logical way to break the query down into parts, then
> maybe that would be a place to look.
> For example:
>
> INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id WHERE a.id < 800;
> INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id WHERE a.id >= 800;
>
> Of course, you'd need to be very careful to ensure that the results of
> each SELECT never overlap. It would be nice to invent some better way
> than this that divides the workload evenly even as the tables grow.
>
> Then you could run these concurrently.
>
> Regards
>
> David Rowley
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
> 
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>


Re: [GENERAL] Creating table with data from a join

2015-07-14 Thread David Rowley
On 14 July 2015 at 21:12, Igor Stassiy  wrote:

> Hello,
>
> I am benchmarking different ways of putting data into table on table
> creation:
>
> 1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
> 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
> 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
> parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";
>
> (the parallel command is available as part of parallel deb package in
> Ubuntu for example, it splits the stdin by newline character and feeds it
> to the corresponding command)
>
> Both tables a and b have ~16M records and one of the columns in a is
> geometry (ranging from several KB in size to several MB). Columns in b are
> mostly integers.
>
> The machine that I am running these commands on has the following
> parameters:
>
> default_statistics_target = 50 # pgtune wizard 2012-06-06
> maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
> constraint_exclusion = on # pgtune wizard 2012-06-06
> checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
> effective_cache_size = 48GB # pgtune wizard 2012-06-06
> work_mem = 80MB # pgtune wizard 2012-06-06
> wal_buffers = 8MB # pgtune wizard 2012-06-06
> checkpoint_segments = 16 # pgtune wizard 2012-06-06
> shared_buffers = 16GB # pgtune wizard 2012-06-06
> max_connections = 400 # pgtune wizard 2012-06-06
>
> One would expect the 3rd option to be faster than 1 and 2, however 2
> outperforms both by a large margin (sometimes x2). This is especially
> surprising taking into account that COPY doesn't acquire a global lock on
> the table, only a RowExclusiveLock
> (according to
> http://www.postgresql.org/message-id/10611.1014867...@sss.pgh.pa.us)
>
> So is option 2 a winner by design? Could you please suggest other
> alternatives to try (if there are any)? And what might be the reason that 3
> is not outperforming the other 2?
>

I would imagine that the calls to the output functions that translate each
value's internal representation to its user-visible/external representation,
plus all the overhead of sending results to the client, would be likely
candidates for the slowdown. In any case, 3 can only be as fast as the query
generating the output. With 1 and 2, the tuple representations of each
record stay in the internal format.

If you have some logical way to break the query down into parts, then maybe
that would be a place to look.
For example:

INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id WHERE a.id < 800;
INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id WHERE a.id >= 800;

Of course, you'd need to be very careful to ensure that the results of each
SELECT never overlap. It would be nice to invent some better way than this
that divides the workload evenly even as the tables grow.

Then you could run these concurrently.
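A hedged sketch of one such division, using a modulo predicate (an
illustration; assumes a.id values are spread roughly uniformly):

INSERT INTO c SELECT * FROM a JOIN b ON a.id = b.id WHERE a.id % 4 = 0;
INSERT INTO c SELECT * FROM a JOIN b ON a.id = b.id WHERE a.id % 4 = 1;
INSERT INTO c SELECT * FROM a JOIN b ON a.id = b.id WHERE a.id % 4 = 2;
INSERT INTO c SELECT * FROM a JOIN b ON a.id = b.id WHERE a.id % 4 = 3;

The four predicates partition the id space, so the SELECTs can never
overlap, and the split stays roughly even as the tables grow.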

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


[GENERAL] Where to place suggestions for documentation improvements

2015-07-14 Thread Charles Clavadetscher
Hello

I have a generic question. Where should I/we place suggestions on possible
improvements of the documentation? Is it here or better on pgsql-docs?

Thanks
Charles




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index Only Scan vs Cache

2015-07-14 Thread Pavel Stehule
2015-07-14 11:59 GMT+02:00 Shujie Shang :

> Does PG have its own data cache? I don't think so.
>

PG has its own data cache - see shared_buffers.
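A hedged illustration of peeking into that cache (assumes the contrib
extension pg_buffercache is installed):

CREATE EXTENSION pg_buffercache;

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;   -- the ten relations occupying the most shared buffers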

Regards

Pavel


> I think PG just uses the filesystem cache.
>
> On Fri, Jul 10, 2015 at 2:06 AM, Andy Colson  wrote:
>
>> On 7/9/2015 12:41 PM, Tom Lane wrote:
>>
>>> Andy Colson  writes:
>>>
 My question is:  Will PG cache only the index (assuming it can always do
 an Index Only Scan), or will it cache the table as well?

>>>
>>> The table blocks would fall out of cache if they're never touched.
>>>
>>> regards, tom lane
>>>
>>>
>>>
>> Sweet!  Thanks Tom.
>>
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>


Re: [GENERAL] Index Only Scan vs Cache

2015-07-14 Thread Shujie Shang
Does PG have its own data cache? I don't think so.
I think PG just uses the filesystem cache.

On Fri, Jul 10, 2015 at 2:06 AM, Andy Colson  wrote:

> On 7/9/2015 12:41 PM, Tom Lane wrote:
>
>> Andy Colson  writes:
>>
>>> My question is:  Will PG cache only the index (assuming it can always do
>>> an Index Only Scan), or will it cache the table as well?
>>>
>>
>> The table blocks would fall out of cache if they're never touched.
>>
>> regards, tom lane
>>
>>
>>
> Sweet!  Thanks Tom.
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


[GENERAL] Creating table with data from a join

2015-07-14 Thread Igor Stassiy
Hello,

I am benchmarking different ways of putting data into table on table
creation:

1. INSERT INTO c SELECT * FROM a JOIN b on a.id = b.id;
2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id = b.id;
3. psql -c "COPY (SELECT * FROM a JOIN b on a.id = b.id) TO STDOUT" |
parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN";

(the parallel command is available as part of parallel deb package in
Ubuntu for example, it splits the stdin by newline character and feeds it
to the corresponding command)

Both tables a and b have ~16M records and one of the columns in a is
geometry (ranging from several KB in size to several MB). Columns in b are
mostly integers.

The machine that I am running these commands on has the following
parameters:

default_statistics_target = 50 # pgtune wizard 2012-06-06
maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
constraint_exclusion = on # pgtune wizard 2012-06-06
checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
effective_cache_size = 48GB # pgtune wizard 2012-06-06
work_mem = 80MB # pgtune wizard 2012-06-06
wal_buffers = 8MB # pgtune wizard 2012-06-06
checkpoint_segments = 16 # pgtune wizard 2012-06-06
shared_buffers = 16GB # pgtune wizard 2012-06-06
max_connections = 400 # pgtune wizard 2012-06-06

One would expect the 3rd option to be faster than 1 and 2, however 2
outperforms both by a large margin (sometimes x2). This is especially
surprising taking into account that COPY doesn't acquire a global lock on
the table, only a RowExclusiveLock
(according to
http://www.postgresql.org/message-id/10611.1014867...@sss.pgh.pa.us)

So is option 2 a winner by design? Could you please suggest other
alternatives to try (if there are any)? And what might be the reason that 3
is not outperforming the other 2?

Thank you,
Igor