Re: [PERFORM] Sort-of replication for reporting purposes

2017-01-13 Thread Ivan Voras
On 13 January 2017 at 12:00, Stuart Bishop  wrote:

>
>
> On 7 January 2017 at 02:33, Ivan Voras  wrote:
>
>>
>>
>>
>> I forgot to add one more piece of information: the databases are 50G+ each so
>> doing the base backup on demand over the network is not a great option.
>>
>
> If you don't want to rebuild your report databases, you can use PostgreSQL
> built in replication to keep them in sync. Just promote the replica to a
> primary, run your reports, then wind it back to a standby and let it catch
> up.
>


Ah, that's a nice option, didn't know about pg_rewind! I need to read about
it some more...
So far, it seems like the best one.



> Personally though, I'd take the opportunity to set up wal shipping and
> point in time recovery on your primary, and rebuild your reporting database
> regularly from these backups. You get your fresh reporting database on
> demand without overloading the primary, and regularly test your backups.
>

I don't think that would solve the main problem. If I set up WAL shipping,
then the secondary server will periodically need to ingest the logs, right?
And then I'm either back to running it for a while and rewinding it, as
you've said, or basically restoring it from scratch every time which will
be slower than just doing a base backup, right?


Re: [PERFORM] Sort-of replication for reporting purposes

2017-01-06 Thread Ivan Voras
On 6 Jan 2017 8:30 p.m., "Scott Marlowe"  wrote:

On Fri, Jan 6, 2017 at 12:24 PM, Ivan Voras  wrote:
> Hello,
>
> I'm investigating options for an environment which has about a dozen
> servers and several dozen databases on each, and they occasionally need
> to run huge reports which slow down other services. This is of course
> "legacy code". After some discussion, the idea is to offload these
> reports to separate servers - and that would be fairly straightforward
> if not for the fact that the report code creates temp tables which are
> not allowed on read-only hot standby replicas.
>
> So, the next best thing would be to fiddle with the storage system and
> make lightweight snapshots of live database clusters (their storage
> volumes) and mount them on the reporting servers when needed for the
> reports. This is a bit messy :-)
>
> I'm basically fishing for ideas. Are there any other options available
> which would offer fast replication-like behaviour?
>
> If not, what practices would minimise problems with the storage
> snapshots idea? Any filesystem options?

I've always solved this with slony replication, but pg_basebackup
should be pretty good for making sort of up to date slave copies. Just
toss a recovery.conf file and touch whatever failover file the slave
expects etc.


I forgot to add one more piece of information: the databases are 50G+ each so doing
the base backup on demand over the network is not a great option.


[PERFORM] Sort-of replication for reporting purposes

2017-01-06 Thread Ivan Voras
Hello,

I'm investigating options for an environment which has about a dozen
servers and several dozen databases on each, and they occasionally need to
run huge reports which slow down other services. This is of course "legacy
code". After some discussion, the idea is to offload these reports to
separate servers - and that would be fairly straightforward if not for the
fact that the report code creates temp tables which are not allowed on
read-only hot standby replicas.

So, the next best thing would be to fiddle with the storage system and make
lightweight snapshots of live database clusters (their storage volumes) and
mount them on the reporting servers when needed for the reports. This is a
bit messy :-)

I'm basically fishing for ideas. Are there any other options available
which would offer fast replication-like behaviour?

If not, what practices would minimise problems with the storage snapshots
idea? Any filesystem options?


Re: [PERFORM] Understanding BRIN index performance

2016-10-03 Thread Ivan Voras
On 3 October 2016 at 12:05, Simon Riggs  wrote:

> On 3 October 2016 at 10:58, Ivan Voras  wrote:
>
> > I get that, my question was more about why the index scan returned 25 mil
> > rows, when the pages are sequentially filled by timestamps? In my
> > understanding of BRIN, it should have returned a small number of pages
> > which would have been filtered (and sorted) for the exact data, right?
>
> That could be most simply explained if the distribution of your data
> is not what you think it is.
>


Something doesn't add up.
I've clustered the table, then created a BRIN index, and the number of rows
resulting from the index scan dropped only very slightly.

Hmmm, looking at your original reply about the metadata, and my query, did
you mean something like this:

SELECT id FROM expl_transactions WHERE dateAdded < (now() - INTERVAL '10
MINUTES') ORDER BY dateAdded DESC LIMIT 1

To solve this with a BRIN index, the index entries (range pairs?) would
themselves need to be ordered, so that the "ORDER BY ... DESC" could be
performed with the index, the matching pages then sorted and the single
record taken from that result - and there is currently no such ordering
being recorded?
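One possible workaround (a sketch only, not from this thread, assuming the
table and column names above and a roughly known insert rate): since BRIN
cannot drive the ORDER BY itself, giving the query an explicit lower bound
lets the index discard most page ranges, and the few remaining rows are
cheap to sort:

CREATE INDEX expl_transactions_dateadded_brin
    ON expl_transactions USING brin (dateAdded);

SELECT id FROM expl_transactions
WHERE dateAdded < (now() - INTERVAL '10 MINUTES')
  AND dateAdded >= (now() - INTERVAL '1 DAY')  -- hypothetical lower bound
ORDER BY dateAdded DESC LIMIT 1;

The extra bound must be wide enough to be certain to contain at least one
row, which is why this only helps when the arrival rate of rows is known.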


Re: [PERFORM] Understanding BRIN index performance

2016-10-03 Thread Ivan Voras
On 3 October 2016 at 11:40, Simon Riggs  wrote:

> On 3 October 2016 at 10:00, Ivan Voras  wrote:
>


> > My first idea is to create a default BRIN index on dateAdded since the
> > above query is not run frequently. To my surprise, the planner refused
> > to use the index and used sequential scan instead. When I forced
> > sequential scanning off, I got this:
> >
> > https://explain.depesz.com/s/W8oo
> >
> > The query was executing for 40+ seconds. It seems like the "index scan"
> > on it returns nearly 9% of the table, 25 mil rows. Since the data in
> > dateAdded actually is sequential and fairly selective (having now() as
> > the default over a long period of time), this surprises me.
> >
> > With a normal btree index, of course, it runs fine:
> >
> > https://explain.depesz.com/s/TB5
>
> Btree retains ordering, BRIN does not.
>
> We've discussed optimizing the sort based upon BRIN metadata, but
> that's not implemented yet.
>


I get that, my question was more about why the index scan returned 25 mil
rows, when the pages are sequentially filled by timestamps? In my
understanding of BRIN, it should have returned a small number of pages which
would have been filtered (and sorted) for the exact data, right?


[PERFORM] Understanding BRIN index performance

2016-10-03 Thread Ivan Voras
Hi,

I have a table of around 20 G, more than 220 million records, and I'm
running this query on it:

explain analyze SELECT MAX(id) - (SELECT id FROM expl_transactions WHERE
dateAdded < (now() - INTERVAL '10 MINUTES') ORDER BY dateAdded DESC LIMIT
1) FROM expl_transactions;

"id" is SERIAL, "dateAdded" is timestamp without timezone

The "dateAdded" field also has a "default now()" applied to it some time
after its creation, and a fair number of null values in the records (which
I don't think matters for this query, but maybe I'm wrong).

My first idea is to create a default BRIN index on dateAdded since the
above query is not run frequently. To my surprise, the planner refused to
use the index and used sequential scan instead. When I forced sequential
scanning off, I got this:

https://explain.depesz.com/s/W8oo

The query was executing for 40+ seconds. It seems like the "index scan" on
it returns nearly 9% of the table, 25 mil rows. Since the data in
dateAdded actually is sequential and fairly selective (having now() as the
default over a long period of time), this surprises me.

With a normal btree index, of course, it runs fine:

https://explain.depesz.com/s/TB5


Any ideas?
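A quick way to sanity-check whether BRIN can be selective here at all is to
look at the physical-ordering statistic for the column - a sketch, assuming
ANALYZE has been run and that the column was created without quotes (hence
the lower-cased attname):

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'expl_transactions' AND attname = 'dateadded';

A correlation close to 1.0 means the rows really are laid out in dateAdded
order and BRIN can prune most page ranges; a value near 0 means each range
spans nearly the whole min/max interval and the "index scan" degenerates
into something close to a sequential scan. A smaller pages_per_range than
the default 128 can also be tried:

CREATE INDEX expl_transactions_dateadded_brin
    ON expl_transactions USING brin (dateAdded) WITH (pages_per_range = 32);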


[PERFORM] Logging queries using sequential scans

2016-08-10 Thread Ivan Voras
Hello,

Is it possible to log queries that use sequential scans? Or possibly to log every
query in a way which allows grepping for those with sequential scans?
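PostgreSQL has no dedicated "log only sequential scans" switch, but a sketch
of one common approach: the auto_explain contrib module can log the plan of
every (or every slow) query, and the log can then be grepped for "Seq Scan".
Per session, assuming the contrib modules are installed:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;  -- log the plan of every completed query
SET auto_explain.log_analyze = off;     -- plain EXPLAIN output is enough for grepping

Server-wide, the same is done by putting shared_preload_libraries =
'auto_explain' and the two auto_explain.* parameters into postgresql.conf;
raising log_min_duration limits the logging to slower queries only.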


Re: [PERFORM] Indexes for hashes

2016-06-17 Thread Ivan Voras
And in any case, there's no crc32 in the built-in pgcrypto module.


On 17 June 2016 at 06:18, Claudio Freire  wrote:

> On Fri, Jun 17, 2016 at 1:09 AM, julyanto SUTANDANG
>  wrote:
> > Doing it this way, crc32(data) is faster than hashtext, since crc32 is
> > hardware accelerated on Intel (and perhaps others);
> > this way (crc32) is in no way the same as a hash, and is much faster than
> > the others...
> >
> > Regards,
>
> Sure, but I've had uniformity issues with crc32.
>


Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Ivan Voras
Hi,

Just for testing... is there a fast (i.e. written in C) crc32 or a similar
small hash function for PostgreSQL?


On 15 June 2016 at 16:00, Ivan Voras  wrote:

> Hi,
>
> This idea is similar to the substring one, and while it does give
> excellent performance and small size, it requires application code
> modifications, so it's out.
>
>
> On 15 June 2016 at 15:58, julyanto SUTANDANG 
> wrote:
>
>> Hi Ivan,
>>
>> How about using crc32? And then index the integer result of the crc32
>> function? You could split the hash into 2 parts and do crc32 twice, then
>> create a composite index on both integers (the crc32 results); instead of
>> using 64 chars, you only employ 2 integers as the index key.
>>
>> Regards,
>>
>> Jul
>>
>> On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras  wrote:
>>
>>> Hi,
>>>
>>> I understand your idea, and have also been thinking about it. Basically,
>>> existing applications would need to be modified, however slightly, and that
>>> wouldn't be good.
>>>
>>>
>>>
>>>
>>> On 15 June 2016 at 15:38, hubert depesz lubaczewski 
>>> wrote:
>>>
>>>> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
>>>> > I have an application which stores a large amounts of hex-encoded hash
>>>> > strings (nearly 100 GB of them), which means:
>>>>
>>>> Why do you keep them hex encoded, and not use bytea?
>>>>
>>>> I made a sample table with 1 million rows, looking like this:
>>>>
>>>>  Table "public.new"
>>>>  Column  | Type  | Modifiers
>>>> -+---+---
>>>>  texthex | text  |
>>>>  a_bytea | bytea |
>>>>
>>>> values are like:
>>>>
>>>> $ select * from new limit 10;
>>>>  texthex  |
>>>>   a_bytea
>>>>
>>>> --+
>>>>  c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f |
>>>> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
>>>>  61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db |
>>>> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
>>>>  757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 |
>>>> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
>>>>  fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 |
>>>> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
>>>>  ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a |
>>>> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
>>>>  11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea |
>>>> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
>>>>  5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 |
>>>> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
>>>>  2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c |
>>>> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
>>>>  2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 |
>>>> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
>>>>  2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa |
>>>> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
>>>> (10 rows)
>>>>
>>>> created two indexes:
>>>> create index i1 on new (texthex);
>>>> create index i2 on new (a_bytea);
>>>>
>>>> i1 is 91MB, and i2 is 56MB.
>>>>
>>>> Index creation was also much faster - best out of 3 runs for i1 was
>>>> 4928.982
>>>> ms, best out of 3 runs for i2 was 2047.648 ms
>>>>
>>>> Best regards,
>>>>
>>>> depesz
>>>>
>>>>
>>>
>>
>>
>> --
>>
>>
>> Julyanto SUTANDANG
>>
>> Equnix Business Solutions, PT
>> (An Open Source an Open Mind Company)
>>
>> Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
>> Pusat
>> T: +6221 2282 F: +62216315281 M: +628164858028
>>
>>
>>
>
>


Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Ivan Voras
Hi,

This idea is similar to the substring one, and while it does give excellent
performance and small size, it requires application code modifications, so
it's out.


On 15 June 2016 at 15:58, julyanto SUTANDANG  wrote:

> Hi Ivan,
>
> How about using crc32? And then index the integer result of the crc32
> function? You could split the hash into 2 parts and do crc32 twice, then
> create a composite index on both integers (the crc32 results); instead of
> using 64 chars, you only employ 2 integers as the index key.
>
> Regards,
>
> Jul
>
> On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras  wrote:
>
>> Hi,
>>
>> I understand your idea, and have also been thinking about it. Basically,
>> existing applications would need to be modified, however slightly, and that
>> wouldn't be good.
>>
>>
>>
>>
>> On 15 June 2016 at 15:38, hubert depesz lubaczewski 
>> wrote:
>>
>>> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
>>> > I have an application which stores a large amounts of hex-encoded hash
>>> > strings (nearly 100 GB of them), which means:
>>>
>>> Why do you keep them hex encoded, and not use bytea?
>>>
>>> I made a sample table with 1 million rows, looking like this:
>>>
>>>  Table "public.new"
>>>  Column  | Type  | Modifiers
>>> -+---+---
>>>  texthex | text  |
>>>  a_bytea | bytea |
>>>
>>> values are like:
>>>
>>> $ select * from new limit 10;
>>>  texthex  |
>>> a_bytea
>>>
>>> --+
>>>  c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f |
>>> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
>>>  61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db |
>>> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
>>>  757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 |
>>> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
>>>  fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 |
>>> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
>>>  ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a |
>>> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
>>>  11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea |
>>> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
>>>  5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 |
>>> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
>>>  2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c |
>>> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
>>>  2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 |
>>> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
>>>  2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa |
>>> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
>>> (10 rows)
>>>
>>> created two indexes:
>>> create index i1 on new (texthex);
>>> create index i2 on new (a_bytea);
>>>
>>> i1 is 91MB, and i2 is 56MB.
>>>
>>> Index creation was also much faster - best out of 3 runs for i1 was
>>> 4928.982
>>> ms, best out of 3 runs for i2 was 2047.648 ms
>>>
>>> Best regards,
>>>
>>> depesz
>>>
>>>
>>
>
>
> --
>
>
> Julyanto SUTANDANG
>
> Equnix Business Solutions, PT
> (An Open Source an Open Mind Company)
>
> Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
> Pusat
> T: +6221 2282 F: +62216315281 M: +628164858028
>
>
>


Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Ivan Voras
Hi,

I understand your idea, and have also been thinking about it. Basically,
existing applications would need to be modified, however slightly, and that
wouldn't be good.




On 15 June 2016 at 15:38, hubert depesz lubaczewski 
wrote:

> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
> > I have an application which stores a large amounts of hex-encoded hash
> > strings (nearly 100 GB of them), which means:
>
> Why do you keep them hex encoded, and not use bytea?
>
> I made a sample table with 1 million rows, looking like this:
>
>  Table "public.new"
>  Column  | Type  | Modifiers
> -+---+---
>  texthex | text  |
>  a_bytea | bytea |
>
> values are like:
>
> $ select * from new limit 10;
>  texthex  |
>   a_bytea
>
> --+
>  c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f |
> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
>  61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db |
> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
>  757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 |
> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
>  fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 |
> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
>  ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a |
> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
>  11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea |
> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
>  5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 |
> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
>  2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c |
> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
>  2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 |
> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
>  2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa |
> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
> (10 rows)
>
> created two indexes:
> create index i1 on new (texthex);
> create index i2 on new (a_bytea);
>
> i1 is 91MB, and i2 is 56MB.
>
> Index creation was also much faster - best out of 3 runs for i1 was
> 4928.982
> ms, best out of 3 runs for i2 was 2047.648 ms
>
> Best regards,
>
> depesz
>
>


Re: [PERFORM] Indexes for hashes

2016-06-15 Thread Ivan Voras
On 15 June 2016 at 15:03, k...@rice.edu  wrote:

> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
> > Hi,
> >
> > I have an application which stores a large amounts of hex-encoded hash
> > strings (nearly 100 GB of them), which means:
> >
> >- The number of distinct characters (alphabet) is limited to 16
> >- Each string is of the same length, 64 characters
> >- The strings are essentially random
> >
> > Creating a B-Tree index on this results in the index size being larger
> than
> > the table itself, and there are disk space constraints.
> >
> > I've found the SP-GIST radix tree index, and thought it could be a good
> > match for the data because of the above constraints. An attempt to create
> > it (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes
> more
> > than 12 hours (while a similar B-tree index takes a few hours at most),
> so
> > I've interrupted it because "it probably is not going to finish in a
> > reasonable time". Some slides I found on the spgist index allude that
> both
> > build time and size are not really suitable for this purpose.
> >
> > My question is: what would be the most size-efficient index for this
> > situation?
>
> Hi Ivan,
>
> If the strings are really random, then maybe a function index on the first
> 4, 8, or 16 characters could be used to narrow the search space and not
> need
> to index all 64. If they are not "good" random numbers, you could use a
> hash
> index on the strings. It will be much smaller since it currently uses a
> 32-bit
> hash. It has a number of caveats and is not currently crash-safe, but it
> seems
> like it might work in your environment. You can also use a functional
> index on
> a hash-function applied to your values with a btree to give you crash
> safety.
>
>
Hi,

I figured the hash index might be helpful and I've tried it in the
meantime: on one of the smaller tables (which is 51 GB in size), a btree
index is 32 GB, while the hash index is 22 GB (so btree is around 45%
larger).

I don't suppose there's an effort in progress to make hash indexes use WAL?
:D


[PERFORM] Indexes for hashes

2016-06-15 Thread Ivan Voras
Hi,

I have an application which stores large amounts of hex-encoded hash
strings (nearly 100 GB of them), which means:

   - The number of distinct characters (alphabet) is limited to 16
   - Each string is of the same length, 64 characters
   - The strings are essentially random

Creating a B-Tree index on this results in the index size being larger than
the table itself, and there are disk space constraints.

I've found the SP-GIST radix tree index, and thought it could be a good
match for the data because of the above constraints. An attempt to create
it (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes more
than 12 hours (while a similar B-tree index takes a few hours at most), so
I've interrupted it because "it probably is not going to finish in a
reasonable time". Some slides I found on the spgist index allude that both
build time and size are not really suitable for this purpose.

My question is: what would be the most size-efficient index for this
situation?
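One of the size-reducing options raised in the replies - a functional index
on a prefix of the string - could be sketched like this, assuming a table t
with the 64-character hex strings in column h; with essentially random data,
16 hex characters already make collisions rare, and the full column is still
compared to filter them out:

CREATE INDEX t_h_prefix_idx ON t (substring(h from 1 for 16));

SELECT * FROM t
WHERE substring(h from 1 for 16) = substring($1 from 1 for 16)
  AND h = $1;

Here $1 stands for the full 64-character value supplied by the application.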


Re: [PERFORM] tsearch2, large data and indexes

2014-04-24 Thread Ivan Voras
On 24 April 2014 13:34, Heikki Linnakangas  wrote:

> As the docs say, the GIN index does not store the weights. As such, there is
> no need to strip them. A recheck would be necessary if your query needs the
> weights, precisely because the weights are not included in the index.
>
> (In the OP's query, it's the ranking that was causing the detoasting.)

Thanks!

My problem is that I actually need the ranking. My queries can return
a large number of documents (tens of thousands) but I usually need
only the first couple of pages of most relevant results (e.g. 50-100
records). With PostgreSQL and tsearch2, this means that the tens of
thousands of documents found via the index are then detoasted and
ranked.

Does anyone have experience with external search engines which also
have ranking but are more efficient? How about Solr?




Re: [PERFORM] tsearch2, large data and indexes

2014-04-23 Thread Ivan Voras
On 22 April 2014 17:58, Jeff Janes  wrote:
> On Tue, Apr 22, 2014 at 12:57 AM, Ivan Voras  wrote:
>>
>> On 22 April 2014 08:40, Heikki Linnakangas 
>> wrote:
>> > On 04/20/2014 02:15 AM, Ivan Voras wrote:
>> >> More details: after thinking about it some more, it might have
>> >> something to do with tsearch2 and indexes: the large data in this case
>> >> is a tsvector, indexed with GIN, and the query plan involves a
>> >> re-check condition.
>
>
> I think bitmap scans always insert a recheck, due to the possibility of
> bitmap overflow.
>
> But that doesn't mean that it ever got triggered.  In 9.4., explain
> (analyze) will report on the overflows.

Ok, I found out what is happening, quoting from the documentation:

"GIN indexes are not lossy for standard queries, but their performance
depends logarithmically on the number of unique words. (However, GIN
indexes store only the words (lexemes) of tsvector values, and not
their weight labels. Thus a table row recheck is needed when using a
query that involves weights.)"

My query doesn't have weights but the tsvector in the table has them -
I take it this is what is meant by "involves weights."

So... there's really no way for tsearch2 to produce results based on
the index alone, without recheck? This is... limiting.

>> Yes, I've read about tsearch2 and GIN indexes and there shouldn't be a
>> recheck condition - but there is.
>> This is the query:
>>
>> SELECT documents.id, title, raw_data, q, ts_rank(fts_data, q, 4) AS
>> rank, html_filename
>> FROM documents, to_tsquery('document') AS q
>> WHERE fts_data @@ q
>>  ORDER BY rank DESC  LIMIT 25;
>>
>> And here is the explain analyze: http://explain.depesz.com/s/4xm
>> It clearly shows a bitmap index scan operation is immediately followed
>> by a recheck operation AND that the recheck operation actually does
>> something, because it reduces the number of records from 61 to 58
>> (!!!).
>
>
> That could be ordinary visibility checking, not qual rechecking.

Visibility as in transaction-wise? It's not, this was the only client
connected to the dev server, and the only transaction(s) happening.




Re: [PERFORM] tsearch2, large data and indexes

2014-04-22 Thread Ivan Voras
On 22 April 2014 08:40, Heikki Linnakangas  wrote:
> On 04/20/2014 02:15 AM, Ivan Voras wrote:
>> More details: after thinking about it some more, it might have
>> something to do with tsearch2 and indexes: the large data in this case
>> is a tsvector, indexed with GIN, and the query plan involves a
>> re-check condition.
>>
>> The query is of the form:
>> SELECT simple_fields FROM table WHERE fts @@ to_tsquery('...').
>>
>> Does the "re-check condition" mean that the original tsvector data is
>> always read from the table in addition to the index?
>
>
> Yes, if the re-check condition involves the fts column. I don't see why you
> would have a re-check condition with a query like that, though. Are there
> some other WHERE-conditions that you didn't show us?

Yes, I've read about tsearch2 and GIN indexes and there shouldn't be a
recheck condition - but there is.
This is the query:

SELECT documents.id, title, raw_data, q, ts_rank(fts_data, q, 4) AS
rank, html_filename
FROM documents, to_tsquery('document') AS q
WHERE fts_data @@ q
 ORDER BY rank DESC  LIMIT 25;

And here is the explain analyze: http://explain.depesz.com/s/4xm
It clearly shows a bitmap index scan operation is immediately followed
by a recheck operation AND that the recheck operation actually does
something, because it reduces the number of records from 61 to 58
(!!!).

This is the table structure:

nn=# \d documents
                            Table "public.documents"
    Column     |   Type   |                        Modifiers
---------------+----------+---------------------------------------------------------
 id            | integer  | not null default nextval('documents_id_seq'::regclass)
 ctime         | integer  | not null default unix_ts(now())
 dtime         | integer  | not null
 title         | text     | not null
 html_filename | text     | not null
 raw_data      | text     |
 fts_data      | tsvector | not null
 tags          | text[]   |
 dtype         | integer  | not null default 0
 flags         | integer  | not null default 0
Indexes:
    "documents_pkey" PRIMARY KEY, btree (id)
    "documents_html_filename" UNIQUE, btree (html_filename)
    "documents_dtime" btree (dtime)
    "documents_fts_data" gin (fts_data)
    "documents_tags" gin (tags)


> The large fields are stored in the toast table. You can check if the toast
> table is accessed with a query like this:
>
> select * from pg_stat_all_tables where relid = (select reltoastrelid from
> pg_class where relname='table');
>
> Run that before and after your query, and see if the numbers change.

Before:

relid|schemaname|relname|seq_scan|seq_tup_read|idx_scan|idx_tup_fetch|n_tup_ins|n_tup_upd|n_tup_del|n_tup_hot_upd|n_live_tup|n_dead_tup|last_vacuum|last_autovacuum|last_analyze|last_autoanalyze|vacuum_count|autovacuum_count|analyze_count|autoanalyze_count
27290|pg_toast|pg_toast_27283|3|0|2481289|10631453|993194|0|266306|0|147931|2514||2014-04-18 00:49:11.066443+02|||0|11|0|0

After:

relid|schemaname|relname|seq_scan|seq_tup_read|idx_scan|idx_tup_fetch|n_tup_ins|n_tup_upd|n_tup_del|n_tup_hot_upd|n_live_tup|n_dead_tup|last_vacuum|last_autovacuum|last_analyze|last_autoanalyze|vacuum_count|autovacuum_count|analyze_count|autoanalyze_count
27290|pg_toast|pg_toast_27283|3|0|2481347|10632814|993194|0|266306|0|147931|2514||2014-04-18 00:49:11.066443+02|||0|11|0|0

idx_scan has changed from 2481289 to 2481347 (58)
idx_tup_fetch has changed from 10631453 to 10632814 (1361)

Number 58 corresponds to the number of rows found by the index, as seen
in the EXPLAIN output; I don't know where 1361 comes from.

I'm also surprised by the amount of memory used for sorting (23 kB),
since the data actually returned by my query (all the tuples from
all 58 rows) amounts to around 2 kB - but this is not an actual
problem.




[PERFORM] tsearch2, large data and indexes

2014-04-19 Thread Ivan Voras
Hello,

If a table contains simple fields as well as large (hundreds of KiB)
text fields, will accessing only the simple fields cause the entire
record data, including the large fields, to be read and unpacked?
(e.g. SELECT int_field FROM table_with_large_text)

More details: after thinking about it some more, it might have
something to do with tsearch2 and indexes: the large data in this case
is a tsvector, indexed with GIN, and the query plan involves a
re-check condition.

The query is of the form:
SELECT simple_fields FROM table WHERE fts @@ to_tsquery('...').

Does the "re-check condition" mean that the original tsvector data is
always read from the table in addition to the index? That would be
very wasteful since data is practically duplicated in the table and in
the index. Any way around it?




Re: [PERFORM] 57 minute SELECT

2013-10-03 Thread Ivan Voras
On 03/10/2013 03:17, Samuel Stearns wrote:
> The last part, the EXPLAIN, is too big to send.  Is there an alternative
> way I can get it to you, other than chopping it up and sending in
> multiple parts?

The usual way is via http://explain.depesz.com/ .






Re: [PERFORM] ORDER BY, LIMIT and indexes

2013-08-06 Thread Ivan Voras
Here are two more unexpected results. Same test table (1 mil. records,
"id" is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed
before the experiments):

ivoras=# explain analyze select * from lt where id > 90 limit 10;
   QUERY PLAN

 Limit  (cost=0.00..1.71 rows=10 width=9) (actual
time=142.669..142.680 rows=10 loops=1)
   ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9)
(actual time=142.665..142.672 rows=10 loops=1)
 Filter: (id > 90)
 Total runtime: 142.735 ms
(4 rows)

Note the Seq Scan.

ivoras=# explain analyze select * from lt where id > 90;
  QUERY PLAN
---
 Bitmap Heap Scan on lt  (cost=1683.97..7856.35 rows=101630 width=9)
(actual time=38.462..85.780 rows=10 loops=1)
   Recheck Cond: (id > 90)
   ->  Bitmap Index Scan on lt_pkey  (cost=0.00..1658.56 rows=101630
width=0) (actual time=38.310..38.310 rows=10 loops=1)
 Index Cond: (id > 90)
 Total runtime: 115.674 ms
(5 rows)

This somewhat explains the above case - we are simply fetching 100,000
records here, and it's slow enough even with the index scan, so the
planner skips the index in the former case. BUT, if it did use the
index, it would have been expectedly fast:

ivoras=# set enable_seqscan to off;
SET
ivoras=# explain analyze select * from lt where id > 90 limit 10;
 QUERY PLAN

 Limit  (cost=0.00..1.74 rows=10 width=9) (actual time=0.081..0.112
rows=10 loops=1)
   ->  Index Scan using lt_pkey on lt  (cost=0.00..17644.17
rows=101630 width=9) (actual time=0.078..0.100 rows=10 loops=1)
 Index Cond: (id > 90)
 Total runtime: 0.175 ms
(4 rows)

It looks like the problem is in the difference between what the
planner expects and what the Filter or Index operations deliver:
(cost=0.00..17402.00 rows=101630 width=9) (actual
time=142.665..142.672 rows=10 loops=1).




Re: [PERFORM] ORDER BY, LIMIT and indexes

2013-08-06 Thread Ivan Voras
On 6 August 2013 02:20, Michael Paquier  wrote:
>
> On Tue, Aug 6, 2013 at 8:25 AM, Claudio Freire 
> wrote:
>>
>> On Mon, Aug 5, 2013 at 8:04 PM, Ivan Voras  wrote:
>> > SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10
>> >
>> > SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10
>>
>> Did you try explain?
>
> And did you run ANALYZE on your table to be sure that you generate correct
> plans?

My question was a theoretical one - about general pgsql abilities, not
a specific case.

But after prodded by you, here's an EXPLAIN ANALYZE for the first case:

ivoras=# explain analyze select * from lt order by id desc limit 10;
  QUERY
PLAN
--
 Limit  (cost=0.00..0.29 rows=10 width=9) (actual time=0.034..0.053
rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34
rows=100 width=9) (actual time=0.031..0.042 rows=10 loops=1)
 Total runtime: 0.115 ms
(3 rows)

(This test table has only 1 mil. records.)

I'm not sure how to interpret the last line:
(cost=0.00..28673.34 rows=100 width=9) (actual time=0.031..0.042
rows=10 loops=1)

It seems to me like the planner thinks the Index Scan operation will
return the entire table, and expects both a huge cost and all of the
records, but its actual implementation does return only 10 records.
The Limit operation is a NOP in this case. Is this correct?

In the second case (with OFFSET):
ivoras=# explain analyze select * from lt order by id desc limit 10 offset 10;
  QUERY
PLAN
--
 Limit  (cost=0.29..0.57 rows=10 width=9) (actual time=0.060..0.082
rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34
rows=100 width=9) (actual time=0.040..0.061 rows=20 loops=1)
 Total runtime: 0.136 ms
(3 rows)

It looks like the Index Scan implementation actually returns the first
20 records - which means that for the last "page" of the supposed
paginated display, pgsql will actually internally read the entire
table as an operation result in memory and discard almost all of it in
the Limit operation. Right?

And for the third case (with another column in the WHERE clause):
ivoras=# update lt set active = false where (id % 2) = 0;
UPDATE 50
ivoras=# explain analyze select * from lt where active order by id
desc limit 10;
  QUERY
PLAN
--
 Limit  (cost=0.00..0.32 rows=10 width=9) (actual time=0.135..0.186
rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..47380.43
rows=150 width=9) (actual time=0.132..0.174 rows=10 loops=1)
 Filter: active
 Total runtime: 0.244 ms
(4 rows)

It looks like filtering is performed in the Index Scan operation -
which I never expected. Is this usually done, or only for simple
queries? In other words, is there a circumstance where it is NOT done
in this way, and should I care?

Based on the results with OFFSET it looks like it would be a bad idea
to use pgsql for allowing the user to browse through the records in a
paginated form if the table is huge. I would very much like to be
proven wrong :)
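For what it's worth, the usual way around the OFFSET problem is keyset
("cursor-style") pagination - a sketch, assuming the test table above and
that the application remembers the last id it displayed:

-- first page:
SELECT * FROM lt ORDER BY id DESC LIMIT 10;

-- later pages: pass the smallest id seen on the previous page
SELECT * FROM lt WHERE id < 999990 ORDER BY id DESC LIMIT 10;  -- example value

Each page is then a short backward index scan starting from a known point,
instead of scanning and discarding everything before the requested offset.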




[PERFORM] ORDER BY, LIMIT and indexes

2013-08-05 Thread Ivan Voras
Hello,

Assuming I have a huge table (doesn't fit in RAM), of which the most
important fields are "id" which is a SERIAL PRIMARY KEY and "active"
which is a boolean, and I'm issuing a query like:

SELECT * FROM table ORDER BY id DESC LIMIT 10

... is pgsql smart enough to use the index to fetch only the 10
required rows instead of reading the whole table, then sorting it,
then trimming the result set? How about in the following queries:

SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10

SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10

Or, more generally, is there some set of circumstances under which the
catastrophic scenario will happen?
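As a side note on the third query, a partial index matching the filter can
be sketched (using t as the table name, since "table" above is only a
placeholder), so that the backward scan never has to skip over inactive rows:

CREATE INDEX t_active_id_idx ON t (id) WHERE active;

SELECT * FROM t WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10;

The index contains only the active rows and can be scanned backward, so the
scan reads just enough rows to satisfy OFFSET + LIMIT - whether the planner
actually picks it still depends on the statistics.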




Re: [PERFORM] Unique values across a table of arrays - documents and tags

2012-11-07 Thread Ivan Voras
On 07/11/2012 16:54, Florent Guillaume wrote:
> Maybe you could store the tags as fulltext words, query them using
> fulltext search, and use ts_stat to gather the list of words? Needs to
> be benched of course.
> You'll probably need to change the config to avoid stemming and stop words.

I have thought of that but decided not to even try because fts will also
use GIN and the combination of fts on a text field + GIN cannot possibly
be faster than just GIN on an array...





Re: [PERFORM] Unique values across a table of arrays - documents and tags

2012-11-07 Thread Ivan Voras
On 07/11/2012 16:34, François Beausoleil wrote:
> Le 2012-11-07 à 10:21, Ivan Voras a écrit :

>> Barring that, what would you suggest for efficiently handing a classic
>> structure like this (meaning documents with tags)?
> 
> Can you structure it as the "classic" many to many pattern:
> 
> documents <-> taggings <-> tags

Yes, that is as you said, a classic solution to a classic problem :)

If needed, this is the way I will do it, but for now I'm asking if
there's something that can be done to avoid creating another table or
two. The reason I'm asking is that I've often found that PostgreSQL can
do much more than I thought it can.







[PERFORM] Unique values across a table of arrays - documents and tags

2012-11-07 Thread Ivan Voras
Hello,

I know I need to re-engineer this so it doesn't suck by design, so I'm
wondering if there is some nifty PostgreSQL feature or best practice
which may automagically do the best thing.

I store information about documents which are tagged by string tags. The
structure is very simple:

CREATE TABLE documents (
id SERIAL NOT NULL,
title TEXT NOT NULL,
-- other fields --
tags TEXT[] NOT NULL,
flags INTEGER
);

Currently, I have a GIN index on the tags field, and it works for searching:

edem=> explain analyze select id,title,flags from documents where tags
@> ARRAY['tag'];
  QUERY PLAN

---
 Bitmap Heap Scan on documents  (cost=8.00..12.01 rows=1 width=39)
(actual time=0.067..0.086 rows=9 loops=1)
   Recheck Cond: (tags @> '{tag}'::text[])
   ->  Bitmap Index Scan on documents_tags  (cost=0.00..8.00 rows=1
width=0) (actual time=0.053..0.053 rows=9 loops=1)
 Index Cond: (tags @> '{tag}'::text[])
 Total runtime: 0.135 ms
(5 rows)


The other feature I need is a list of unique tags in all the documents,
e.g.:

edem=> explain analyze select distinct unnest(tags) as tag from documents;
 QUERY PLAN

-
 HashAggregate  (cost=28.54..28.84 rows=24 width=42) (actual
time=0.261..0.307 rows=44 loops=1)
   ->  Seq Scan on documents  (cost=0.00..28.45 rows=36 width=42)
(actual time=0.020..0.157 rows=68 loops=1)
 Total runtime: 0.419 ms
(3 rows)

This is unfortunately slow (because I know the load will increase and
this will be a common operation).

The thing I was planning to do is create a separate table, with only the
unique tags, and possibly an array of documents which have these tags,
which will be maintained with UPDATE and INSERT triggers on the
documents table, but then I remembered that the GIN index itself does
something not unlike this method. Is there a way to make use of this
information to get a list of unique tags?

Barring that, what would you suggest for efficiently handing a classic
structure like this (meaning documents with tags)?
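In case a concrete starting point helps, a minimal sketch of the
trigger-maintained side table mentioned above (all names made up; concurrent
inserts of the same brand-new tag can still collide on the primary key, and
tag removal is not handled):

CREATE TABLE documents_unique_tags (tag TEXT PRIMARY KEY);

CREATE OR REPLACE FUNCTION documents_collect_tags() RETURNS trigger AS $$
BEGIN
    -- add any tags of the new/updated row that are not known yet
    INSERT INTO documents_unique_tags (tag)
    SELECT DISTINCT t
    FROM unnest(NEW.tags) AS t
    WHERE NOT EXISTS (SELECT 1 FROM documents_unique_tags u WHERE u.tag = t);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER documents_collect_tags_trg
    AFTER INSERT OR UPDATE OF tags ON documents
    FOR EACH ROW EXECUTE PROCEDURE documents_collect_tags();

The list of unique tags then becomes a plain SELECT tag FROM
documents_unique_tags, with no aggregation over the documents table.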





Re: [PERFORM] wal_sync_method on FreeBSD 9.0 - ZFS

2012-09-21 Thread Ivan Voras
On 14/09/2012 22:19, Sébastien Lorion wrote:
> I am not able to set wal_sync_method to anything but fsync on FreeBSD 9.0
> for a DB created on ZFS (I have not tested on UFS). Is that expected ? Has
> it anything to do with running on EC2 ?

Can you explain what prevents you from setting the wal_sync_method?





Re: [PERFORM] Inserts in 'big' table slowing down the database

2012-09-03 Thread Ivan Voras
On 03/09/2012 13:03, Stefan Keller wrote:
> Hi,
> 
> I'm having performance issues with a simple table containing 'Nodes'
> (points) from OpenStreetMap:
> 
>   CREATE TABLE nodes (
>   id bigint PRIMARY KEY,
>   user_name text NOT NULL,
>   tstamp timestamp without time zone NOT NULL,
>   geom GEOMETRY(POINT, 4326)
>   );
>   CREATE INDEX idx_nodes_geom ON nodes USING gist (geom);
> 
> The number of rows grows steadily and soon reaches one billion
> (1'000'000'000), therefore the bigint id.
> Now, hourly inserts (update and deletes) are slowing down the database
> (PostgreSQL 9.1) constantly.
> Before I'm looking at non-durable settings [1] I'd like to know what
> choices I have to tune it while keeping the database productive:
> cluster index? partition table? use tablespaces? reduce physical block size?

You need to describe in detail what "slowing down" means in your
case. Do the disk drives somehow do more operations per transaction?
Does the database use more CPU cycles? Is there swapping? What is the
expected (previous) performance?

At a guess, it is very unlikely that using non-durable settings will
help you here.





Re: [PERFORM] ZFS vs. UFS

2012-07-24 Thread Ivan Voras
On 24/07/2012 14:51, Laszlo Nagy wrote:
> 
>   Hello,
> 
> Under FreeBSD 9, what filesystem should I use for PostgreSQL? (Dell
> PowerEdge 2900, 24G mem, 10x2T SATA2 disk, Intel RAID controller.)
> 
>  * ZFS is journaled, and it is more independent of the hardware. So if
>the computer goes wrong, I can move the zfs array to a different server.
>  * UFS is not journaled. Also I have to rely on the RAID card to build
>the RAID array. If there is a hw problem with it, then I won't be
>able to recover the data easily.
> 
> I wonder if UFS has better performance or not. Or can you suggest
> another fs? Just for the PGDATA directory.

Hi,

I think you might actually get a bit more performance out of ZFS,
depending on your load, server configuration and (more so) the tuning of
ZFS... however UFS is IMO more stable so I use it more often. A hardware
RAID card would be good to have, but you can use soft-RAID the usual way
and not be locked-in by the controller.

You can activate softupdates-journalling on UFS if you really want it,
but I find that regular softupdates is perfectly fine for PostgreSQL,
which has its own journalling.






Re: [PERFORM] Seqscan slowness and stored procedures

2012-06-08 Thread Ivan Voras
On 8 June 2012 11:58, Albe Laurenz  wrote:

> Did you take caching of table data in the buffer cache or the filesystem
> cache into account?  Did you run your tests several times in a row and
> were the actual execution times consistent?

Yes, and yes.

>> Would tweaking enable_seqscan and other planner functions during the
>> CREATE FUNCTION have an effect on the stored plan?
>
> No, but you can use the SET clause of CREATE FUNCTION to change
> enable_seqscan for this function if you know that this is the right
> thing.
> But be aware that things might be different for other function arguments
> or when the table data change, so this is normally considered a bad
> idea.

Ok.
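For reference, the SET clause mentioned above looks like this - a toy
sketch with made-up object names, not the actual function from this thread:

CREATE OR REPLACE FUNCTION recent_count(cutoff integer) RETURNS bigint AS $$
    SELECT count(*) FROM some_table WHERE id > $1
$$ LANGUAGE sql
SET enable_seqscan = off;  -- applies only while this function is executing

The setting is scoped to the function call, so the rest of the session keeps
the normal planner behaviour.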

>> Do the functions need to be re-created when the database is fully
>> populated, to adjust their stored plans with regards to new
> selectivity
>> situation on the indexes?
>
> No. Even in PL/pgSQL, where plans are cached, this is only for the
> lifetime of the database session.  The plan is generated when the
> function is called for the first time in a database session.

Thanks for clearing this up for me! I thought SQL functions are also
pre-planned and that the plans are static.



Re: [PERFORM] Seqscan slowness and stored procedures

2012-05-27 Thread Ivan Voras
On 27 May 2012 05:28, Pavel Stehule  wrote:
> Hello
>
> 2012/5/26 Ivan Voras :
>> Hello,
>>
>> I have a SQL function (which I've pasted below) and while testing its
>> code directly (outside a function), this is the "normal", default plan:
>>
>> http://explain.depesz.com/s/vfP (67 ms)
>>
>> and this is the plan with enable_seqscan turned off:
>>
>> http://explain.depesz.com/s/EFP (27 ms)
>>
>> Disabling seqscan results in almost 2.5x faster execution.
>>
>> However, when this code is wrapped in a function, the execution time is
>> closer to the second case (which is great, I'm not complaining):
>>
>
> see http://archives.postgresql.org/pgsql-general/2009-12/msg01189.php

Hi,

Thank you for your answer, but if you read my post, you'll hopefully
realize my questions are different from that in the linked post, and
are not answered by the post.



[PERFORM] Seqscan slowness and stored procedures

2012-05-26 Thread Ivan Voras
Hello,

I have a SQL function (which I've pasted below) and while testing its
code directly (outside a function), this is the "normal", default plan:

http://explain.depesz.com/s/vfP (67 ms)

and this is the plan with enable_seqscan turned off:

http://explain.depesz.com/s/EFP (27 ms)

Disabling seqscan results in almost 2.5x faster execution.

However, when this code is wrapped in a function, the execution time is
closer to the second case (which is great, I'm not complaining):

edem=> explain analyze select * from document_content_top_voted(36);
QUERY PLAN
---
 Function Scan on document_content_top_voted  (cost=0.25..10.25
rows=1000 width=188) (actual time=20.644..20.821 rows=167 loops=1)
 Total runtime: 21.236 ms
(2 rows)

I assume that the difference between the function execution time and the
direct plan with seqscan disabled is due to SQL parsing and planning.

Since the plan is compiled-in for stored procedures, is the planner in
that case already running under the assumption that seqscans must be
disabled (or something to that effect)?

Would tweaking enable_seqscan and other planner functions during the
CREATE FUNCTION have an effect on the stored plan?

Do the functions need to be re-created when the database is fully
populated, to adjust their stored plans with regards to new selectivity
situation on the indexes?



The SQL function is:

-- Retrieves document chunks of a specified document which have the most
votes

DROP FUNCTION IF EXISTS document_content_top_voted(INTEGER);
CREATE OR REPLACE FUNCTION document_content_top_voted(document_id INTEGER)
RETURNS TABLE
(chunk_id INTEGER, seq INTEGER, content TEXT, ctime INTEGER, log
TEXT,
nr_chunk_upvotes INTEGER, nr_chunk_downvotes INTEGER,
nr_seq_changes INTEGER, nr_seq_comments INTEGER,
user_login VARCHAR, user_public_name VARCHAR, user_email VARCHAR)
AS $$
WITH documents_top_chunks AS (
SELECT
(SELECT
chunk_id
FROM
documents_chunks_votes_total
WHERE
documents_id=$1 AND
documents_chunks_votes_total.seq=documents_seqs.seq AND votes=
(SELECT
max(votes)
FROM
documents_chunks_votes_total
WHERE
documents_id=$1 AND
documents_chunks_votes_total.seq=documents_seqs.seq)
ORDER BY
chunk_id DESC
LIMIT 1) AS chunk_id, seq AS doc_seq
FROM
documents_seqs
WHERE
documents_id = $1
ORDER BY seq
) SELECT
chunk_id, doc_seq, content, documents_chunks.ctime,
documents_chunks.log,
COALESCE((SELECT SUM(vote) FROM documents_chunks_votes WHERE
documents_chunks_id=chunk_id AND vote=1)::integer, 0) AS nr_chunk_upvotes,
COALESCE((SELECT SUM(vote) FROM documents_chunks_votes WHERE
documents_chunks_id=chunk_id AND vote=-1)::integer, 0) AS
nr_chunk_downvotes,
(SELECT COUNT(*) FROM documents_chunks WHERE documents_id=$1 AND
seq=doc_seq)::integer AS nr_seq_changes,
(SELECT COUNT(*) FROM documents_seq_comments WHERE
documents_seq_comments.documents_id=$1 AND seq=doc_seq)::integer AS
nr_seq_comments,
users.login, users.public_name, users.email
FROM
documents_chunks
JOIN documents_top_chunks ON documents_chunks.id =
documents_top_chunks.chunk_id
JOIN users ON users.id=creator_uid
ORDER BY doc_seq
$$ LANGUAGE SQL;

(comments on improving the efficiency of the SQL code are also appreciated)





Re: [PERFORM] Parallel (concurrent) inserts?

2012-05-25 Thread Ivan Voras
On 26 May 2012 01:36, Jeff Janes  wrote:
> On Fri, May 25, 2012 at 3:04 PM, Ivan Voras  wrote:
>> Hello,
>>
>> I'm wondering if there is a document describing which guarantees (if
>> any) PostgreSQL makes about concurrency for various operations? Speaking
>> in general (i.e. IO can handle it, number of CPU cores and client
>> threads is optimal), are fully concurrent operations (independent and
>> non-blocking) possible for:
>
> By "fully concurrent" do you mean that there is no detectable
> sub-linear scaling at all?  I'm pretty sure that no such guarantees
> can be made.

> For concurrent bulk loads, there was severe contention on generating
> WAL between concurrent bulk loaders.  That is greatly improved in the
> upcoming 9.2 release.

I was thinking about major exclusive locks in the code paths which
would block multiple clients operating on unrelated data (the same
questions go for update operations). For example: if the free space
map or whole index trees are exclusively locked, things like that. The
WAL issue you mention seems exactly like what I was asking about, but
are there any others?



[PERFORM] Parallel (concurrent) inserts?

2012-05-25 Thread Ivan Voras
Hello,

I'm wondering if there is a document describing which guarantees (if
any) PostgreSQL makes about concurrency for various operations? Speaking
in general (i.e. IO can handle it, number of CPU cores and client
threads is optimal), are fully concurrent operations (independent and
non-blocking) possible for:

1) An unindexed table?

2) A table with 1+ ordinary (default btree) indexes? (are there
constraints on the structure and number of indexes?)

3) A table with 1+ unique indexes?

4) A table with other objects on it (foreign keys, check constraints, etc.)?





Re: [PERFORM] rough benchmarks, sata vs. ssd

2012-02-13 Thread Ivan Voras
On 13 February 2012 22:49, CSS  wrote:
> For the top-post scanners, I updated the ssd test to include
> changing the zfs recordsize to 8k.

> Well now I did, added the results to
> http://ns.morefoo.com/bench.html and it looks like there's
> certainly an improvement.  That's with the only change from the
> previous test being to copy the postgres data dir, wipe the
> original, set the zfs recordsize to 8K (default is 128K), and then
> copy the data dir back.

This makes sense simply because it reduces the amount of data read
and/or written for non-sequential transactions.

> Things that stand out on first glance:
>
> -at a scaling factor of 10 or greater, there is a much more gentle
>  decline in TPS than with the default zfs recordsize
> -on the raw *disk* IOPS graph, I now see writes peaking at around
>  11K/second compared to 1.5K/second.
> -on the zpool iostat graph, I do not see those huge write peaks,
>  which is a bit confusing

Could be that "iostat" and "zpool iostat" average raw data differently.

> -on both iostat graphs, I see the datapoints look more scattered
>  with the 8K recordsize

As an educated guess, it could be that smaller transaction sizes can
"fit in" (in buffers or controller processing paths) where large
didn't allowing more bursts of performance.



Re: [PERFORM] rough benchmarks, sata vs. ssd

2012-02-03 Thread Ivan Voras

On 31/01/2012 09:07, CSS wrote:

Hello all,

Just wanted to share some results from some very basic benchmarking
runs comparing three disk configurations on the same hardware:

http://morefoo.com/bench.html


That's great!


*Tyan B7016 mainboard w/onboard LSI SAS controller
*2x4 core xeon E5506 (2.13GHz)
*64GB ECC RAM (8GBx8 ECC, 1033MHz)
*2x250GB Seagate SATA 7200.9 (ST3250824AS) drives (yes, old and slow)
*2x160GB Intel 320 SSD drives


It shows that you can have large cheap SATA drives and small fast SSDs,
and up to a point have the best of both worlds. Could you send me
(privately) a tgz of the results (i.e. the pages+images from the above
URL)? I'd like to host them somewhere more permanently.



The ZIL is a bit of a cheat, as it allows you to throw all the
synchronous writes to the SSD


This is one of the main reasons it was made. It's not a cheat, it's by 
design.



Why ZFS?  Well, we adopted it pretty early for other tasks and it
makes a number of tasks easy.  It's been stable for us for the most
part and our latest wave of boxes all use cheap SATA disks, which
gives us two things - a ton of cheap space (in 1U) for snapshots and
all the other space-consuming toys ZFS gives us, and on this cheaper
disk type, a guarantee that we're not dealing with silent data
corruption (these are probably the normal fanboy talking points).
ZFS snapshots are also a big time-saver when benchmarking.  For our
own application testing I load the data once, shut down postgres,
snapshot pgsql + the app homedir and start postgres.  After each run
that changes on-disk data, I simply rollback the snapshot.


Did you tune ZFS block size for the postgresql data directory (you'll 
need to re-create the file system to do this)? When I investigated it in 
the past, it really did help performance.



I don't have any real questions for the list, but I'd love to get
some feedback, especially on the ZIL results.  The ZIL results
interest me because I have not settled on what sort of box we'll be
using as a replication slave for this one - I was going to either go
the somewhat risky route of another all-SSD box or looking at just
how cheap I can go with lots of 2.5" SAS drives in a 2U.


You probably know the answer to that: if you need lots of storage, 
you'll probably be better off using large SATA drives with small SSDs 
for the ZIL.  160 GB is probably more than you need for ZIL.


One thing I never tried is mirroring a SATA drive and an SSD (it only makes
sense if you don't trust SSDs to be reliable yet) - I don't know if ZFS
would recognize the asymmetry and direct most of the read requests to
the SSD.



If you have any test requests that can be quickly run on the above
hardware, let me know.


Blogbench (benchmarks/blogbench) results are always nice to see in a 
comparison.





Re: [PERFORM] PostgreSQL perform poorly on VMware ESXi

2011-11-07 Thread Ivan Voras
On 07/11/2011 11:36, Lucas Mocellin wrote:
> Hi everybody,
> 
> I'm having some issues with PostgreSQL 9.03 running on FreeBSD 8.2 on top
> of VMware ESXi 4.1 U1.

I hope your hardware is Nehalem-based or newer...

> The problem is query are taking too long, and some times one query "blocks"
> everybody else to use the DB as well.

Ok, so multiple users connect to this one database, right?

> I'm a network administrator, not a DBA, so many things here can be "newbie"
> for you guys, so please be patient. :)

First, did you configure the server and PostgreSQL at all?

For FreeBSD, you'll probably need this in sysctl.conf:

vfs.hirunningspace=8388608
vfs.lorunningspace=6291456
vfs.read_max=128

and for PostgreSQL, read these:

http://www.revsys.com/writings/postgresql-performance.html
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

> I always think the bottleneck is disk I/O as I can see from the vSphere
> performance view, but the virtual machine is using exclusively the SATA
> disk with no concurrency with other machines.

I don't see why concurrency with other machines is relevant. Are you
complaining that multiple users accessing a single database are blocked
while one large query is executing or that this one query blocks other VMs?

If the single query generates a larger amount of IO than your VM host
can handle then you're probably out of luck. Virtualization is always
bad for IO. You might try increasing the amount of memory for the
virtual machine in the hope that more data will be cached.

> how do you guys deal with virtualization? any tips/recommendations? does
> that make sense the disk I/O? any other sugestion?

Are you sure it's IO? Run "iostat 1" for 10 seconds while the query is
executing and post the results.






Re: [PERFORM] Adding more memory = hugh cpu load

2011-10-11 Thread Ivan Voras
On 11/10/2011 00:02, Samuel Gendler wrote:

> The original question doesn't actually say that performance has gone down,
> only that cpu utilization has gone up. Presumably, with lots more RAM, it is
> blocking on I/O a lot less, so it isn't necessarily surprising that CPU
> utilization has gone up.  

It's Linux - it counts IO wait in the load average. Again, there are too
few details to be sure of anything, but it is possible that the IO
rate didn't go down.






[PERFORM] Re: PostgreSQL-related topics of theses and seminary works sought (Was: Hash index use presently(?) discouraged...)

2011-09-19 Thread Ivan Voras
On 17/09/2011 22:01, Stefan Keller wrote:
> 2011/9/17 Tomas Vondra  wrote:
> (...)
>> We've been asked by a local university for PostgreSQL-related topics of
>> theses and seminary works
> 
> I'm also interested in such proposals or ideas!
> 
> Here's some list of topics:
> * Adding WAL-support to hash indexes in PostgreSQL (see ex-topic)
> * Time in PostgreSQL
> * Storing (Weather) Sensor Data in PostgreSQL
> * Fast Bulk Data Inserting in PostgreSQL with Unlogged tables (incl.
> adding GiST support)
> * Performance Tuning of Read-Only a PostgreSQL Database
> * Materialized Views in PostgreSQL: Experiments around Jonathan
> Gardner's Proposal
> * more... ?

 * Covering indexes
 * Controllable record compression
 * Memory tables





Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database

2011-07-12 Thread Ivan Voras

On 12/07/2011 02:09, lars wrote:


Oh, and iowait hovers around 20% when SELECTs are slow:

avg-cpu: %user %nice %system %iowait %steal %idle
1.54 0.00 0.98 18.49 0.07 78.92

When SELECTs are fast it looks like this:
avg-cpu: %user %nice %system %iowait %steal %idle
8.72 0.00 0.26 0.00 0.00 91.01

Note that this is a 12 core VM. So one core at 100% would show as 8.33%
CPU.


Now if only you could do an "iostat -x" and show the output in both cases...




Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database

2011-07-12 Thread Ivan Voras

On 12/07/2011 16:18, Merlin Moncure wrote:

On Tue, Jul 12, 2011 at 8:22 AM, Ivan Voras  wrote:

On 08/07/2011 01:56, lars wrote:


Setup:
PostgreSQL 9.1beta2 on a high memory (~68gb, 12 cores) EC2 Linux
instance (kernel 2.6.35) with the database and
WAL residing on the same EBS volume with EXT4 (data=ordered, barriers=1)
- yes that is not an ideal setup
(WAL should be on separate drive, EBS is slow to begin, etc), but I am
mostly interested in read performance for a fully cached database.


I know you said you know these things - but do you really know the (huge)
extent to which all your IO is slowed? Even context switches in a
virtualized environment are slowed down by a huge margin - which would make
practically all in-memory lock operations very slow - much slower than they
would be on "real" hardware, and EBS by definition is even slower then
regular private virtual storage environments. I regrettably didn't bookmark
the page which did exact measurements of EBS, but
http://www.google.com/search?q=how+slow+is+ec2 will illustrate my point. (of
course, you may already know all this :) ).


sure, but the OP's question is valid: in postgres, readers don't block
writers, so why is the reader waiting?


Yes, but I'm suggesting a different question: are we sure we are not 
seeing the influences of the environment (EC2+EBS) instead of the 
software system?



We need some more information here.


Definitely.





Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database

2011-07-12 Thread Ivan Voras

On 08/07/2011 01:56, lars wrote:


Setup:
PostgreSQL 9.1beta2 on a high memory (~68gb, 12 cores) EC2 Linux
instance (kernel 2.6.35) with the database and
WAL residing on the same EBS volume with EXT4 (data=ordered, barriers=1)
- yes that is not an ideal setup
(WAL should be on separate drive, EBS is slow to begin, etc), but I am
mostly interested in read performance for a fully cached database.


I know you said you know these things - but do you really know the 
(huge) extent to which all your IO is slowed? Even context switches in a 
virtualized environment are slowed down by a huge margin - which would 
make practically all in-memory lock operations very slow - much slower 
than they would be on "real" hardware, and EBS by definition is even 
slower than regular private virtual storage environments. I regrettably
didn't bookmark the page which did exact measurements of EBS, but 
http://www.google.com/search?q=how+slow+is+ec2 will illustrate my point. 
(of course, you may already know all this :) ).






Re: [PERFORM] Performance block size.

2011-05-27 Thread Ivan Voras

On 27/05/2011 00:34, Tory M Blue wrote:

Working on some optimization as well as finally getting off my
backside and moving us to 64bit (32gb+memory).

I was reading and at some point it appears on freeBSD  the Postgres
block size was upped to 16kb, from 8kb. And on my fedora systems I
believe the default build is 8kb.


This happened some years ago but it was quickly reverted.

There were some apparent gains with the larger block size, especially 
since FreeBSD's UFS uses 16 KiB blocks by default (going to 32 KiB 
really soon now; note that this has no impact on small files because of 
the concept of "block fragments"), but it was concluded that the stock 
installation should follow the defaults set by the developers, not 
porters. YMMV.



Trying to gain some performance and wondered if any of this tuning
even is something I should bother with.


Please do some benchmarks and report results.





Re: [PERFORM] FUSION-IO io cards

2011-04-29 Thread Ivan Voras

On 29/04/2011 16:24, Mark Steben wrote:

Hi,
Had a recent conversation with a tech from this company called FUSION-IO.
They sell
  io cards designed to replace conventional disks.  The cards can be up to 3
TB in size and apparently
are installed in closer proximity to the CPU than the disks are.  They claim
performance boosts several times better than the spinning disks.

Just wondering if anyone has had any experience with this company and these
cards.  We're currently at postgres 8.3.11.

Any insights / recommendations appreciated.  thank you,


They are actually very fast SSDs; the fact that they come in "card" 
format and not in conventional "box with plugs" format is better since 
they have less communication overhead and electrical interference.


As far as I've heard, the hardware is as good as they say it is.





Re: [PERFORM] Background fsck

2011-04-08 Thread Ivan Voras

On 08/04/2011 07:55, Ireneusz Pluta wrote:

Achilleas Mantzios wrote:


In anyway, having FreeBSD to fsck, (background or not) should not
happen. And the problem
becomes bigger when cheap SATA drives will cheat about their write
cache being flushed to the disk.
So in the common case with cheap hardware, it is wise to have a UPS
connected and being monitored
by the system.



It's not lack of UPS. Power issues are taken care of here. It's a buggy
3ware controller which hangs the machine ocassionally and the only way
to have it come back is to power cycle, hard reset is not enough.


So just to summarize your position: you think it is ok to run a database 
on a buggy disk controller but are afraid of problems with normal OS 
utilities? :) You are of course free to do whatever you want but it 
seems odd :)






Re: [PERFORM] Background fsck

2011-04-07 Thread Ivan Voras

On 07/04/2011 00:48, Scott Marlowe wrote:

On Wed, Apr 6, 2011 at 4:33 PM, Ireneusz Pluta  wrote:

Hello,

I saw some recommendations from people on the net not to use background fsck
when running PostgreSQL on FreeBSD. As I recall, these opinions were just
thoughts of people which they shared with the community, following their bad
experience caused by using background fsck. So, not coming any deeper with
underatanding why not, I use that as a clear recommendation for myself and
keep background fsck turned off on all my machines, regardless how much
faster a server could come up after a crash.

But waiting so much time (like now) during foreground fsck of a large data
filesystem after unclean shutdown, makes me to come to this group to ask
whether I really need to avoid background fsck on a PostgreSQL machine?
Could I hear your opinions?


AFAIK, the reason background fsck has been discouraged when used
with databases is that it uses disk bandwidth which may be needed by
the application. If you are not IO-saturated, there is no
particular reason to avoid it.



Shouldn't a journaling file system just come back up almost immediately?


It's a tradeoff; UFS does not use journalling (at least not likely in
the version the OP is talking about) but it uses "soft updates", which
bring most of the benefits of journalling, including "instant up" after
a crash, without the double-write overheads (for comparison, see some
PostgreSQL benchmarks showing that unjournaled ext2 can be faster than
journaled ext3, since PostgreSQL has its own form of journaling - the
WAL). The downside is that fsck needs to be run occasionally to clean up
non-critical dangling references on the file system - thus the
"background fsck" mode in FreeBSD.






Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem

2011-02-15 Thread Ivan Voras

On 15/02/2011 18:19, Thomas Pöhler wrote:

Hi list,

first time for me here, hope you’re not dealing too severely with me
regarding guidelines. Giving my best.

We are running PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by
GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit on a Supermicro SuperServer
8026B-6RF.

This version is downloaded from postgresql.org and selfcompiled, running
for over a year now. The Server has 128 GB RAM and Four Intel® Xeon®
X7550 with 64 logical cores.


So, 64 logical cores total.


Operating System is “Linux database1 2.6.32-bpo.5-amd64 #1 SMP Mon Dec
13 17:10:39 UTC 2010 x86_64 GNU/Linux”.

The System boots through iscsi over a Qlogic QLE4062C HBA. Pgdata and
xlog is logged in over iscsi HBA too. We tried en and disabling jumbo
frames. Makes no difference.


Are you using 10 Gbit/s Ethernet for iSCSI? Regular 1 Gbit/s Ethernet 
might be too slow for you.



Since a few weeks we have really strange peaks on this system. User CPU
is increasing up to 100% and we have lots of SELECTs running.



See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg

Has someone made similar experiences? Perhaps there is some issue
between Postgres 8.4.4 and kernel 2.6.32?


From your graph it looks like the number of active processes (I'm 
assuming they are PostgreSQL processes) is going out of control.


There is an old problem (which I've encountered so I'm replying but it 
may or may not be in your case) in which PostgreSQL starts behaving 
badly even for SELECT queries if the number of simultaneous queries 
exceeds the number of logical CPUs. To test this, I'd recommend setting 
up a utility like pgpool-II (http://pgpool.projects.postgresql.org/) in 
front of the database to try and limit the number of active connections 
to nearly 64 (maybe you can have good results with 80 or 100).


You might also experiment with pgsql.max_links setting of PHP but IIRC 
PHP will just refuse more connections than that instead of waiting for 
them (but maybe your application can spin-wait for them, possibly while 
also using usleep()).






Re: [PERFORM] Query performance with disabled hashjoin and mergejoin

2011-02-06 Thread Ivan Voras
Sorry for the misunderstanding: of course not default "normal" settings; shared
buffers, work mem, wal segments and others have been tuned according to
available hardware (e.g. 4 GB, 32 MB, 10 for these settings, respectively). I
meant "planner default settings" in the post.
-- 
Sent from my Android phone, please excuse my brevity.

Greg Smith  wrote:

Ivan Voras wrote:
> The "vanilla" plan, with default settings is:

Pause here for a second: why default settings? A default PostgreSQL
configuration is suitable for systems with about 128MB of RAM. Since you say
you have "good enough hardware", I'm assuming you have a bit more than that.
The first things to try here are the list at
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server ; your bad query
here looks like it might benefit from a large increase to effective_cache_size,
and possibly an increase to work_mem as well. Your "bad" plan here is doing a
lot of sequential scans instead of indexed lookups, which makes me wonder if
the change in join types you're forcing isn't fixing that part as a
coincidence. Note that the estimated number of rows coming out of each form of
plan is off by a factor of about 200X, so it's not that the other plan type is
better estimating anything.

-- 
Greg Smith  2ndQuadrant US  g...@2ndquadrant.com  Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books



Re: [PERFORM] Query performance with disabled hashjoin and mergejoin

2011-02-04 Thread Ivan Voras

On 04/02/2011 15:44, Greg Smith wrote:

Ivan Voras wrote:

The "vanilla" plan, with default settings is:


Pause here for a second: why default settings? A default PostgreSQL
configuration is suitable for systems with about 128MB of RAM. Since you
say you have "good enough hardware", I'm assuming you have a bit more
than that. The first things to try here are the list at
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server ; your bad
query here looks like it might benefit from a large increase to
effective_cache_size, and possibly an increase to work_mem as well. Your
"bad" plan here is doing a lot of sequential scans instead of indexed
lookups, which makes me wonder if the change in join types you're
forcing isn't fixing that part as a coincidence.


My earlier message didn't get through so here's a repeat:

Sorry for the confusion: by "default settings" I meant "planner default
settings", not the generic shared buffers, WAL logs, work memory etc. - which
are adequately tuned.



Note that the estimated number of rows coming out of each form of plan
is off by a factor of about 200X, so it's not that the other plan type
is better estimating anything.


Any ideas how to fix the estimates? Or will I have to simulate hints by 
issuing "set enable_hashjoin=f; set enable_mergejoin=f;" for this query? :)






[PERFORM] Query performance with disabled hashjoin and mergejoin

2011-02-04 Thread Ivan Voras
I'm running all this on a 9.0 server with good enough hardware. The 
query is:


SELECT  news.id AS news_id
,   news.layout_id
,   news.news_relation_id
,   news.author_id
,   news.date_created
,   news.date_published
,   news.lastedit
,   news.lastedit_user_id
,   news.lastedit_date
,   news.approved_by
,   news.state
,   news.visible_from
,   news.visible_to
,   news.archived_by
,   news.archived_date
,   news.priority
,   news.collection_id
,   news.comment
,   news.keywords
,   news.icon
,   news.icon_url
,   news.icon_width
,   news.icon_height
,   news.icon_position
,   news.icon_onclick
,   news.icon_newwindow
,   news.no_lead
,   news.content_exists
, news.title, news.lead, news.content


,   author.public_name AS 
author_public_name
,   lastedit_user.public_name AS 
lastedit_user_public_name
,   approved_by_user.public_name AS 
approved_by_public_name
,   archived_by_user.public_name AS 
archived_by_public_name

FROM news
JOIN users AS author ON news.author_id 
= author.id
LEFT JOIN users AS lastedit_user ON 
news.lastedit_user_id = lastedit_user.id
LEFT JOIN users AS approved_by_user ON 
news.approved_by = approved_by_user.id
LEFT JOIN users AS archived_by_user ON 
news.archived_by = archived_by_user.id


WHERE (news.layout_id = 8980) AND (state = 2)
  AND (date_published <= 1296806570
       AND (visible_from IS NULL OR 1296806570 BETWEEN visible_from AND visible_to))
ORDER BY priority DESC, date_published DESC
;

The "vanilla" plan, with default settings is:

 Sort  (cost=7325.84..7329.39 rows=1422 width=678) (actual 
time=100.846..100.852 rows=7 loops=1)

   Sort Key: news.priority, news.date_published
   Sort Method:  quicksort  Memory: 38kB
   ->  Hash Left Join  (cost=2908.02..7251.37 rows=1422 width=678) 
(actual time=100.695..100.799 rows=7 loops=1)

 Hash Cond: (news.archived_by = archived_by_user.id)
 ->  Hash Left Join  (cost=2501.75..6819.47 rows=1422 
width=667) (actual time=76.742..76.830 rows=7 loops=1)

   Hash Cond: (news.approved_by = approved_by_user.id)
   ->  Hash Left Join  (cost=2095.48..6377.69 rows=1422 
width=656) (actual time=53.248..53.318 rows=7 loops=1)

 Hash Cond: (news.lastedit_user_id = lastedit_user.id)
 ->  Hash Join  (cost=1689.21..5935.87 rows=1422 
width=645) (actual time=29.793..29.846 rows=7 loops=1)

   Hash Cond: (news.author_id = author.id)
   ->  Bitmap Heap Scan on news 
(cost=1282.94..5494.05 rows=1422 width=634) (actual time=5.532..5.560 
rows=7 loops=1)
 Recheck Cond: ((layout_id = 8980) AND 
(state = 2) AND ((visible_from IS NULL) OR (1296806570 <= visible_to)))
 Filter: ((date_published <= 
1296806570) AND ((visible_from IS NULL) OR ((1296806570 >= visible_from) 
AND (1296806570 <= visible_to
 ->  BitmapAnd  (cost=1282.94..1282.94 
rows=1430 width=0) (actual time=5.508..5.508 rows=0 loops=1)
   ->  Bitmap Index Scan on 
news_index_layout_id_state  (cost=0.00..150.14 rows=2587 width=0) 
(actual time=0.909..0.909 rows=3464 loops=1)
 Index Cond: ((layout_id = 
8980) AND (state = 2))
   ->  BitmapOr 
(cost=1132.20..1132.20 rows=20127 width=0) (actual time=4.136..4.136 
rows=0 loops=1)
 ->  Bitmap Index Scan on 
news_visible_from  (cost=0.00..1122.09 rows=19976 width=0) (actual 
time=3.367..3.367 rows=19932 loops=1)
  

Re: [PERFORM] High load,

2011-01-28 Thread Ivan Voras

On 27/01/2011 11:31, Michael Kohl wrote:

Hi all,

we are running a fairly big Ruby on Rails application on Postgres 8.4.
Our traffic grew quite a bit lately, and since then we are facing DB
performance issues. System load occasionally explodes (around 170
yesterday on a 16 core system), which seems to be caused by disk I/O
(iowait in our Munin graphs goes up significantly during these
periods). At other times the laod stays rather low under pretty much
the same circumstances.


Is there any way you can moderate the number of total active connections 
to the database to approximately match the number of (logical) CPU cores 
on your system? I.e. some kind of connection pool or connection 
limiting? This should help you in more ways than one (limit PG lock 
contention, limit parallel disk IO).






Re: [PERFORM] Queries becoming slow under heavy load

2011-01-26 Thread Ivan Voras

On 25/01/2011 22:37, Anne Rosset wrote:

Hi,

We are running some performances tests.  With a lot of concurrent
access,  queries get very slow. When there is no load, those queries run
fast.


As others said, you need to state how many concurrent clients are working
on the database and also how many logical CPUs (CPU cores,
hyperthreading) are present in the machine. As a rule of thumb, if the
number of concurrent active connections (i.e. those doing queries, not
idling) exceeds the number of logical CPUs in the machine, you will
experience a sharp decline in performance.





Re: [PERFORM] Fun little performance IMPROVEMENT...

2011-01-25 Thread Ivan Voras

On 21/01/2011 19:12, gr...@amadensor.com wrote:

I was doing a little testing to see how machine load affected the
performance of different types of queries, index range scans, hash joins,
full scans, a mix, etc.

In order to do this, I isolated different performance hits, spinning only
CPU, loading the disk to create high I/O wait states, and using most of
the physical memory.   This was on a 4 CPU Xen virtual machine running
8.1.22 on CENTOS.


Here is the fun part.   When running 8 threads spinning calculating square
roots (using the stress package), the full scan returned consistently 60%
faster than the machine with no load.   It was returning 44,000 out of
5,000,000 rows.   Here is the explain analyze.   I am hoping that this
triggers something (I can run more tests as needed) that can help us make
it always better.


Looks like a virtualization artifact. Here's a list of some such noticed 
artifacts:


http://wiki.freebsd.org/WhyNotBenchmarkUnderVMWare



Idling:
  QUERY PLAN

  Seq Scan on schedule_details  (cost=0.00..219437.90 rows=81386 width=187)
(actual time=0.053..2915.966 rows=44320 loops=1)
Filter: (schedule_type = '5X'::bpchar)
  Total runtime: 2986.764 ms

Loaded:
  QUERY PLAN

  Seq Scan on schedule_details  (cost=0.00..219437.90 rows=81386 width=187)
(actual time=0.034..1698.068 rows=44320 loops=1)
Filter: (schedule_type = '5X'::bpchar)
  Total runtime: 1733.084 ms


In this case it looks like the IO generated by the VM is causing the 
Hypervisor to frequently "sleep" the machine while waiting for the IO, 
but if the machine is also generating CPU load, it is not put to sleep 
as often.





Re: [PERFORM] PostgreSQL 9.0 x64 bit pgbench TPC very low question?

2010-12-18 Thread Ivan Voras

On 12/18/10 20:42, tuanhoanganh wrote:

Here is my result without -C
pgbench -h 127.0.0.1 -p  -U postgres -c 100 -t 10 -s 10 pgbench


You really should replace "-t 10" with something like "-T 60" or more.




Re: [PERFORM] Performance under contention

2010-12-07 Thread Ivan Voras
2010/12/7 Віталій Тимчишин :
>
>
> 2010/12/7 Robert Haas 
>>
>> On Tue, Dec 7, 2010 at 1:08 PM, Ivan Voras  wrote:
>>
>> > I'm not very familiar with PostgreSQL code but if we're
>> > brainstorming... if you're only trying to protect against a small
>> > number of expensive operations (like DROP, etc.) that don't really
>> > happen often, wouldn't an atomic reference counter be good enough for
>> > the purpose (e.g. the expensive operations would spin-wait until the
>> > counter is 0)?
>>
>> No, because (1) busy-waiting is only suitable for locks that will only
>> be held for a short time, and an AccessShareLock on a table might be
>> held while we read 10GB of data in from disk, and (2) that wouldn't
>> allow for deadlock detection.

> What can be done, is that number of locks can be increased - one could use
> spin locks for hash table manipulations, e.g. a lock preventing rehashing
> (number of baskets being changed) and a lock for required basket.
> In this case only small range of code can be protected by partition lock.
> As for me, this will make locking process more cpu-intensive (more locks
> will be acquired/freed during the exection), but will decrease contention
> (since all but one lock can be spin locks working on atomic counters, hash
> searches can be done in parallel), won't it?

For what it's worth, this is pretty much the opposite of what I had in
mind. I proposed atomic reference counters (as others pointed out, this
probably won't work) as poor man's shared-exclusive locks, so that
most operations would not have to contend on them.



Re: [PERFORM] Performance under contention

2010-12-07 Thread Ivan Voras
On 7 December 2010 19:10, Robert Haas  wrote:

>> I'm not very familiar with PostgreSQL code but if we're
>> brainstorming... if you're only trying to protect against a small
>> number of expensive operations (like DROP, etc.) that don't really
>> happen often, wouldn't an atomic reference counter be good enough for
>> the purpose (e.g. the expensive operations would spin-wait until the
>> counter is 0)?
>
> No, because (1) busy-waiting is only suitable for locks that will only
> be held for a short time, and an AccessShareLock on a table might be
> held while we read 10GB of data in from disk,

Generally yes, but a variant with adaptive sleeping could possibly be
used if it would be acceptable to delay (uncertainly) the already
expensive and rare operations.

> and (2) that wouldn't
> allow for deadlock detection.

Probably :)



Re: [PERFORM] Performance under contention

2010-12-07 Thread Ivan Voras
On 7 December 2010 18:37, Robert Haas  wrote:
> On Mon, Dec 6, 2010 at 9:59 PM, Jignesh Shah  wrote:
>> That's exactly what I concluded when I was doing the sysbench simple
>> read-only test. I had also tried with different lock partitions and it
>> did not help since they all go after the same table. I think one way
>> to kind of avoid the problem on the same table is to do more granular
>> locking (Maybe at page level instead of table level). But then I dont
>> really understand on how to even create a prototype related to this
>> one. If you can help create a prototype then I can test it out with my
>> setup and see if it helps us to catch up with other guys out there.
>
> We're trying to lock the table against a concurrent DROP or schema
> change, so locking only part of it doesn't really work.  I don't
> really see any way to avoid needing some kind of a lock here; the
> trick is how to take it quickly.  The main obstacle to making this
> faster is that the deadlock detector needs to be able to obtain enough
> information to break cycles, which means we've got to record in shared
> memory not only the locks that are granted but who has them.

I'm not very familiar with PostgreSQL code but if we're
brainstorming... if you're only trying to protect against a small
number of expensive operations (like DROP, etc.) that don't really
happen often, wouldn't an atomic reference counter be good enough for
the purpose (e.g. the expensive operations would spin-wait until the
counter is 0)?



Re: [PERFORM] Performance under contention

2010-11-25 Thread Ivan Voras
On 26 November 2010 03:00, Greg Smith  wrote:

> Two suggestions to improve your results here:
>
> 1) Don't set shared_buffers to 10GB.  There are some known issues with large
> settings for that which may or may not be impacting your results.  Try 4GB
> instead, just to make sure you're not even on the edge of that area.
>
> 2) pgbench itself is known to become a bottleneck when running with lots of
> clients.  You should be using the "-j" option to spawn multiple workers,
> probably 12 of them (one per core), to make some of this go away.  On the
> system I saw the most improvement here, I got a 15-25% gain having more
> workers at the higher client counts.

> It will be interesting to see if that's different after the changes
> suggested above.

Too late, I can't test on that hardware anymore. I did use -j on pgbench,
but beyond 2 threads there were no significant improvements - the two
threads did not saturate two CPU cores.

However, I did run a similar select-only test on tmpfs on different
hardware with much less memory (4 GB total), with shared_buffers
somewhere around 2 GB, with the same performance curve:

http://ivoras.sharanet.org/blog/tree/2010-07-21.postgresql-on-tmpfs.html

so I doubt the curve would change by reducing shared_buffers below
what I used in the original post.



Re: [PERFORM] Performance under contention

2010-11-25 Thread Ivan Voras

On 11/22/10 18:47, Kevin Grittner wrote:

Ivan Voras  wrote:


It looks like a hack


Not to everyone.  In the referenced section, Hellerstein,
Stonebraker and Hamilton say:

"any good multi-user system has an admission control policy"

In the case of PostgreSQL I understand the counter-argument,
although I'm inclined to think that it's prudent for a product to
limit resource usage to a level at which it can still function well,
even if there's an external solution which can also work, should
people use it correctly.  It seems likely that a mature admission
control policy could do a better job of managing some resources than
an external product could.


I didn't think it would be that useful but yesterday I did some 
(unrelated) testing with MySQL and it looks like its configuration 
parameter "thread_concurrency" does something to that effect.


Initially I thought it was equivalent to PostgreSQL's max_connections, but
no: connections can grow (MySQL spawns a thread per connection by
default) while the actual concurrency is limited in some way by this
parameter.


The comment for the parameter says "# Try number of CPU's*2 for 
thread_concurrency" but obviously it would depend a lot on the 
real-world load.






Re: [PERFORM] Performance under contention

2010-11-23 Thread Ivan Voras
On 24 November 2010 01:11, Craig Ringer  wrote:
> On 11/22/2010 11:38 PM, Ivan Voras wrote:

>> It looks like a hack (and one which is already implemented by connection
>> pool software); the underlying problem should be addressed.
>
> My (poor) understanding is that addressing the underlying problem would
> require a massive restructure of postgresql to separate "connection and
> session state" from "executor and backend". Idle connections wouldn't
> require a backend to sit around unused but participating in all-backends
> synchronization and signalling. Active connections over a configured maximum
> concurrency limit would queue for access to a backend rather than fighting
> it out for resources at the OS level.
>
> The trouble is that this would be an *enormous* rewrite of the codebase, and
> would still only solve part of the problem. See the prior discussion on
> in-server connection pooling and admission control.

I'm (also) not a PostgreSQL developer so I'm hoping that someone who
is will join the thread, but speaking generally, there is no reason
why this couldn't be a simpler problem which just requires
finer-grained locking or smarter semaphore usage.

I'm not talking about forcing performance out of a situation where there
are no more CPU cycles to take, but about degrading gracefully in
those circumstances and not taking an 80%+ drop because of spinning
around in semaphore syscalls.



Re: [PERFORM] Performance under contention

2010-11-22 Thread Ivan Voras

On 11/22/10 16:26, Kevin Grittner wrote:

Ivan Voras  wrote:

On 11/22/10 02:47, Kevin Grittner wrote:

Ivan Voras  wrote:


After 16 clients (which is still good since there are only 12
"real" cores in the system), the performance drops sharply


Yet another data point to confirm the importance of connection
pooling.  :-)


I agree, connection pooling will get rid of the symptom. But not
the underlying problem. I'm not saying that having 1000s of
connections to the database is a particularly good design, only
that there shouldn't be a sharp decline in performance when it
does happen. Ideally, the performance should remain the same as it
was at its peak.


Well, I suggested that we add an admission control[1] mechanism,


It looks like a hack (and one which is already implemented by connection 
pool software); the underlying problem should be addressed.


But on the other hand if it's affecting so many people, maybe a warning 
comment in postgresql.conf around max_connections would be helpful.





Re: [PERFORM] Performance under contention

2010-11-21 Thread Ivan Voras

On 11/22/10 02:47, Kevin Grittner wrote:

Ivan Voras  wrote:


After 16 clients (which is still good since there are only 12
"real" cores in the system), the performance drops sharply


Yet another data point to confirm the importance of connection
pooling.  :-)


I agree, connection pooling will get rid of the symptom. But not the 
underlying problem. I'm not saying that having 1000s of connections to 
the database is a particularly good design, only that there shouldn't be 
a sharp decline in performance when it does happen. Ideally, the 
performance should remain the same as it was at its peak.


I've been monitoring the server some more and it looks like there are 
periods where almost all servers are in the semwait state followed by 
periods of intensive work - approximately similar to the "thundering 
herd" problem, or maybe to what Josh Berkus has posted a few days ago.






[PERFORM] Performance under contention

2010-11-21 Thread Ivan Voras
This is not a request for help but a report, in case it helps developers 
or someone in the future. The setup is:


AMD64 machine, 24 GB RAM, 2x6-core Xeon CPU + HTT (24 logical CPUs)
FreeBSD 8.1-stable, AMD64
PostgreSQL 9.0.1, 10 GB shared buffers, using pgbench with a scale 
factor of 500 (7.5 GB database)


with pgbench -S (SELECT-queries only) the performance curve is:

-c# result
4   33549
8   64864
12  79491
16  79887
20  66957
24  52576
28  50406
32  49491
40  45535
50  39499
75  29415

After 16 clients (which is still good since there are only 12 "real"
cores in the system), the performance drops sharply, and looking at the
processes' state, most of them seem to spend their time in system calls
(i.e. executing in the kernel), in the states "semwait" and "sbwait", i.e.
semaphore wait and socket buffer wait, for example:


 3047 pgsql   1  600 10533M   283M sbwait 12   0:01  6.79% postgres
 3055 pgsql   1  640 10533M   279M sbwait 15   0:01  6.79% postgres
 3033 pgsql   1  640 10533M   279M semwai  6   0:01  6.69% postgres
 3038 pgsql   1  640 10533M   283M CPU5   13   0:01  6.69% postgres
 3037 pgsql   1  620 10533M   279M sbwait 23   0:01  6.69% postgres
 3048 pgsql   1  650 10533M   280M semwai  4   0:01  6.69% postgres
 3056 pgsql   1  650 10533M   277M semwai  1   0:01  6.69% postgres
 3002 pgsql   1  620 10533M   284M CPU19   0   0:01  6.59% postgres
 3042 pgsql   1  630 10533M   279M semwai 21   0:01  6.59% postgres
 3029 pgsql   1  630 10533M   277M semwai 23   0:01  6.59% postgres
 3046 pgsql   1  630 10533M   278M RUN 5   0:01  6.59% postgres
 3036 pgsql   1  630 10533M   278M CPU1   12   0:01  6.59% postgres
 3051 pgsql   1  630 10533M   277M semwai  1   0:01  6.59% postgres
 3030 pgsql   1  630 10533M   281M semwai  1   0:01  6.49% postgres
 3050 pgsql   1  600 10533M   276M semwai  1   0:01  6.49% postgres

The "sbwait" part is from FreeBSD - IPC sockets, but so much blocking on 
semwait indicates large contention in PostgreSQL.





[PERFORM] Re: How to achieve sustained disk performance of 1.25 GB write for 5 mins

2010-11-17 Thread Ivan Voras

On 11/17/10 22:11, Eric Comeau wrote:


*) what kind of data do you expect to be writing out at this speed?


Large Video files ... our s/w is used to displace FTP.


*) how many transactions per second will you expect to have?


Ideally 1 large file, but it may have to be multiple. We find that if we
send multiple files it just causes the disk to thrash more so we get
better throughput by sending one large file.

>

*) what is the architecture of the client? how many connections will
be open to postgres writing?


Our s/w can do multiple streams, but I believe we get better performance
with 1 stream handling one large file, you could have 4 streams with 4
files in flight, but the disk thrashes more... postgres is not be
writing the file data, our agent reports back to postgres stats on the
transfer rate being achieved ... postgres transactions is not the issue.
The client and server are written in C and use UDP (with our own error
correction) to achieve high network throughput as opposed to TCP.


I hope you know what you are doing; there is a long list of tricks used
by modern high-performance FTP and web servers to get maximum
performance from the hardware and the operating system while minimizing CPU
usage - and most of them don't work with UDP.


Before you test with real hardware, try simply sending dummy data or 
/dev/null data (i.e. not from disks, not from file systems) and see how 
it goes.






Re: [PERFORM] Anyone seen this kind of lock pileup?

2010-11-17 Thread Ivan Voras

On 11/17/10 18:37, Josh Berkus wrote:

All,

Having an interesting issue on one 8.4 database. Due to poor application
design, the application is requesting 8-15 exclusive (update) locks on
the same row on parallel connections pretty much simultaneously (i.e. <
50ms apart).

What's odd about this is that the resulting "lock pileup" takes a
mysterious 2-3.5 seconds to clear, despite the fact that none of the
connections are *doing* anything during that time, nor are there
deadlock errors. In theory at least, the locks should clear out in
reverse order in less than a second; none of the individual statements
takes more than 10ms to execute.


Just a random guess: a timeout-supported livelock? (Of course, only if there
is some timeout-and-retry protocol going on and the timeout intervals are
non-randomized.)






[PERFORM] Re: How to achieve sustained disk performance of 1.25 GB write for 5 mins

2010-11-17 Thread Ivan Voras

On 11/17/10 15:26, Eric Comeau wrote:

This is not directly a PostgreSQL performance question but I'm hoping
some of the chaps that build high IO PostgreSQL servers on here can help.

We build file transfer acceleration s/w (and use PostgreSQL as our
database) but we need to build a test server that can handle a sustained
write throughput of 1,25 GB for 5 mins.


Just to clarify: you need 1.25 GB/s write throughput?

For one thing, you need not only fast storage but also a fast CPU and 
file system. If you are going to stream this data directly over the 
network in a single FTP-like session, you need fast single-core 
performance (so buy the fastest low-core-count CPU possible) and a file 
system which doesn't interfere much with raw data streaming. If you're 
using Linux I'd guess either something very simple like ext2 or complex 
but designed for the task like XFS might be best. On FreeBSD, ZFS is 
great for streaming but you'll spend a lot of time tuning it :)


From the hardware POV, since you don't really need high IOPS rates, you
can go much cheaper with a large number of cheap desktop drives than
with SSDs, if you can build something like this:
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/


You don't need the storage space here, but you *do* need many drives to 
achieve speed in RAID (remember to overdesign and assume 50 MB/s per drive).





[PERFORM] Re: Postgres insert performance and storage requirement compared to Oracle

2010-10-27 Thread Ivan Voras
On 10/26/10 17:41, Merlin Moncure wrote:
> On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci  
> wrote:
>>> temp  tables are not wal logged or
>>> synced.  Periodically they can be flushed  to a permanent table.
>>
>>
>> What do you mean with "Periodically they can be flushed  to
>> a permanent table"? Just doing
>>
>> insert into tabb select * from temptable
>>
> 
> yup, that's exactly what I mean -- this will give you more uniform

In effect, when so much data is in temporary storage, a better option
would be to simply configure "synchronous_commit = off" (better in the
sense that the application would not need to be changed). The effects
are almost the same - in both cases transactions might be lost but the
database will survive.
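
For illustration, a minimal sketch of the two usual ways to scope this,
rather than turning it off globally (the "bulk_loader" role name is just a
made-up example):

-- per session, issued by the loading application before the inserts:
SET synchronous_commit = off;

-- or persistently, for a dedicated role used only for bulk loading:
ALTER ROLE bulk_loader SET synchronous_commit = off;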





Re: [PERFORM] CPUs for new databases

2010-10-26 Thread Ivan Voras

On 10/27/10 01:45, James Cloos wrote:

"JB" == Josh Berkus  writes:


JB>  In a general workload, fewer faster cores are better.  We do not scale
JB>  perfectly across cores.  The only case where that's not true is
JB>  maintaining lots of idle connections, and that's really better dealt
JB>  with in software.

I've found that ram speed is the most limiting factor I've run into for
those cases where the db fits in RAM.  The less efficient lookups run
just as fast when the CPU is in powersving mode as in performance, which
implies that the cores are mostly waiting on RAM (cache or main).

I suspect cache size and ram speed will be the most important factors
until the point where disk i/o speed and capacity take over.


FWIW, yes - once the IO is fast enough or not necessary (e.g. the 
read-mostly database fits in RAM), RAM bandwidth *is* the next 
bottleneck and it really, really can be observed in actual loads. Buying 
a QPI-based CPU instead of the cheaper DMI-based ones (if talking about 
Intel chips), and faster memory modules (DDR3-1333+) really makes a 
difference in this case.


(QPI and DMI are basically the evolution of the front-side bus; AMD has had
HT - HyperTransport - for years now. Wikipedia of course has more information
for the interested.)






Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-14 Thread Ivan Voras

On 10/14/10 21:43, Tony Capobianco wrote:



We have 4 quad-core processors and 32GB of RAM.  The below query uses
the members_sorted_idx_001 index in oracle, but in postgres, the
optimizer chooses a sequential scan.

explain analyze create table tmp_srcmem_emws1
as
select emailaddress, websiteid
   from members
  where emailok = 1
and emailbounced = 0;
   QUERY
PLAN
--
  Seq Scan on members  (cost=0.00..14137154.64 rows=238177981 width=29)
(actual time=0.052..685834.785 rows=236660930 loops=1)
Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric))
  Total runtime: 850306.220 ms
(3 rows)



Indexes:
 "email_website_unq" UNIQUE, btree (emailaddress, websiteid),
tablespace "members_idx"
 "member_addeddateid_idx" btree (addeddate_id), tablespace
"members_idx"
 "member_changedateid_idx" btree (changedate_id), tablespace
"members_idx"
 "members_fdate_idx" btree (to_char_year_month(addeddate)),
tablespace "esave_idx"
 "members_memberid_idx" btree (memberid), tablespace "members_idx"
 "members_mid_emailok_idx" btree (memberid, emailaddress, zipcode,
firstname, emailok), tablespace "members_idx"
 "members_sorted_idx_001" btree (websiteid, emailok, emailbounced,
addeddate, memberid, zipcode, statecode, emailaddress), tablespace
"members_idx"
 "members_src_idx" btree (websiteid, emailbounced, sourceid),
tablespace "members_idx"
 "members_wid_idx" btree (websiteid), tablespace "members_idx"


PostgreSQL doesn't fetch data directly from indexes, so there is no way 
for it to reasonably use an index declared like:


"members_sorted_idx_001" btree (websiteid, emailok, emailbounced, 
addeddate, memberid, zipcode, statecode, emailaddress)


You need a direct index on the fields you are using in your query, i.e. 
an index on (emailok, emailbounced).


OTOH, those columns look boolean-like. It depends on what your data set 
is, but if the majority of records contain (emailok=1 and 
emailbounced=0) an index may not help you much.
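
For illustration only - whether either index actually wins over the
sequential scan depends on how selective the condition really is on the live
data:

-- a direct composite index on the filtered columns:
CREATE INDEX members_emailok_bounced_idx ON members (emailok, emailbounced);

-- or, if the matching rows are a small minority, a partial index which
-- only stores those rows (websiteid here is just an example column):
CREATE INDEX members_mailable_idx ON members (websiteid)
  WHERE emailok = 1 AND emailbounced = 0;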





Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance

2010-10-07 Thread Ivan Voras

On 10/07/10 02:39, Robert Haas wrote:

On Wed, Oct 6, 2010 at 6:31 PM, Ivan Voras  wrote:

On 10/04/10 20:49, Josh Berkus wrote:


The other major bottleneck they ran into was a kernel one: reading from
the heap file requires a couple lseek operations, and Linux acquires a
mutex on the inode to do that. The proper place to fix this is
certainly in the kernel but it may be possible to work around in
Postgres.


Or we could complain to Kernel.org.  They've been fairly responsive in
the past.  Too bad this didn't get posted earlier; I just got back from
LinuxCon.

So you know someone who can speak technically to this issue? I can put
them in touch with the Linux geeks in charge of that part of the kernel
code.


Hmmm... lseek? As in "lseek() then read() or write()" idiom? It AFAIK
cannot be fixed since you're modifying the global "stream position"
variable and something has got to lock that.


Well, there are lock free algorithms using CAS, no?


Nothing is really "lock free" - in this case the algorithms simply push
the locking down to atomic operations on the CPU (and the memory bus).
Semantically, *something* has to lock the memory region for however
brief a period of time and then propagate that update to other CPUs'
caches (i.e. invalidate them).



OTOH, pread() / pwrite() don't have to do that.


Hey, I didn't know about those.  That sounds like it might be worth
investigating, though I confess I lack a 48-core machine on which to
measure the alleged benefit.


As Jon said, it will in any case reduce the number of these syscalls by 
half, and they can be wrapped by a C macro for the platforms which don't 
implement them.


http://man.freebsd.org/pread

(and just in case it's needed: pread() is a special case of preadv()).




Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance

2010-10-07 Thread Ivan Voras
On 7 October 2010 03:25, Tom Lane  wrote:
> Ivan Voras  writes:
>> On 10/04/10 20:49, Josh Berkus wrote:
>>>> The other major bottleneck they ran into was a kernel one: reading from
>>>> the heap file requires a couple lseek operations, and Linux acquires a
>>>> mutex on the inode to do that.
>
>> Hmmm... lseek? As in "lseek() then read() or write()" idiom? It AFAIK
>> cannot be fixed since you're modifying the global "stream position"
>> variable and something has got to lock that.
>
> Um, there is no "global stream position" associated with an inode.
> A file position is associated with an open-file descriptor.

You're right, of course; I was pattern-matching late last night on the
"lseek()" and "locking problems" keywords and ignored "inode".

> If Josh quoted the problem correctly, the issue is that the kernel is
> locking a file's inode (which may be referenced by quite a lot of file
> descriptors) in order to change the state of one file descriptor.
> It sure sounds like a possible source of contention to me.

Though it does depend on the details of how pg uses it. Forked
processes share their parents' file descriptors.



Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance

2010-10-06 Thread Ivan Voras
On 10/04/10 20:49, Josh Berkus wrote:

>> The other major bottleneck they ran into was a kernel one: reading from
>> the heap file requires a couple lseek operations, and Linux acquires a
>> mutex on the inode to do that. The proper place to fix this is
>> certainly in the kernel but it may be possible to work around in
>> Postgres.
> 
> Or we could complain to Kernel.org.  They've been fairly responsive in
> the past.  Too bad this didn't get posted earlier; I just got back from
> LinuxCon.
> 
> So you know someone who can speak technically to this issue? I can put
> them in touch with the Linux geeks in charge of that part of the kernel
> code.

Hmmm... lseek? As in "lseek() then read() or write()" idiom? It AFAIK
cannot be fixed since you're modifying the global "stream position"
variable and something has got to lock that.

OTOH, pread() / pwrite() don't have to do that.




[PERFORM] Big field, limiting and ordering

2010-07-19 Thread Ivan Voras
Hello,

I don't think this is generally solvable but maybe it is so here goes.
The original situation was this:

SELECT something, big_field, complex_function(big_field), rank FROM t1
UNION ALL SELECT something, big_field, complex_function(big_field), rank
from t2 ORDER BY rank LIMIT small_number;

This query first fetches all big_field datums and does all
complex_function() calculations on them, then orders them by rank, even
though I actually need only small_number of records. There are two
problems here: first, selecting all big_field values requires a lot
of memory, which is unacceptable, and second, running complex_function()
on all of them takes too long.

I did get rid of unnecessary complex_function() calculations by nesting
queries like:

SELECT something, big_field, complex_function(big_field), rank FROM
(SELECT original_query_without_complex_function_but_with_big_field ORDER
BY rank LIMIT small_number);

but this still leaves gathering all the big_field datums from the
original query. I cannot pull big_field out of this subquery because
it comes from a UNION of tables.

Any suggestions?

(I cannot limit big_field with substring() to reduce memory usage
because it messes up complex_function()).




Re: [PERFORM] Understanding tsearch2 performance

2010-07-14 Thread Ivan Voras
On 14 July 2010 17:16, Kevin Grittner  wrote:
> Ivan Voras  wrote:
>
>> which didn't help.
>
> Didn't help what?  You're processing each row in 22.8 microseconds.
> What kind of performance were you expecting?

Well, I guess you're right. What I was expecting was a large bump in
speed going from LIKE to tsearch2, but then there's also the record
processing outside the index itself, which is probably where the time
goes.



Re: [PERFORM] Understanding tsearch2 performance

2010-07-14 Thread Ivan Voras
On 07/14/10 16:03, Kevin Grittner wrote:
> Ivan Voras <  ivo...@freebsd.org  > wrote:
>> On 07/14/10 15:49, Stephen Frost wrote:
>  
>>> Regarding the statistics, it's entirely possible that the index
>>> is *not* the fastest way to pull this data (it's nearly 10% of
>>> the table..)
>>
>> I think that what I'm asking here is: is it reasonable for
>> tsearch2 to extract 8,500 rows from an index of 90,000 rows in 118
>> ms, given that the approximately same task can be done with an
>> unindexed "LIKE" operator in nearly the same time?
>  
> The answer is "yes."  When it's 10% of the table, a sequential scan
> can be more efficient than an index, as Stephen indicated.

Ok, to verify this I've tried increasing the statistics target on the field
and running vacuum analyze full, which didn't help. Next, I've tried setting
enable_indexscan to off, which also didn't do it:

cms=> set enable_indexscan=off;
SET
cms=> explain analyze select id,title from forum where _fts_ @@
'fer'::tsquery order by id limit 10;
 QUERY PLAN

-
 Limit  (cost=363.18..363.20 rows=10 width=35) (actual
time=192.243..192.406 rows=10 loops=1)
   ->  Sort  (cost=363.18..363.40 rows=91 width=35) (actual
time=192.229..192.283 rows=10 loops=1)
 Sort Key: id
 Sort Method:  top-N heapsort  Memory: 25kB
 ->  Bitmap Heap Scan on forum  (cost=29.21..361.21 rows=91
width=35) (actual time=12.071..136.130 rows=8449 loops=1)
   Recheck Cond: (_fts_ @@ '''fer'''::tsquery)
   ->  Bitmap Index Scan on forum_fts  (cost=0.00..29.19
rows=91 width=0) (actual time=11.169..11.169 rows=8449 loops=1)
 Index Cond: (_fts_ @@ '''fer'''::tsquery)
 Total runtime: 192.686 ms
(9 rows)

Any ideas on how to verify this?
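With enable_indexscan off the planner can still choose the bitmap scan, as
seen above; to get an actual sequential scan for comparison, the bitmap scan
has to be discouraged as well (per session) - a sketch:

set enable_indexscan = off;
set enable_bitmapscan = off;
explain analyze select id,title from forum where _fts_ @@ 'fer'::tsquery
order by id limit 10;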






Re: [PERFORM] Understanding tsearch2 performance

2010-07-14 Thread Ivan Voras
On 07/14/10 15:49, Stephen Frost wrote:
> * Ivan Voras (ivo...@freebsd.org) wrote:
>>  Total runtime: 0.507 ms
> [...]
>>  Total runtime: 118.689 ms
>>
>> See in the first query where I have a simple LIMIT, it fetches random 10
>> rows quickly, but in the second one, as soon as I give it to execute and
>> calculate the entire result set before I limit it, the performance is
>> horrible.
> 
> What you've shown is that it takes 0.5ms for 10 rows, and 118ms for 8500
> rows.

Yes, but...

>  Now, maybe I've missed something, but that's 0.05ms per row for
> the first query and 0.01ms per row for the second, and you've added a
> sort into the mix.  The performance of going through the data actually
> improves on a per-record basis, since you're doing more in bulk.
> 
> Since you're ordering by 'id', PG has to look at every row returned by
> the index scan.  That's not free.

This part of the processing happens in the backend, and the backend
only needs to sort through 8,500 integers. I don't think the sort is
significant here.

> Regarding the statistics, it's entirely possible that the index is *not*
> the fastest way to pull this data (it's nearly 10% of the table..), if
> the stats were better it might use a seq scan instead, not sure how bad
> the cost of the filter itself would be.

I think that what I'm asking here is: is it reasonable for tsearch2 to
extract 8,500 rows from an index of 90,000 rows in 118 ms, given that
the approximately same task can be done with an unindexed "LIKE"
operator in nearly the same time?





Re: [PERFORM] Understanding tsearch2 performance

2010-07-14 Thread Ivan Voras
On 07/14/10 15:25, Oleg Bartunov wrote:
> On Wed, 14 Jul 2010, Ivan Voras wrote:
> 
>>> Returning 8449 rows could be quite long.
>>
>> You are right, I didn't test this. Issuing a query which returns a
>> smaller result set is much faster.
>>
>> But, offtopic, why would returning 8500 records, each around 100 bytes
>> long so around 8.5 MB, over local unix sockets, be so slow? The machine
>> in question has a sustained memory bandwidth of nearly 10 GB/s. Does
>> PostgreSQL spend much time marshalling the data through the socket
>> stream?
> 
> It's disk access time.
> in the very bad case it could take ~5 ms (for a fast drive) to get just
> one row.

No, it's not that. The table fits in RAM, I've verified there is no disk
IO involved. Something else is wrong:

cms=> explain analyze select id,title from forum where _fts_ @@
'fer'::tsquery limit 10;
 QUERY PLAN


 Limit  (cost=0.00..43.31 rows=10 width=35) (actual time=0.194..0.373
rows=10 loops=1)
   ->  Index Scan using forum_fts on forum  (cost=0.00..394.10 rows=91
width=35) (actual time=0.182..0.256 rows=10 loops=1)
 Index Cond: (_fts_ @@ '''fer'''::tsquery)
 Total runtime: 0.507 ms
(4 rows)

cms=> explain analyze select id,title from forum where _fts_ @@
'fer'::tsquery order by id limit 10;
QUERY PLAN

---
 Limit  (cost=363.18..363.20 rows=10 width=35) (actual
time=118.358..118.516 rows=10 loops=1)
   ->  Sort  (cost=363.18..363.40 rows=91 width=35) (actual
time=118.344..118.396 rows=10 loops=1)
 Sort Key: id
 Sort Method:  top-N heapsort  Memory: 25kB
 ->  Bitmap Heap Scan on forum  (cost=29.21..361.21 rows=91
width=35) (actual time=3.066..64.091 rows=8449 loops=1)
   Recheck Cond: (_fts_ @@ '''fer'''::tsquery)
   ->  Bitmap Index Scan on forum_fts  (cost=0.00..29.19
rows=91 width=0) (actual time=2.106..2.106 rows=8449 loops=1)
 Index Cond: (_fts_ @@ '''fer'''::tsquery)
 Total runtime: 118.689 ms
(9 rows)

See, in the first query where I have a simple LIMIT it fetches 10 random
rows quickly, but in the second one, as soon as I make it execute and
calculate the entire result set before limiting it, the performance is
horrible.






Re: [PERFORM] Understanding tsearch2 performance

2010-07-14 Thread Ivan Voras
On 07/14/10 14:31, Oleg Bartunov wrote:
> Something is not good with statistics,  91 est. vs 8449 actually returned.

I don't think the statistics difference is significant - it's actually
using the index so it's ok. And I've run vacuum analyze just before
starting the query.

> Returning 8449 rows could be quite long.

You are right, I didn't test this. Issuing a query which returns a
smaller result set is much faster.

But, offtopic, why would returning 8500 records, each around 100 bytes
long so around 8.5 MB, over local unix sockets, be so slow? The machine
in question has a sustained memory bandwidth of nearly 10 GB/s. Does
PostgreSQL spend much time marshalling the data through the socket stream?




[PERFORM] Understanding tsearch2 performance

2010-07-14 Thread Ivan Voras
Here's a query and its EXPLAIN ANALYZE output:

cms=> select count(*) from forum;
 count
---
 90675
(1 row)

cms=> explain analyze select id,title from forum where _fts_ @@
'fer'::tsquery;
  QUERY PLAN

---
 Bitmap Heap Scan on forum  (cost=29.21..361.21 rows=91 width=35)
(actual time=2.946..63.646 rows=8449 loops=1)
   Recheck Cond: (_fts_ @@ '''fer'''::tsquery)
   ->  Bitmap Index Scan on forum_fts  (cost=0.00..29.19 rows=91
width=0) (actual time=2.119..2.119 rows=8449 loops=1)
 Index Cond: (_fts_ @@ '''fer'''::tsquery)
 Total runtime: 113.641 ms
(5 rows)

The problem is - tsearch2 seems too slow. I have nothing to compare it
to but 113 ms for searching through this small table of 90,000 records
seems too slow. The forum_fts index is of GIN type and the table
certainly fits into RAM.
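A quick way to double-check the size part of that claim (a sketch using the
relation names above; pg_total_relation_size() includes indexes and TOAST):

select pg_size_pretty(pg_relation_size('forum'))       as heap,
       pg_size_pretty(pg_relation_size('forum_fts'))   as gin_index,
       pg_size_pretty(pg_total_relation_size('forum')) as total;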

When I issue a dumb query without an index, I get a comparable order of
magnitude performance:

cms=> explain analyze select id,title from forum where content ilike
'%fer%';
 QUERY PLAN


 Seq Scan on forum  (cost=0.00..7307.44 rows=3395 width=35) (actual
time=0.030..798.375 rows=10896 loops=1)
   Filter: (content ~~* '%fer%'::text)
 Total runtime: 864.384 ms
(3 rows)

cms=> explain analyze select id,title from forum where content like '%fer%';
QUERY PLAN

---
 Seq Scan on forum  (cost=0.00..7307.44 rows=3395 width=35) (actual
time=0.024..146.959 rows=7596 loops=1)
   Filter: (content ~~ '%fer%'::text)
 Total runtime: 191.732 ms
(3 rows)

Some peculiarities of the setup which might or might not influence this
performance:

1) I'm using ICU-patched postgresql because I cannot use my UTF-8 locale
otherwise - this is why the difference between the dumb queries is large
(but I don't see how this can influence tsearch2 since it pre-builds the
tsvector data with lowercase lexemes)

2) My tsearch2 lexer is somewhat slow (but I don't see how it can
influence these read-only queries on a pre-built, lexed and indexed data)

Any ideas?




Re: [PERFORM] raid10 write performance

2010-06-23 Thread Ivan Voras
On 06/23/10 14:00, Florian Weimer wrote:
> * Ivan Voras:
> 
>> On the other hand, RAID10 is simple enough that soft-RAID
>> implementations should be more than adequate - any ideas why a dedicated
>> card has it "slow"?
> 
> Barrier support on RAID10 seems to require some smallish amount of
> non-volatile storage which supports a high number of write operations
> per second, so a software-only solution might not be available.

If I understand you correctly, this can be said in general for all
spinning-disk usage and is not specific to RAID10. (And in the case of
high, constant TPS, no amount of NVRAM will help you).




Re: [PERFORM] raid10 write performance

2010-06-23 Thread Ivan Voras
On 06/22/10 16:40, Greg Smith wrote:
> Grzegorz Jaśkiewicz wrote:
>> raid: serveRAID M5014 SAS/SATA controller
>>   
> 
> Do the "performant servers" have a different RAID card?  This one has
> terrible performance, and could alone be the source of your issue.  The
> ServeRAID cards are slow in general, and certainly slow running RAID10.

What are some good RAID10 cards nowadays?

On the other hand, RAID10 is simple enough that soft-RAID
implementations should be more than adequate - any ideas why a dedicated
card has it "slow"?




Re: [PERFORM] 8K recordsize bad on ZFS?

2010-05-10 Thread Ivan Voras

On 05/10/10 20:39, Josh Berkus wrote:

On 5/9/10 1:45 AM, Dimitri wrote:

Josh,

it'll be great if you explain how did you change the records size to
128K? - as this size is assigned on the file creation and cannot be
changed later - I suppose that you made a backup of your data and then
process a full restore.. is it so?


You can change the recordsize of the zpool dynamically, then simply copy
the data directory (with PostgreSQL shut down) to a new directory on
that zpool.  This assumes that you have enough space on the zpool, of
course.


Other things could have influenced your result - 260 MB/s vs 300 MB/s is 
close enough to be influenced by data position on (some of) the drives. 
(I'm not saying anything about the original question.)





Re: [PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread Ivan Voras

Corin wrote:

Hi all,

I'm running quite a large social community website (250k users, 16gb 
database). We are currently preparing a complete relaunch and thinking 
about switching from mysql 5.1.37 innodb to postgresql 8.4.2. The 


"relaunch" looks like you are nearing the end (the "launch") of the 
project - if so, you should know that switching databases near the 
project deadline is almost always a suicidal act. Even if the big 
differences are easily fixable, the small differences will kill you.


database server is a dual dualcore operton 2216 with 12gb ram running on 
debian amd64.


For a first impression I ran a simple query on our users table (snapshot 
with only ~ 45.000 records). The table has an index on birthday_age 
[integer]. The test executes 10 times the same query and simply discards 
the results. I ran the tests using a php and a ruby script, the results 
are almost the same.


Your table will probably fit in RAM but the whole database obviously 
won't. Not that it matters here.


Did you configure anything at all in postgresql.conf? The defaults 
assume a very small database.


Unluckily mysql seems to be around 3x as fast as postgresql for this 
simple query. There's no swapping, disc reading involved...everything is 
in ram.


It depends...


15.115976333618


So this is 15 ms?


postgresql
{"QUERY PLAN"=>"Limit (cost=125.97..899.11 rows=1000 width=448) (actual 
time=0.927..4.990 rows=1000 loops=1)"}
{"QUERY PLAN"=>" -> Bitmap Heap Scan on users (cost=125.97..3118.00 
rows=3870 width=448) (actual time=0.925..3.420 rows=1000 loops=1)"}
{"QUERY PLAN"=>" Recheck Cond: (((birthday_age >= 12) AND (birthday_age 
<= 13)) OR ((birthday_age >= 20) AND (birthday_age <= 22)))"}
{"QUERY PLAN"=>" -> BitmapOr (cost=125.97..125.97 rows=3952 width=0) 
(actual time=0.634..0.634 rows=0 loops=1)"}
{"QUERY PLAN"=>" -> Bitmap Index Scan on birthday_age (cost=0.00..41.67 
rows=1341 width=0) (actual time=0.260..0.260 rows=1327 loops=1)"}
{"QUERY PLAN"=>" Index Cond: ((birthday_age >= 12) AND (birthday_age <= 
13))"}
{"QUERY PLAN"=>" -> Bitmap Index Scan on birthday_age (cost=0.00..82.37 
rows=2611 width=0) (actual time=0.370..0.370 rows=2628 loops=1)"}
{"QUERY PLAN"=>" Index Cond: ((birthday_age >= 20) AND (birthday_age <= 
22))"}

{"QUERY PLAN"=>"Total runtime: 5.847 ms"}
44.173002243042


I also wonder why the reported runtime of 5.847 ms is so much different 
to the runtime reported of my scripts (both php and ruby are almost the 


It looks like you are spending ~38 ms delivering the data to your
application. Whatever you are using, stop using it :)


same). What's the best tool to time queries in postgresql? Can this be 
done from pgadmin?


The only rational way is to measure at the database itself and not 
include other factors like the network, scripting language libraries, 
etc. To do this, login at your db server with a shell and use psql. 
Start it as "psql databasename username" and issue a statement like 
"EXPLAIN ANALYZE SELECT ...your_query...". Unless magic happens, this 
will open a local unix socket connection to the database for the query, 
which has the least overhead.
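Reconstructed approximately from the plan you posted, the statement to time
would look something like this (the exact column list is an assumption):

EXPLAIN ANALYZE
SELECT * FROM users
WHERE (birthday_age BETWEEN 12 AND 13)
   OR (birthday_age BETWEEN 20 AND 22)
LIMIT 1000;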


You can of course also do this for MySQL though I don't know if it has 
an equivalent of "EXPLAIN ANALYZE".


But even after you have found where the problem is, and even if you see 
that Pg is faster than MySQL, you will still need realistic loads to 
test the real-life performance difference.





Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes

2010-02-04 Thread Ivan Voras
On 4 February 2010 10:02, Amitabh Kant  wrote:
> On Wed, Feb 3, 2010 at 10:05 PM, Ivan Voras  wrote:
>>
>> On 02/03/10 16:10, Amitabh Kant wrote:
>>>
>>> Hello
>>>
>>> I have a server dedicated for Postgres with the following specs:
>>>
>>> RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
>>> 2.33GHz
>>> OS: FreeBSD 8.0
>>
>> If you really do have "heavy read and write" load on the server, nothing
>> will save you from the bottleneck of having only 4 drives in the system (or
>> more accurately: adding more memory will help reads but nothing helps writes
>> except more drives or faster (SSD) drives). If you can, add another 2
>> drives in RAID 1 and move+symlink the pg_xlog directory to the new array.
>
> Can't do anything about this server now, but would surely keep in mind
> before upgrading other servers. Would you recommend the same speed
> drives(15K SAS) for RAID 1, or would a slower drive also work here (10K SAS
> or even SATA II)?

Again, it depends on your load. It would probably be best if they are
approximately the same speed; the location of pg_xlog will dictate
your write (UPDATE / INSERT / CREATE) speed.

Writes to your database go like this: the data is first written to the
WAL (this is the pg_xlog directory - the transaction log), then it is
read and written to the "main" database. If the main database is very
busy reading, transfers from WAL to the database will be slower.



Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes

2010-02-03 Thread Ivan Voras

On 02/03/10 16:10, Amitabh Kant wrote:

Hello

I have a server dedicated for Postgres with the following specs:

RAM 16GB, 146GB SAS (15K) x 4 -  RAID 10 with BBU, Dual Xeon  E5345  @
2.33GHz
OS: FreeBSD 8.0


If you really do have "heavy read and write" load on the server, nothing 
will save you from the bottleneck of having only 4 drives in the system 
(or more accurately: adding more memory will help reads but nothing 
helps writes except more drives or faster (SSD) drives). If you can,
add another 2 drives in RAID 1 and move+symlink the pg_xlog directory to 
the new array.



maintenance_work_mem = 960MB # pg_generate_conf wizard 2010-02-03
checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2010-02-03
effective_cache_size = 11GB # pg_generate_conf wizard 2010-02-03
work_mem = 160MB # pg_generate_conf wizard 2010-02-03
wal_buffers = 8MB # pg_generate_conf wizard 2010-02-03
checkpoint_segments = 16 # pg_generate_conf wizard 2010-02-03
shared_buffers = 3840MB # pg_generate_conf wizard 2010-02-03
max_connections = 100 # pg_generate_conf wizard 2010-02-03



I would appreciate if somebody could point out the sysctl/loader.conf
settings that I need to have in FreeBSD.


Firstly, you need to run a 64-bit version ("amd64") of FreeBSD.

In /boot/loader.conf you will probably need to increase the number of 
sysv ipc semaphores:


kern.ipc.semmni=512
kern.ipc.semmns=1024

This depends mostly on the number of connections allowed to the server. 
The example values I gave above are more than enough but since this is a 
boot-only tunable it is expensive to modify later.


In /etc/sysctl.conf you will need to increase the shared memory sizes, 
e.g. for a 3900 MB shared_buffer:


kern.ipc.shmmax=4089446400
This is the maximum shared memory segment size, in bytes.

kern.ipc.shmall=105
This is the maximum amount of memory allowed to be used as sysv shared 
memory, in 4 kB pages.


If the database contains many objects (tables, indexes, etc.) you may 
need to increase the maximum number of open files and the amount of 
memory for the directory list cache:


kern.maxfiles=16384
vfs.ufs.dirhash_maxmem=4194304

If you estimate you will have large sequential reads on the database, 
you should increase read-ahead count:


vfs.read_max=32

Be sure that soft-updates is enabled on the file system you are using 
for data. Ignore all Linux-centric discussions about problems with 
journaling and write barriers :)


All settings in /etc/sysctl.conf can be changed at runtime (individually 
or by invoking "/etc/rc.d/sysctl restart"), settings in loader.conf are 
boot-time only.





Re: [PERFORM] Benchmark shows very slow bulk delete

2010-01-28 Thread Ivan Voras
James Mansion wrote:
> Ivan Voras wrote:
>> I wish that, when people got the idea to run a simplistic benchmark
>> like this, they would at least have the common sense to put the
>> database on a RAM drive to avoid problems with different cylinder
>> speeds of rotational media and fragmentation from multiple runs.
> Huh?
>> It's tough to benchmark anything involving rotational drives :)
> But - how the database organises its IO to maximise the available
> bandwidth, limit
> avoidable seeks, and limit avoidable flushes is absolutely key to
> realistic performance,
> especially on modest everyday hardware. Not everyone has a usage that
> justifies
> 'enterprise' kit - but plenty of people can benefit from something a
> step up from
> SQLite.
> 
> If you just want to benchmark query processor efficiency then that's one
> scenario
> where taking physical IO out of the picture might be justified, but I
> don't see a good
> reason to suggest that it is 'common sense' to do so for all testing,
> and while the
> hardware involved is pretty low end, its still a valid data point.
> .

You are right, of course - for general benchmarking, to see what
performance can be expected from some setup in some circumstances, but
not where the intention is to compare products.

You don't have to go the memory drive / SSD route - just make sure the
databases always use the same (small) area of the disk drive.





Re: [PERFORM] Benchmark shows very slow bulk delete

2010-01-27 Thread Ivan Voras

On 01/27/10 14:28, Thom Brown wrote:

Had a quick look at a benchmark someone put together of MySQL vs
PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk
delete was very slow:
http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html


I wish that, when people got the idea to run a simplistic benchmark like 
this, they would at least have the common sense to put the database on a 
RAM drive to avoid problems with different cylinder speeds of rotational 
media and fragmentation from multiple runs.


Here are some typical results from a desktop SATA drive:

ada0
512 # sectorsize
500107862016# mediasize in bytes (466G)
976773168   # mediasize in sectors
969021  # Cylinders according to firmware.
16  # Heads according to firmware.
63  # Sectors according to firmware.
6QG3Z026# Disk ident.

Seek times:
Full stroke:  250 iter in   5.676993 sec =   22.708 msec
Half stroke:  250 iter in   4.284583 sec =   17.138 msec
Quarter stroke:   500 iter in   6.805539 sec =   13.611 msec
Short forward:400 iter in   2.678447 sec =6.696 msec
Short backward:   400 iter in   2.318637 sec =5.797 msec
Seq outer:   2048 iter in   0.214292 sec =0.105 msec
Seq inner:   2048 iter in   0.203929 sec =0.100 msec
Transfer rates:
outside:   102400 kbytes in   1.229694 sec =83273 kbytes/sec
middle:102400 kbytes in   1.446570 sec =70788 kbytes/sec
inside:102400 kbytes in   2.446670 sec =41853 kbytes/sec

This doesn't explain the 4-orders-of-magnitude difference between MySQL 
and PostgreSQL in bulk_delete() (0.02 vs 577) but it does suggest that 
some other results where the performance is close, might be bogus.


It's tough to benchmark anything involving rotational drives :)




Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread Ivan Voras

On 01/19/10 14:36, fka...@googlemail.com wrote:

Ivan Voras:


[I just skimmed this thread - did you increase the number of WAL logs to
something very large, like 128?]


Yes, I tried even more.

I will be writing data quite constantly in the real scenario
later. So I wonder if increasing WAL logs will have a
positive effect or not: AFAIK when I increase it, the
duration after the max is hit will be longer then (which is
not acceptable in my case).

Could anyone confirm if I got it right?


It seems so - if you are writing constantly then you will probably get 
lower but more long-term-stable performance from a smaller number of WAL 
logs.





Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread Ivan Voras

On 01/19/10 11:16, fka...@googlemail.com wrote:

fka...@googlemail.com:


I'll try to execute these tests on a SSD
and/or Raid system.


FYI:

On a sata raid-0 (mid range hardware) and recent 2x 1.5 TB
disks with a write performance of 100 MB/s (worst, to 200
MB/s max), I get a performance of 18.2 MB/s. Before, with
other disk 43 MB/s (worst to 70 MB/s max) postgres came to
14-16 MB/s.


[I just skimmed this thread - did you increase the number of WAL logs to 
something very large, like 128?]



So, I conclude finally:

(1) Postgresql write throughput (slowly) scales with the
harddisk speed.

(2) The throughput (not counting WAL doubling data) in
postgresql is 20-25% of the disk throughput.


And this is one of the more often forgotten reasons why storing large
objects in a database rather than in the file system is a bad idea :)






[PERFORM] Re: New server to improve performance on our large and busy DB - advice? (v2)

2010-01-15 Thread Ivan Voras

hi,

You wrote a lot of information here so let's confirm in a nutshell what 
you have and what you are looking for:


* A database that is of small to medium size (5 - 10 GB)?
* Around 10 clients that perform constant write operations to the 
database (UPDATE/INSERT)

* Around 10 clients that occasionally read from the database
* Around 6000 tables in your database
* A problem with tuning it all
* Migration to new hardware and/or OS

Is this all correct?

First thing that is noticeable is that you seem to have way too few 
drives in the server - not because of disk space required but because of 
speed. You didn't say what type of drives you have and you didn't say 
what you would consider desirable performance levels, but off hand 
(because of the "10 clients perform constant writes" part) you will 
probably want at least 2x-4x more drives.


> 1) Which RAID level would you recommend

With only 4 drives, RAID 10 is the only thing usable here.

> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

Would not recommend Windows OS.

> 3) If we were to port to a *NIX flavour, which would you recommend? (which
> support trouble-free PG builds/makes please!)

Practically any. I'm biased for FreeBSD, a nice and supported version of 
Linux will probably be fine.


> 4) Is this the right PG version for our needs?

If you are starting from scratch on a new server, go for the newest 
version you can get - 8.4.2 in this case.


Most importantly, you didn't say what you would consider desirable 
performance. The hardware and the setup you described will work, but not 
necessarily fast enough.


> . So far, we have never seen a situation where a seq scan has improved
> performance, which I would attribute to the size of the tables

... and to the small number of drives you are using.

> . We believe our requirements are exceptional, and we would benefit
> immensely from setting up the PG planner to always favour index-oriented
> decisions


Have you tried decreasing random_page_cost in postgresql.conf? Or 
setting (as a last resort) enable_seqscan = off?
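For a quick per-session experiment before touching postgresql.conf, something
along these lines works (the table and column names here are hypothetical):

SET random_page_cost = 2.0;   -- default is 4.0
EXPLAIN ANALYZE SELECT * FROM audit WHERE import_id = 12345;

SET enable_seqscan = off;     -- last resort, per session only
EXPLAIN ANALYZE SELECT * FROM audit WHERE import_id = 12345;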



Carlo Stonebanks wrote:
My client just informed me that new hardware is available for our DB 
server.


. Intel Core 2 Quads Quad
. 48 GB RAM
. 4 Disk RAID drive (RAID level TBD)

I have put the ugly details of what we do with our DB below, as well as the
postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB
with very large tables and the server is always busy serving a constant
stream of single-row UPDATEs and INSERTs from parallel automated processes.

There are less than 10 users, as the server is devoted to the KB production
system.

My questions:

1) Which RAID level would you recommend
2) Which Windows OS would you recommend? (currently 2008 x64 Server)
3) If we were to port to a *NIX flavour, which would you recommend? (which
support trouble-free PG builds/makes please!)
4) Is this the right PG version for our needs?

Thanks,

Carlo

The details of our use:

. The DB hosts is a data warehouse and a knowledgebase (KB) tracking the
professional information of 1.3M individuals.
. The KB tables related to these 130M individuals are naturally also large
. The DB is in a perpetual state of serving TCL-scripted Extract, Transform
and Load (ETL) processes
. These ETL processes typically run 10 at-a-time (i.e. in parallel)
. We would like to run more, but the server appears to be the bottleneck
. The ETL write processes are 99% single row UPDATEs or INSERTs.
. There are few, if any DELETEs
. The ETL source data are "import tables"
. The import tables are permanently kept in the data warehouse so that we
can trace the original source of any information.
. There are 6000+ and counting
. The import tables number from dozens to hundreds of thousands of rows.
They rarely require more than a pkey index.
. Linking the KB to the source import date requires an "audit table" of 
500M

rows, and counting.
. The size of the audit table makes it very difficult to manage, especially
if we need to modify the design.
. Because we query the audit table different ways to audit the ETL 
processes

decisions, almost every column in the audit table is indexed.
. The maximum number of physical users is 10 and these users RARELY perform
any kind of write
. By contrast, the 10+ ETL processes are writing constantly
. We find that internal stats drift, for whatever reason, causing row seq
scans instead of index scans.
. So far, we have never seen a situation where a seq scan has improved
performance, which I would attribute to the size of the tables
. We believe our requirements are exceptional, and we would benefit
immensely from setting up the PG planner to always favour index-oriented
decisions - which seems to contradict everything that PG advice suggests as
best practice.

Current non-default conf settings are:

autovacuum = on
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 250
autovacuum_naptime = 1min
autovacuum_vacuum_scale_fa

Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread Ivan Voras

Kevin Grittner wrote:

Matthew Wakeling  wrote:
 

This is an FAQ.
 
I just added it to the wiki FAQ page:
 
http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F


Maybe you could add a short note on why an estimate, e.g. from the
pg_class table, is usually enough.
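For reference, the usual estimate query is along these lines (reltuples is
maintained by VACUUM/ANALYZE, so it is only approximate):

SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'bigtable';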






Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Ivan Voras

fka...@googlemail.com wrote:

Hello together,

I need to increase the write performance when inserting
bytea of 8MB. I am using 8.2.4 on windows with libpq.




This takes about 50s, so, 800MB/50s = 16MB/s.

However the harddisk (sata) could write 43 MB/s in the worst
case! Why is write performance limited to 16 MB/s?


Do you have any further idea why 16MB/s seems to be the
limit here?


Are you doing it locally or over a network? If you are accessing the
server over a network then that could be where the bottleneck is.





Re: [PERFORM] performance config help

2010-01-11 Thread Ivan Voras

Ivan Voras wrote:

Yes, but you are issuing 133 write operations per second per drive -
this is nearly the limit of what you can get with 15k RPM drives 
(actually, the limit should be somewhere around 200..250 IOPS but 133 
isn't that far).


I saw in your other post you have fsync turned off so ignore this, it's 
not an IO problem in your case.





Re: [PERFORM] performance config help

2010-01-11 Thread Ivan Voras

Bob Dusek wrote:
On Mon, Jan 11, 2010 at 8:50 AM, Scott Marlowe wrote:

On Mon, Jan 11, 2010 at 6:44 AM, Bob Dusek wrote:
 > Hello,
 >
 > We're running Postgres 8.4.2 on Red Hat 5, on pretty hefty
hardware...
 >
 > 4X E7420 Xeon, Four cores (for a total of 16 cores)
 > 2.13 GHz, 8M Cache, 1066 Mhz FSB
 > 32 Gigs of RAM
 > 15 K RPM drives in striped raid

What method of striped RAID?  RAID-5?  RAID-10?  RAID-4?  RAID-0?


RAID-0


And how many drives?



 > Things run fine, but when we get a lot of concurrent queries
running, we see
 > a pretty good slow down.

Definte "a lot".


We have an application server that is processing requests.  Each request 
consists of a combination of selects, inserts, and deletes.  We actually 
see degradation when we get more than 40 concurrent requests.  The exact 
number of queries executed by each request isn't known.  It varies per 
request.  But, an example request would be about 16 inserts and 113 
selects.  Any given request can't execute more than a single query at a 
time.  


So, you are concurrently trying to achieve more than around 640 writes
and 4520 reads per second (worst case: 16 inserts and 113 selects times 40
concurrent requests). This should be interesting. For 40 concurrent requests
you will probably need at least 4 drives in RAID-0 to sustain the write rates
(and I'll guess 5-6 to be sure to cover read requests also, together with
plenty of RAM).



 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  34.290.007.090.030.00   58.58

Device:  rrqm/s  wrqm/s   r/s    w/s     rsec/s  wsec/s   avgrq-sz  avgqu-sz  await  svctm  %util
sda      0.00    112.20   0.00   133.40  0.00    1964.80  14.73     0.42      3.17   0.04   0.48


The iowait seems pretty low, doesn't it?


Yes, but you are issuing 133 write operations per second per drive -
this is nearly the limit of what you can get with 15k RPM drives 
(actually, the limit should be somewhere around 200..250 IOPS but 133 
isn't that far).



top - 10:11:43 up 12 days, 20:48,  6 users,  load average: 10.48, 4.16, 2.83
Tasks: 798 total,   8 running, 790 sleeping,   0 stopped,   0 zombie
Cpu0  : 33.3%us,  7.6%sy,  0.0%ni, 59.1%id,  0.0%wa,  0.0%hi,  0.0%si,  
0.0%st


There is one other possibility - since the CPUs are not very loaded, are
you sure the client application with which you are testing is fast
enough to issue enough requests to saturate the database?





Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Ivan Voras
2010/1/7 Lefteris :
> On Thu, Jan 7, 2010 at 3:51 PM, Ivan Voras  wrote:
>> On 7.1.2010 15:23, Lefteris wrote:
>>
>>> I think what you all said was very helpful and clear! The only part
>>> that I still disagree/don't understand is the shared_buffer option:))
>>
>> Did you ever try increasing shared_buffers to what was suggested (around
>> 4 GB) and see what happens (I didn't see it in your posts)?
>
> No I did not do that yet, mainly because I need the admin of the
> machine to change the shmmax of the kernel and also because I have no
> multiple queries running. Does seq scan use shared_buffers?

Everything uses shared_buffers, even things that do not benefit from
it. This is because shared_buffers is part of the general database
IO path - it's unavoidable.



Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Ivan Voras
On 7.1.2010 15:23, Lefteris wrote:

> I think what you all said was very helpful and clear! The only part
> that I still disagree/don't understand is the shared_buffer option:))

Did you ever try increasing shared_buffers to what was suggested (around
4 GB) and see what happens (I didn't see it in your posts)?

Shared_buffers can be thought of as PostgreSQL's internal cache. If the
pages being scanned for a particular query are in the cache, this will
help performance very much on multiple executions of the same query.
OTOH, since the file system's cache didn't help you significantly, it is
unlikely shared_buffers will. It is still worth trying.

From the description of the data ("...from years 1988 to 2009...") it
looks like the query for "between 2000 and 2009" pulls out about half of
the data. If an index could be used instead of seqscan, it could be
perhaps only 50% faster, which is still not very comparable to others.

The table is very wide, which is probably why the tested databases can
deal with it faster than PG. You could try and narrow the table down
(for instance: remove the Div* fields) to make the data more
"relational-like". In real life, speedups in this circumstances would
probably be gained by normalizing the data to make the basic table
smaller and easier to use with indexing.
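As an illustration of the narrowing idea (the table and column names here are
assumptions based on the public on-time flights dataset, not taken from your
schema):

CREATE TABLE ontime_narrow AS
    SELECT "Year", "Month", "DayofMonth", "DepDelay", "ArrDelay", "Origin", "Dest"
    FROM ontime;
CREATE INDEX ontime_narrow_year_idx ON ontime_narrow ("Year");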




Re: [PERFORM] Any have tested ZFS like PostgreSQL installation filesystem?

2009-11-30 Thread Ivan Voras

Ing . Marcos Luís Ortíz Valmaseda wrote:

Ivan Voras wrote:

Ing . Marcos Luís Ortíz Valmaseda wrote:

Regards to all the list.
ZFS, the new filesystem developed by the Solaris development team and
ported to FreeBSD too, has many advantages that make all sysadmins wonder
whether it is a good filesystem for a PostgreSQL installation.
Have any of you tested this filesystem as a PostgreSQL installation fs?


It will work but as to if it is a good file system for databases, the 
debate still goes on.


Here are some links about ZFS and databases:

http://blogs.sun.com/paulvandenbogaard/entry/postgresql_on_ufs_versus_zfs
http://blogs.sun.com/paulvandenbogaard/entry/running_postgresql_on_zfs_file 


http://blogs.sun.com/realneel/entry/mysql_innodb_zfs_best_practices
http://dev.mysql.com/tech-resources/articles/mysql-zfs.html
http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/ 



A separate issue (I think it is not explored enough in the above 
links) is that ZFS writes data in a semi-continuous log, meaning there 
are no in-place modifications of files (every such write is made on a 
different place), which leads to heavy fragmentation. I don't think I 
have seen a study of this particular effect. OTOH, it will only matter 
if the DB usage pattern is sequential reads and lots of updates - and 
even here it might be hidden by internal DB data fragmentation.




OK, thanks for the answers, I'll study the effects now. Were these tests
done with the FreeBSD 8.0 version?


No, AFAIK all of them were on some (and different) versions of 
(Open)Solaris.





Re: [PERFORM] Any have tested ZFS like PostgreSQL installation filesystem?

2009-11-30 Thread Ivan Voras

Ing . Marcos Luís Ortíz Valmaseda wrote:

Regards to all the list.
ZFS, the new filesystem developed by the Solaris development team and
ported to FreeBSD too, has many advantages that make all sysadmins wonder
whether it is a good filesystem for a PostgreSQL installation.
Have any of you tested this filesystem as a PostgreSQL installation fs?


It will work but as to if it is a good file system for databases, the 
debate still goes on.


Here are some links about ZFS and databases:

http://blogs.sun.com/paulvandenbogaard/entry/postgresql_on_ufs_versus_zfs
http://blogs.sun.com/paulvandenbogaard/entry/running_postgresql_on_zfs_file
http://blogs.sun.com/realneel/entry/mysql_innodb_zfs_best_practices
http://dev.mysql.com/tech-resources/articles/mysql-zfs.html
http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/

A separate issue (I think it is not explored enough in the above links) 
is that ZFS writes data in a semi-continuous log, meaning there are no 
in-place modifications of files (every such write is made on a different 
place), which leads to heavy fragmentation. I don't think I have seen a 
study of this particular effect. OTOH, it will only matter if the DB 
usage pattern is sequential reads and lots of updates - and even here it 
might be hidden by internal DB data fragmentation.






Re: [PERFORM] SSD + RAID

2009-11-14 Thread Ivan Voras

Lists wrote:

Laszlo Nagy wrote:

Hello,

I'm about to buy SSD drive(s) for a database. For decision making, I 
used this tech report:


http://techreport.com/articles.x/16255/9
http://techreport.com/articles.x/16255/10

Here are my concerns:

   * I need at least 32GB disk space. So DRAM based SSD is not a real
 option. I would have to buy 8x4GB memory, costs a fortune. And
 then it would still not have redundancy.
   * I could buy two X25-E drives and have 32GB disk space, and some
 redundancy. This would cost about $1600, not counting the RAID
 controller. It is on the edge.
This was the solution I went with (4 drives in a raid 10 actually). Not 
a cheap solution, but the performance is amazing.


I've came across this article:

http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/

It's from a Linux MySQL user so it's a bit confusing but it looks like 
he has some reservations about performance vs reliability of the Intel 
drives - apparently they have their own write cache and when it's 
disabled performance drops sharply.





Re: [PERFORM] limiting performance impact of wal archiving.

2009-11-10 Thread Ivan Voras

Laurent Laborde wrote:

On Tue, Nov 10, 2009 at 3:05 PM, Ivan Voras  wrote:

Laurent Laborde wrote:

Hi !
We recently had a problem with wal archiving badly impacting the
performance of our postgresql master.

Hmmm, do you want to say that copying 16 MB files over the network (and
presumably you are not doing it absolutely continually - there are pauses
between log shipping - or you wouldn't be able to use bandwidth limiting) in
an age when desktop drives easily read 60 MB/s (and besides most of the file
should be cached by the OS anyway) is a problem for you? Slow hardware?

(or I've misunderstood the problem...)


Desktop drive can easily do 60MB/s in *sequential* read/write.


... and WAL files are big sequential chunks of data :)


We use high performance array of 15.000rpm SAS disk on an octocore
32GB and IO is always a problem.

I explain the problem :

This server (doing wal archiving) is the master node of the
over-blog's server farm.
hundreds of GB of data, tens of millions of articles and comments,
millions of user, ...
~250 read/write sql requests per seconds for the master
~500 read sql request per slave.

Awfully random access overloads our array - 10MB/s at best.


Ok, this explains it. It also means you are probably not getting much
runtime performance benefit from the logging and should think about
moving the logs to different drive(s), among other things because...



Of course, when doing sequential read it goes to +250MB/s :)


... it means you cannot dedicate 0.064 of a second of the array's time
(16 MB at 250 MB/s) to read through a single log file without your other
transactions suffering.



Waiting for "cheap" memory to be cheap enough to have 512Go of ram per server ;)

We tought about SSD.
But interleaved read/write kill any SSD performance and is not better
than SSD. Just more expensive with an unknown behaviour over age.


Yes, this is the current attitude toward them.




Re: [PERFORM] limiting performance impact of wal archiving.

2009-11-10 Thread Ivan Voras

Laurent Laborde wrote:

Hi !
We recently had a problem with wal archiving badly impacting the
performance of our postgresql master.


Hmmm, do you want to say that copying 16 MB files over the network (and 
presumably you are not doing it absolutely continually - there are 
pauses between log shipping - or you wouldn't be able to use bandwidth 
limiting) in an age when desktop drives easily read 60 MB/s (and besides 
most of the file should be cached by the OS anyway) is a problem for 
you? Slow hardware?


(or I've misunderstood the problem...)




Re: [PERFORM] Free memory usage Sol10, 8.2.9

2009-11-04 Thread Ivan Voras

Jeremy Harris wrote:

On 11/03/2009 07:16 PM, Subbiah Stalin-XCGF84 wrote:

All,

I'm trying to understand the free memory usage and why it falls below
17G sometimes and what could be causing it. Any pointers would be
appreciated.

r...@prod1 # prtconf
System Configuration:  Sun Microsystems  sun4u
Memory size: 32768 Megabytes

[postg...@prod1 ~]$ vmstat 5 10
  kthr  memorypagedisk  faults
cpu
  r b w   swap  free  re  mf pi po fr de sr 1m 1m 1m m1   in   sy   cs us
sy id
  0 0 0 51713480 21130304 58 185 325 104 104 0 0 23 3 7 1 488 604  573  1
2 97
  0 0 0 51048768 18523456 6 10 0 192 192 0 0 4  0  3  0  527  753  807  2
1 97


Memory used by the OS for caching files is no longer free.
Free memory is wasted memory.


To finish the thought: memory used by the OS for caching files will be
automatically given to applications that need more memory, so it is "kind
of" free memory.


In your case, you really do have 17-18G unused memory which is 
practically wasted.




