[PERFORM] Guidelines on best indexing strategy for varying searches on 20+ columns
Hi,

I’m running a search engine for cars. It’s backed by a PostgreSQL 9.3 installation. Now I’m unsure about the best approach/strategy for doing index optimization for the frontend search.

The problem:

The table containing the cars holds around 1.5 million rows. People who search for cars use different criteria. Some search by brand/model, some by year, some by mileage, some by price and some by special equipment etc. - and often they combine a whole bunch of criteria together. Of course some, like brand/model and price, are used more frequently than others. In total we offer: 9 category criteria like brand/model or body type, plus 5 numeric criteria like price or mileage, plus 12 boolean criteria like equipment. Lastly, people can order the results by different columns (year, price, mileage and a score we create about the cars). By default we order by our own generated score.

What I’ve done so far:

I have analyzed the usage of the criteria "lightly", and created a few indexes (10). Among those are e.g. indexes on price, mileage and a combined index on brand/model. Since we are only interested in showing results for cars which are actually for sale, the indexes are made as partial indexes on a sales state column.

Questions:

1. How would you go about analyzing and determining which columns should be indexed, and how?
2. What is the best strategy when optimizing indexes for searches happening on 20+ columns, where the use and the combinations vary a lot? (To just index everything, to index some of the columns, to do combined indexes, to only do single-column indexes, etc.)
3. I expect that it does not make sense to index all columns?
4. I expect it does not make sense to index boolean columns?
5. Is it better to do a combined index on 5 frequently used columns rather than having individual indexes on each of them?
6. Would it be a good idea to have all indexes sorted by my default sorting?
7. Do you have any experience with other approaches that could greatly improve performance (e.g. forcing indexes to stay in memory etc.)?
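(For illustration only: a minimal sketch of the kind of partial indexes described above, assuming a table named "cars" with columns such as sales_state, price, brand and model - these names are placeholders, not the actual schema.)

    -- Partial indexes limited to cars currently for sale (hypothetical names):
    CREATE INDEX cars_price_for_sale_idx
        ON cars (price)
        WHERE sales_state = 'for_sale';

    CREATE INDEX cars_brand_model_for_sale_idx
        ON cars (brand, model)
        WHERE sales_state = 'for_sale';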
[PERFORM] how to improve perf of 131MM row table?
Sorry for the semi-newbie question...

I have a relatively sizable PostgreSQL 9.0.2 DB with a few large tables (keep in mind "large" is relative, I'm sure there are plenty larger out there).

One of my queries that seems to be bogging down performance is a join between two tables on each of their BIGINT PKs (so they have default unique constraint/PK indexes on them). One table is a detail table for the other. The "master" has about 6mm rows. The detail table has about 131mm rows (table size = 17GB, index size = 16GB).

I unfortunately have limited disks, so I can't actually move to multiple spindles, but wonder if there is anything I can do (should I partition the data, etc.) to improve performance? Maybe some further tuning to my .conf, but I do think that's using as much mem as I can spare right now (happy to send it along if it would help).

The DB is vacuumed nightly with stats updates enabled. I can send the statistics info listed in the pgAdmin tab if that would help.

Any suggestions, tips, tricks, links, etc. are welcomed!

Thanks in advance,
AJ
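(As an aside, table and index sizes of the kind quoted above can be checked with the built-in size functions; "detail_table" below is a placeholder name.)

    SELECT pg_size_pretty(pg_relation_size('detail_table'))       AS table_size,
           pg_size_pretty(pg_indexes_size('detail_table'))        AS index_size,
           pg_size_pretty(pg_total_relation_size('detail_table')) AS total_size;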
Re: [PERFORM] how to improve perf of 131MM row table?
On 06/25/2014 03:10 PM, AJ Weber wrote:

> I have a relatively sizable postgresql 9.0.2 DB with a few large tables
> (keep in mind "large" is relative, I'm sure there are plenty larger out
> there).

Regardless of any help we might offer regarding this, you need to upgrade your installation to 9.0.17. You are behind by several performance, security, and integrity bugfixes, some of which address critical corruption bugs related to replication.

> One of my queries that seems to be bogging-down performance is a join
> between two tables on each of their BIGINT PK's (so they have default
> unique constraint/PK indexes on them). One table is a detail table for
> the other.

This isn't enough information. Just knowing the relative sizes of the tables doesn't tell us which columns are indexed, whether or not the query is using those indexes, how many rows usually match, which queries are performing badly, and so on.

Please refer to this page to ask performance related questions:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

Without much of this information, we'd only be speculating.

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com
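(For reference, a minimal sketch of how the requested details can be collected; "master", "detail" and the query are placeholders, not the poster's actual schema.)

    -- table and index definitions, from psql:
    \d master
    \d detail

    -- plan plus runtime statistics for the slow query (the BUFFERS option is
    -- available from 9.0 on):
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT ...;  -- substitute the actual slow query here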
Re: [PERFORM] how to improve perf of 131MM row table?
I will gather the other data tonight. Thank you.

In the meantime, I guess I wasn't clear about some other particulars. The query's where clause is only an "IN", with a list of ids (those I mentioned are the PK), and the join is explicitly on the PK (so, indexed). Thus, there should be only the explicit matches to the IN clause returned, and if PostgreSQL isn't using the unique index on that column, I would be very shocked (to the point I would suggest there is a bug somewhere).

An IN with 50 int values took 23 sec to return (by way of example).

Thanks again.

--
Aaron

On June 25, 2014 4:49:16 PM EDT, Shaun Thomas wrote:
>On 06/25/2014 03:10 PM, AJ Weber wrote:
>
>> I have a relatively sizable postgresql 9.0.2 DB with a few large tables
>> (keep in mind "large" is relative, I'm sure there are plenty larger out
>> there).
>
>Regardless of any help we might offer regarding this, you need to
>upgrade your installation to 9.0.17. You are behind by several
>performance, security, and integrity bugfixes, some of which address
>critical corruption bugs related to replication.
>
>> One of my queries that seems to be bogging-down performance is a join
>> between two tables on each of their BIGINT PK's (so they have default
>> unique constraint/PK indexes on them). One table is a detail table for
>> the other.
>
>This isn't enough information. Just knowing the relative sizes of the
>tables doesn't tell us which columns are indexed, whether or not the
>query is using those indexes, how many rows usually match, which queries
>are performing badly, and so on.
>
>Please refer to this page to ask performance related questions:
>
>https://wiki.postgresql.org/wiki/Slow_Query_Questions
>
>Without much of this information, we'd only be speculating.
>
>--
>Shaun Thomas
>OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
>312-676-8870
>stho...@optionshouse.com
Re: [PERFORM] Guidelines on best indexing strategy for varying searches on 20+ columns
On Wed, Jun 25, 2014 at 3:48 AM, Niels Kristian Schjødt wrote:
> Hi,
> I’m running a search engine for cars. It’s backed by a postgresql 9.3
> installation.
>
> Now I’m unsure about the best approach/strategy for doing index optimization
> for the frontend search.
>
> The problem:
>
> The table containing the cars holds around 1.5 million rows. People who
> search for cars use different criteria to search by. Some search by
> brand/model, some by year, some by mileage, some by price and some by special
> equipment etc. etc. - and often they combine a whole bunch of criteria
> together. Of course some, like brand/model and price, are used more frequently
> than others. In total we offer: 9 category criteria like brand/model or body
> type, plus 5 numeric criteria like price or mileage, plus 12 boolean criteria
> like equipment. Lastly people can order the results by different columns
> (year, price, mileage and a score we create about the cars). By default we
> order by our own generated score.
>
> What I’ve done so far:
>
> I have analyzed the usage of the criteria "lightly", and created a few
> indexes (10). Among those are e.g. indexes on price, mileage and a combined
> index on brand/model. Since we are only interested in showing results for
> cars which are actually for sale, the indexes are made as partial indexes on
> a sales state column.
>
> Questions:
>
> 1. How would you go about analyzing and determining what columns should be
> indexed, and how?

Mainly frequency of access.

> 2. What is the best strategy when optimizing indexes for searches happening
> on 20+ columns, where the use and the combinations vary a lot? (To just
> index everything, to index some of the columns, to do combined indexes, to
> only do single column indexes etc. etc.)

Don't make 20 indexes. Consider installing pg_trgm (for optimized LIKE
searching) or hstore (for optimized key/value searching) and then using
GiST/GIN for multiple attribute search (a small sketch follows at the end of
this message). With 9.4 we have another fancy technique to explore: jsonb
searching via GiST/GIN.

> 3. I expect that it does not make sense to index all columns?

Well, maybe. If you only ever search one column at a time, then it might make
sense. But if you need to search arbitrary criteria and frequently combine a
large number, then no -- particularly if your dataset is very large and
individual criteria are not very selective.

> 4. I expect it does not make sense to index boolean columns?

In general, no. An important exception is if you are only interested in true
or false and the number of records that have that interesting value is tiny
relative to the size of the table. In that case, a partial index can be used
for massive optimization.

> 5. Is it better to do a combined index on 5 frequently used columns rather
> than having individual indexes on each of them?

Only if you search those 5 columns together a significant portion of the time.

> 6. Would it be a good idea to have all indexes sorted by my default sorting?

Index order rarely matters. If you always search values backwards and the
table is very large you may want to consider it. Unfortunately this often
doesn't work for composite indexes, so sometimes we must explore the old
school technique of reversing the value.

> 7. Do you have any experience with other approaches that could greatly
> improve performance (e.g. forcing indexes to stay in memory etc.)?

As noted above, fancy indexing is the first place to look. Start with pg_trgm
(for LIKE optimization), hstore, and the new json stuff.
The big limitation you will hit is that most index strategies, at least for
the prepackaged stuff, will support '=' or partial string matching
(particularly with pg_trgm LIKE), but not > or <: for range operations you
have to post-process the search or try to work the index from another angle.

merlin
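(A minimal sketch of the two techniques mentioned above, assuming a "cars"
table; the column names are made up for illustration and are not the actual
schema.)

    -- trigram GIN index for optimized LIKE/ILIKE searching (requires pg_trgm):
    CREATE EXTENSION pg_trgm;
    CREATE INDEX cars_model_trgm_idx ON cars USING gin (model gin_trgm_ops);

    -- partial index on a boolean, worthwhile only when the interesting value
    -- is rare relative to the size of the table:
    CREATE INDEX cars_has_towbar_idx ON cars (id) WHERE has_towbar;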
Re: [PERFORM] how to improve perf of 131MM row table?
On 06/25/2014 04:40 PM, Aaron Weber wrote:

> In the meantime, I guess I wasn't clear about some other particulars.
> The query's where clause is only an "IN", with a list of id's (those
> I mentioned are the PK), and the join is explicitly on the PK (so,
> indexed).

Indexed doesn't mean indexed if the wrong datatypes are used. We need to see the table and index definitions, and a sample query with EXPLAIN ANALYZE output.

> An IN with 50 int values took 23sec to return (by way of example).

To me, this sounds like a sequence scan, or one of your key matches so many rows, the random seeks are throwing off your performance. Of course, I can't confirm that without EXPLAIN output.

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com
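(One hypothetical illustration of the datatype point above: with a bigint key,
numeric literals in the IN list force the column to be cast for the comparison
and can keep the planner off the PK index, which EXPLAIN would reveal. Table
and column names are placeholders.)

    -- should use the bigint PK index (integer literals):
    EXPLAIN ANALYZE SELECT * FROM detail WHERE master_id IN (101, 102, 103);

    -- likely a sequential scan instead: the numeric literals make the
    -- comparison numeric, so master_id is cast row by row and the btree
    -- index can't be used:
    EXPLAIN ANALYZE SELECT * FROM detail WHERE master_id IN (101.0, 102.0, 103.0);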
Re: [PERFORM] how to improve perf of 131MM row table?
Will get what you asked for ASAP. Thanks for your time.

--
Aaron

On June 25, 2014 5:55:29 PM EDT, Shaun Thomas wrote:
>On 06/25/2014 04:40 PM, Aaron Weber wrote:
>
>> In the meantime, I guess I wasn't clear about some other particulars.
>> The query's where clause is only an "IN", with a list of id's (those
>> I mentioned are the PK), and the join is explicitly on the PK (so,
>> indexed).
>
>Indexed doesn't mean indexed if the wrong datatypes are used. We need to
>see the table and index definitions, and a sample query with EXPLAIN
>ANALYZE output.
>
>> An IN with 50 int values took 23sec to return (by way of example).
>
>To me, this sounds like a sequence scan, or one of your key matches so
>many rows, the random seeks are throwing off your performance. Of
>course, I can't confirm that without EXPLAIN output.
>
>--
>Shaun Thomas
>OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
>312-676-8870
>stho...@optionshouse.com