On Wed, Jun 25, 2014 at 3:48 AM, Niels Kristian Schjødt
<nielskrist...@autouncle.com> wrote:
> Hi,
> I’m running a search engine for cars. It’s backed by a PostgreSQL 9.3
> installation.
>
> Now I’m unsure about the best approach/strategy for doing index optimization
> for the frontend search.
>
> The problem:
>
> The table containing the cars holds around 1.5 million rows. People who
> search for cars need different criteria to search by. Some search by
> brand/model, some by year, some by mileage, some by price and some by special
> equipment etc. etc. - and often they combine a whole bunch of criteria
> together. Of course some, like brand/model and price, are used more frequently
> than others. In total we offer: 9 category criteria like brand/model or body
> type, plus 5 numeric criteria like price or mileage, plus 12 boolean criteria
> like equipment. Lastly, people can order the results by different columns
> (year, price, mileage and a score we create for the cars). By default we
> order by our own generated score.
>
> What I’ve done so far:
>
> I have analyzed the usage of the criteria “lightly”, and created a few
> indexes (10). Among those are, e.g., indexes on price, mileage and a combined
> index on brand/model. Since we are only interested in showing results for
> cars which are actually for sale, the indexes are made as partial indexes on a
> sales state column.
>
> Questions:
>
> 1. How would you go about analyzing and determining what columns should be 
> indexed, and how?

Mainly frequency of access: index the columns (and combinations of columns)
that show up most often in your users' searches.
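
As a rough sanity check, the statistics views also show which of your
existing indexes actually get used (table name "cars" is just an assumption
about your schema):

    -- how often each index on "cars" has been scanned since stats were reset
    SELECT indexrelname, idx_scan, idx_tup_read
    FROM pg_stat_user_indexes
    WHERE relname = 'cars'
    ORDER BY idx_scan DESC;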

> 2. What is the best strategy when optimizing indexes for searches happening
> on 20+ columns, where the usage and the combinations vary a lot? (To just
> index everything, to index some of the columns, to do combined indexes, to
> only do single-column indexes etc. etc.)

Don't make 20 indexes.  Consider installing pg_trgm (for optimized
LIKE searching) or hstore (for optimized key/value searching) and then
using GiST/GIN for multiple-attribute search.  With 9.4 we have
another fancy technique to explore: jsonb searching via GiST/GIN.
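
For example (a sketch only -- a text column "model" and an hstore column
"equipment" are assumptions, not your actual schema):

    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE EXTENSION IF NOT EXISTS hstore;

    -- trigram GIN index speeds up LIKE/ILIKE '%...%' searches on model
    CREATE INDEX cars_model_trgm_idx ON cars USING gin (model gin_trgm_ops);

    -- GIN index on an hstore column speeds up key/value containment searches
    CREATE INDEX cars_equipment_idx ON cars USING gin (equipment);

    -- both indexes can then serve a query like:
    SELECT id FROM cars
    WHERE model ILIKE '%golf%'
      AND equipment @> 'navigation=>true';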

> 3. I expect that it does not make sense to index all columns?

Well, maybe.  If you only ever search one column at a time, then it
might make sense.  But if you need to search arbitrary criteria and
frequently combine a large number of them, then no -- particularly if your
dataset is very large and the individual criteria are not very selective.

> 4. I expect it does not make sense to index boolean columns?

In general, no.  An important exception is when you are only interested
in one of the two values (true or false) and the number of records with
that interesting value is tiny relative to the size of the table.  In that
case, a partial index can give a massive optimization.
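
Something along these lines (column names are hypothetical) -- the index only
contains the few rows where the flag is set, so it stays tiny:

    -- useful only if a small fraction of the cars have the flag set
    CREATE INDEX cars_panoramic_idx ON cars (price)
    WHERE has_panoramic_roof AND sales_state = 'for_sale';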

> 5. Is it better to do a combined index on 5 frequently used columns rather 
> than having individual indexes on each of them?

Only if you search those 5 columns together a significant portion of the time.
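
A hedged sketch of what that might look like (the column list is just an
example; column order matters, since the planner can only use the index
efficiently when the leading columns appear in the query):

    CREATE INDEX cars_brand_model_price_idx
    ON cars (brand, model, price)
    WHERE sales_state = 'for_sale';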

> 6. Would it be a good idea to have all indexes sorted by my default sorting?

Index sort order rarely matters, since a btree can be scanned in either
direction.  If you always read values in reverse order and the table is very
large you may want to consider it.  Unfortunately this often doesn't work for
composite indexes, so sometimes we must fall back on the old-school technique
of indexing the reversed (negated) value.
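
Where declared order does matter is a composite index whose columns sort in
different directions; it can be specified per column (names illustrative):

    -- can serve ORDER BY year, score DESC without a separate sort step
    CREATE INDEX cars_year_score_idx ON cars (year ASC, score DESC)
    WHERE sales_state = 'for_sale';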

> 7. Do you have any experience with other approaches that could greatly
> improve performance (e.g. forcing indexes to stay in memory etc.)?

As noted above, fancy indexing is the first place to look.   Start
with pg_trgm (for LIKE optimization), hstore, and the new json stuff.
The big limitation you will hit is that most index strategies, at
least for the prepackaged stuff, will support '=' or partial string
matching (particularly with pg_trgm LIKE), but not > or <: for range
operations you have to post-process the search or try to work the index
from another angle.
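
One way to "work the index from another angle" is to keep a plain btree on
the range column and let the planner combine it with the GIN index through a
bitmap AND -- a sketch under the same assumed column names as above:

    -- price has a regular btree index, model has the trigram GIN index;
    -- the planner can BitmapAnd the two before touching the heap
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT id FROM cars
    WHERE model ILIKE '%golf%'
      AND price BETWEEN 50000 AND 100000;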

merlin

