Hi Justin,

> I'm after a little help/advice with indexing...
> I have a table for measuring hits on each page within a site:
> year | month | page | hits
>
> basically, my php script looks for a row matching this year (2002), this
> month (12) and this page (something.php).  if it finds it, increment by 1,
> otherwise insert a row.
>
> All is working fine.
> However, I'm not sure what column(s) would be best for indexing... this
> small PHP script gets called on every hit of every page (with one or two
> queries), so I'd like to optimise it in any way I can, finding a balance
> between
> a) quickly finding out if the row already exists
> b) quickly inserting the row if it doesn't
> I'm not worried about how long reporting the stats takes.
> Or am I worried about nothing?


The purpose of an index is to speed up retrieval. Because the hit-count query
is only looking for one page address out of the whole collection, indexing that
column will speed up both the lookup and the UPDATE.
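
Something along these lines - I'm guessing at the table and column names here,
since your post only lists year/month/page/hits:

    -- Assumed layout of the hit-counter table
    CREATE TABLE page_hits (
        yr    SMALLINT     NOT NULL,      -- e.g. 2002
        mon   TINYINT      NOT NULL,      -- e.g. 12
        page  VARCHAR(255) NOT NULL,      -- e.g. 'something.php'
        hits  INT UNSIGNED NOT NULL DEFAULT 0
    );

    -- Index the page column so the existence check and the increment can
    -- find their one row without scanning the whole table
    CREATE INDEX idx_page ON page_hits (page);

    -- The two per-hit statements that benefit
    SELECT hits FROM page_hits
     WHERE yr = 2002 AND mon = 12 AND page = 'something.php';

    UPDATE page_hits SET hits = hits + 1
     WHERE yr = 2002 AND mon = 12 AND page = 'something.php';

If you wanted one index to cover all three columns in the WHERE clause, a
combined index on (yr, mon, page) would serve, but page is likely the most
selective of the three and does most of the narrowing on its own.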

An index will slow down certain transactions, e.g. INSERT, because not only
does the data have to be added to the table, but a new index entry must be made
as well. I can't imagine that this will represent a significant proportion of
the transactions you describe.
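
To illustrate (same assumed names as above), the only statement that pays that
extra cost here is the once-per-page-per-month insert:

    -- Each new row also means a new entry in every index on the table
    INSERT INTO page_hits (yr, mon, page, hits)
    VALUES (2002, 12, 'something.php', 1);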

For reporting you could create a year/month index, but will it make much
difference? How much data are we talking about? If you add it, the only time
the hit-count will take a performance hit is the first time a page is accessed
each month - so I'll assume that's a non-issue.
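
If you do decide it's worth having, it is just one more statement (assumed
names again):

    -- Composite index to support month-by-month reporting
    CREATE INDEX idx_period ON page_hits (yr, mon);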

On the other hand...
why keep all this data in one table? In the 'good old days' we not only kept
transaction data separated from 'old' records, but might even keep them in
separate DBMSes - for exactly the considerations you are discussing. So, any
data from Nov 2002 backwards has no place in the hit-count recording
process. It could be removed to a separate table.
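
A rough sketch of that housekeeping, assuming an archive table with the same
columns already exists:

    -- Copy the closed months (everything before Dec 2002) to the archive...
    INSERT INTO page_hits_archive
    SELECT * FROM page_hits
     WHERE yr < 2002 OR (yr = 2002 AND mon < 12);

    -- ...then remove them from the live hit-count table
    DELETE FROM page_hits
     WHERE yr < 2002 OR (yr = 2002 AND mon < 12);
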
Another view: do you need to provide stats that are up-to-the-minute/second,
or would data accurate to midnight-last-night be acceptable? Now you could
take a 'snapshot' at, say, midnight each night and run analyses against the snapshot,
whilst the hit-count continues unabated and unaffected.
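
One way to take that snapshot from a nightly cron job (table names assumed):

    -- Rebuild the reporting copy each night; analyses then run against
    -- the snapshot rather than the live hit-count table
    DROP TABLE IF EXISTS page_hits_snapshot;
    CREATE TABLE page_hits_snapshot SELECT * FROM page_hits;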

If the tables were separate, you could go for minimal indexing of the
transaction system, and index everything under the sun in the statistics
table!
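
For instance (assumed names, as before) - the live table keeps just the one
index it needs, while the snapshot carries whatever the reports want:

    -- Recreate these after each nightly rebuild; the per-hit INSERTs
    -- never pay for any of them
    CREATE INDEX idx_snap_page   ON page_hits_snapshot (page);
    CREATE INDEX idx_snap_period ON page_hits_snapshot (yr, mon);
    CREATE INDEX idx_snap_hits   ON page_hits_snapshot (hits);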

You've read the manual, I assume. The CREATE INDEX and ALTER TABLE entries
provide better coverage than CREATE TABLE - follow a link or two from there.
They talk of index 'advantages' in and amongst the 'methods'. Somewhere there
is a discussion of index 'costs', but I've not found it in a quick perusal -
it is definitely covered in Paul DuBois' MySQL book.

Enough for round one?
=dn

