Re: Most efficient way of handling a large dataset

Jim Lyons Fri, 24 Oct 2008 05:07:31 -0700

You might consider adding qty to the index and have so your queries would be
satisfied with the index lookup alone, saving an extra step since the
database won't then go access the data (just the one field, qty).


You might also consider making all field non-null and, if you keep the
fields as character data, make them fixed length, not variable, even if it
wastes space.

Integer lookups are faster than character lookups and you can probably keep
conversions tables for most if not all of those fields in memory.

You might think about the query cache..  If you have it on but your lookups
aren't repeated often enough, the query cache won't be used.  If you don't
have it on, but a lot of your queries are repeated, then having it on will
help.

On Fri, Oct 24, 2008 at 5:59 AM, Mark Goodge <[EMAIL PROTECTED]> wrote:

> I'd appreciate some advice on how best to handle a biggish dataset
> consisting of around 5 million lines. At the moment, I have a single table
> consisting of four fields and one primary key:
>
> partcode varchar(20)
> region varchar(10)
> location varchar(50)
> qty int(11)
> PRIMARY KEY (partcode, region, location)
>
> The biggest variable is partcode, with around 80,000 distinct values. For
> statistical purposes, I need to be able to select a sum(qty) based on the
> other three fields (eg, "select sum(qty) from mytable where partcode ='x'
> and region = 'y' and location = 'z'") as well as generating a list of
> partcodes and total quantities in each region and location (eg, "select
> sum(qty), partcode from mytable where region = 'y' and location = 'z' group
> by partcode").
>
> The selection is done via a web-based interface. Unfortunately, it's too
> slow. So I want to be able to optimise it for faster access. Speed of
> updating is less crucial, as it isn't updated in real-time - the table gets
> updated by a nightly batch job that runs outside normal working hours (and,
> apart from the rare occasion when a location is added or removed, the only
> thing that changes is the value in qty).
>
> Does anyone have any suggestions? My initial thought is to replace the
> region and location varchar fields with int fields keyed to a separate list
> of region and location names. Would that help, or is there a better way?
>
> Mark
> --
> http://mark.goodge.co.uk - my pointless blog
> http://www.good-stuff.co.uk - my less pointless stuff
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>
>


-- 
Jim Lyons
Web developer / Database administrator
http://www.weblyons.com

Re: Most efficient way of handling a large dataset

Reply via email to