I don't think he is worried about table scanning; he is worried about
ROW scanning. Each of his rows is so large (2500*(size of float) +
3*(size of tinyint) + some other stuff) that just moving that much data
through his machine is consuming too much time.
If you have a query that does something like this:
SELECT onefield
FROM hugetable
WHERE indexed_column = 12
AND non_indexed = 6
Odds are that you will be doing an indexed search. Say that for his
indexed_column he gets 24M rows that match its condition out of the 16B
rows in his table (way less than 30%). The next thing the engine has to do
is LOAD each of those 24M rows (every single column) into memory so that
the value of the non_indexed column can be compared to 6. To perform that
second comparison, the memory bus, the hard drives, and everything else
involved in reading records will have to transfer 275.4 GB of data (at
least once) just so that he can get, out of each row that matches his two
WHERE conditions, the value of the one column he specified in his SELECT
statement.
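To put rough numbers on that (assuming a 4-byte FLOAT and a 1-byte TINYINT,
assuming the selected column is one of the floats, and ignoring the "other
stuff"):

  row size    ~= 2500*4 + 3*1 = 10,003 bytes, call it 10 KB
  data moved  ~= 29,548,800 rows x ~10 KB ~= 275 GB
  data wanted ~= one 4-byte float per row ~= 1/2500th of that

which lines up with the 275.4 GB and 112.7 MB figures in his original
message below.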
My idea is to create a way to ask the engine to check the value of the
second field directly from the DISK copy of each table (without moving each
row into local memory). If it matches, a second seek is performed to pull
in the field(s) specified in his SELECT. Alternatively, we could create
some way to ask the engine to pull through memory only those columns that
participate in the SELECT clause or in one of the ON or WHERE conditions.
Either way, we minimize how much data must be moved through memory to
resolve one of these rather bulky queries.
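For what it's worth, you can get most of that second effect from the engine
as it stands today with a covering index (just a sketch, reusing the made-up
table and column names from the example above):

ALTER TABLE hugetable
  ADD INDEX idx_cover (indexed_column, non_indexed, onefield);

With that index in place, the example SELECT can be answered from the index
alone (EXPLAIN shows "Using index" in the Extra column) and the full ~10 KB
rows are never touched. The trade-off, of course, is the disk space and
insert overhead of maintaining another index on a table that size.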
We could potentially create a new threshold value: say, if a query uses
less than 10% of the fields on a table, that would cause this optimization
to kick in.
If anyone else has any ideas (and I am sure there are many) on how to
minimize disk traffic and memory throughput in order to handle his rather
large dataset, this would be a great time to speak up. I know that this is
a rather extreme case, but solving this issue may make MySQL just that much
faster for the rest of us.
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
"Dathan Vance Pattishall" <[EMAIL PROTECTED]> wrote on 10/21/2004
04:00:10 PM:
>
>
> DVP
> ----
> Dathan Vance Pattishall http://www.friendster.com
>
>
> > -----Original Message-----
> >
> > So, is there a faster way to insert/index the data? Would a different
> > table or index type improve performance?
>
> Use LOAD DATA INFILE ... IGNORE ... you might get a better insert speed
> increase. A different table and a different index / compound index would
> improve performance from both an insert and a select perspective.
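(For reference, that statement would look roughly like the following; the
file name, delimiter, and target table are just placeholders here:

LOAD DATA INFILE '/path/to/datafile.txt'
  IGNORE INTO TABLE hugetable
  FIELDS TERMINATED BY ',';

IGNORE skips rows that would duplicate an existing unique key instead of
aborting the load, and bulk-loading from a file is generally much faster
than row-by-row INSERTs.)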
>
>
>
> >
> > above query, according to explain), the current behavior makes it read
> > 29548800 complete rows, which is 275.4 GB of data to read, even though
> > the desired return is about 1/2500th of that (112.7 MB).
>
> If a range covers more than 30% of the table, a table scan is performed
> instead of an index scan.
>
>
> >
> >
> > Any/all suggestions, comments, even flames are welcome :) Thanks in
> > advance!
> >
> > ken
>
> Look at some my.cnf options. You can tell mysql to use keys more often
> than table scans with a var called max_seeks_for_key=100 // something
> like that
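(In my.cnf that would look something like the following; the value 100 is
just the example given above:

[mysqld]
max_seeks_for_key=100

It caps the number of seeks the optimizer assumes an index lookup will
cost, which biases it toward using an index rather than scanning the
table.)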
>
>
>
> >
> > =========================================================================
> > "Diplomacy is the weapon of the Civilized Warrior"
> > - Hun, A.T.
> >
> > Ken Gieselman
> > [EMAIL PROTECTED]
> > System Administrator
> > http://www.endlessknot.com/~kgieselm
> > Endlessknot Communications
> > http://www.endlessknot.com
> >
> > =========================================================================
> >
> >
> >
> >
> >
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
>