I have a question about a scale-out problem.
Let's say you have a large table with three columns and 500+ million rows.

Would there be much benefit in splitting the columns into separate tables
linked by a shared INT primary key? The split tables would be hosted on
the same physical instance but could be spread over multiple disks.
We're also open to splitting the query and reconstituting the data at the
application layer, such as

select col1, col2 from t1 where col2='name';
select col2 from t2 where col1=?; -- bind col1 returned by the first query
select col2 from t3 where col1=?; -- likewise

as opposed to

select t1.col2, t2.col2, t3.col2 from t1 inner join t2 on t1.col1=t2.col1
inner join t3 on t1.col1=t3.col1;
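To make the two access patterns concrete, here is a minimal sketch of both, using SQLite in memory purely as a stand-in for MySQL; the table and column names (t1/t2/t3, col1 as the shared INT primary key) follow the queries above, and the sample data is invented for illustration:

```python
import sqlite3

# Build three vertically partitioned tables sharing an INT primary key.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for t in ("t1", "t2", "t3"):
    cur.execute(f"CREATE TABLE {t} (col1 INTEGER PRIMARY KEY, col2 TEXT)")
cur.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "name"), (2, "other")])
cur.executemany("INSERT INTO t2 VALUES (?, ?)", [(1, "t2-payload"), (2, "x")])
cur.executemany("INSERT INTO t3 VALUES (?, ?)", [(1, "t3-payload"), (2, "y")])

# Option A: single SQL join; the server reconstitutes the row.
joined = cur.execute(
    "SELECT t1.col2, t2.col2, t3.col2 "
    "FROM t1 INNER JOIN t2 ON t1.col1 = t2.col1 "
    "INNER JOIN t3 ON t1.col1 = t3.col1 WHERE t1.col2 = 'name'"
).fetchone()

# Option B: split queries; the application reconstitutes the row,
# binding col1 from the first result into the follow-up lookups.
col1, a = cur.execute(
    "SELECT col1, col2 FROM t1 WHERE col2 = 'name'").fetchone()
b = cur.execute("SELECT col2 FROM t2 WHERE col1 = ?", (col1,)).fetchone()[0]
c = cur.execute("SELECT col2 FROM t3 WHERE col1 = ?", (col1,)).fetchone()[0]
reconstituted = (a, b, c)

print(joined, reconstituted)  # both yield the same row
```

Option B trades one round trip for three, but each follow-up is a primary-key point lookup, which is the case we are weighing against the join.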

My concern with this approach is the overhead of joining such a large number
of rows. I did some research into the cost of joins, and as of 5.0 MySQL
still implements joins as nested loop scans.
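To show why the nested loop worries me at this scale, here is a toy cost model of my own (not MySQL internals): a nested loop join probes the inner table once per outer row, so whether the join column is indexed dominates everything at 500M+ rows.

```python
import math

def nested_loop_cost(outer_rows, inner_rows, inner_indexed):
    """Rough probe count for a nested loop join (toy model).

    With an index on the join column, each inner probe is roughly a
    B-tree lookup (~log2 of the inner row count); without one, it is
    a full scan of the inner table for every outer row.
    """
    probe = math.log2(inner_rows) if inner_indexed else inner_rows
    return outer_rows * probe

rows = 500_000_000
print(f"indexed:   {nested_loop_cost(rows, rows, True):.3e} probes")
print(f"unindexed: {nested_loop_cost(rows, rows, False):.3e} probes")
```

The gap between the two figures is why I'd expect the split-table scheme to live or die by whether every lookup stays on the primary key.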

I was wondering whether others have practical experience in this area and
what they've found. Any feedback would be much appreciated.

Kyong
Inst. Web Programmer
CMDBA 5.0

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org
