Re: dynamically splitting a table through one full table scan

2009-03-30 Thread Jacek Becla

Hi Kabel,

Yes, I did, but it won't do the job for us. I didn't explain the whole
use case: we are dealing with a 50-billion-row table which we want
to split into 1-million-row tables, and then dynamically break each
of these into smaller pieces in order to speed up n^2 near-neighbor
joins. If we partitioned each 1-million-row table from the very
beginning, we would end up with an unmanageable number of
tables/files: 50 billion rows means ~50,000 chunk tables, and
pre-splitting each of those into ~100 pieces would mean on the
order of 5 million tables/files.
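
To give a flavor of the queries involved, a near-neighbor join
between two sub-chunk tables looks roughly like this (illustrative
table and column names only, not our real schema):

  SELECT a.objectId, b.objectId
  FROM subChunk_1_1 AS a
  JOIN subChunk_1_2 AS b
    ON ABS(a.ra - b.ra) < 0.0003
   AND ABS(a.decl - b.decl) < 0.0003;

Keeping each sub-chunk small is what keeps the n^2 cost of every
such join bounded.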

thanks,
Jacek



kabel wrote:
Have you looked into MySQL partitioning?
<snip>






Re: dynamically splitting a table through one full table scan

2009-03-28 Thread kabel

Jacek Becla wrote:

Is there a way to dynamically split a big table into n smaller
tables by doing a single scan of the table that is being split?
<snip>



Jacek,

Have you looked into MySQL partitioning?  If you're using version 5.1,
it might really help. Just partition the big table on chunkId.


http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
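
As a minimal sketch (assuming your table X and chunkId column, and
assuming chunkId can be included in any primary/unique key X may have):

  -- spread the rows across 100 partitions by chunkId
  ALTER TABLE X
    PARTITION BY HASH(chunkId)
    PARTITIONS 100;

  -- partition pruning means this now reads a single partition
  -- instead of scanning (or range-scanning) the whole table:
  SELECT * FROM X WHERE chunkId = 1;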

kabel




Re: dynamically splitting a table through one full table scan

2009-03-28 Thread Chris W
Why not write a simple script that scans every record once and
inserts each row into the proper table as it goes?

In PHP, for example:

$query  = "SELECT * \n";
$query .= "FROM `X`  \n";
$result = mysql_query($query);
while(($row = mysql_fetch_array($result, MYSQL_ASSOC))){
 $Values  = "";
 foreach($row as $TmpVar => $TmpValue){
   $$TmpVar = $TmpValue;
   $TmpValue = mysql_real_escape_string($TmpValue);
   $Values .= "'$TmpValue',";
 }
 $Values = substr($Values,0,-1);  // remove the last comma
 $Table = sprintf("x%03d",$chunkId);
 $query  = "INSERT INTO `$Table` \n";
 $query .= "VALUES($Values) \n";
 $Iresult = mysql_query($query);
}


Knowing all of the columns would let you hard-code the INSERT query's
VALUES clause and skip the foreach loop, which could make it slightly
faster.  You would still need to run mysql_real_escape_string() on any
columns that contain something other than numbers.
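
Another speed-up worth considering (my own sketch, not part of the
script above): buffer rows per chunk in the script and flush them with
multi-row INSERT statements, which cuts per-statement overhead
dramatically.  The generated SQL would look like this (hypothetical
two-column rows):

  INSERT INTO `X001` VALUES
    (1, 'first row'),
    (2, 'second row'),
    (3, 'third row');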


Jacek Becla wrote:

Is there a way to dynamically split a big table into n smaller
tables by doing a single scan of the table that is being split?
<snip>






dynamically splitting a table through one full table scan

2009-03-28 Thread Jacek Becla

Hi,

Is there a way to dynamically split a big table
into n smaller tables by doing a single scan of
the table that is being split? Here are more
details:

 * Suppose I have a million-row MyISAM table X with
   a relatively small number of columns. It has
   a column "chunkId" with values between 1 and 100.

 * I need to split this table into 100 in-memory
   tables; essentially I need to do:
   INSERT INTO X001 SELECT * FROM X WHERE chunkId=1;
   INSERT INTO X002 SELECT * FROM X WHERE chunkId=2;
   INSERT INTO X003 SELECT * FROM X WHERE chunkId=3;
   and so on.

Based on our tests, each of these individual INSERTs
costs ~50% of a full table scan of X, even with
a clustered index on chunkId. The cost is totally
dominated by the "SELECT * FROM X WHERE chunkId=..."
part.

Since we need to do such a split many times, this
performance is not acceptable. It feels like it should
be possible to do the split through a single scan.
Any suggestions?

thanks,
Jacek

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org