I didn't look through your code very carefully. Have you confirmed (using EXPLAIN) that your select query is using the index?
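(For a quick way to script that check: the sketch below uses Python's bundled SQLite and its EXPLAIN QUERY PLAN purely as a stand-in for MySQL's EXPLAIN — the table and index names from the post are reproduced as assumptions, and MySQL's actual EXPLAIN output looks quite different.)

```python
import sqlite3

# Stand-in for MySQL's EXPLAIN: build the same schema in SQLite and ask its
# planner whether the bounding-box query hits the composite (lat, lon) index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE integer_test (lat INTEGER NOT NULL, lon INTEGER NOT NULL)")
con.execute("CREATE INDEX index1 ON integer_test (lat, lon)")
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT lat, lon FROM integer_test "
    "WHERE lat > ? AND lat < ? AND lon > ? AND lon < ?",
    (0, 1000, 0, 1000),
).fetchall()
for row in plan:
    print(row[-1])  # plan detail; should mention index1 if the index is used
```

If the plan line says SCAN rather than SEARCH on the index, the range predicate is not using it.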
I don't know much about the MySQL optimizer, but it's possible that this query:

> $query1="SELECT lat,lon from integer_test WHERE lat>$lat1 and lat<$lat2
>          and lon>$lon1 and lon<$lon2";

actually runs through the table four times instead of twice, and maybe can't even use the index for the whole query. Is the performance different if you use a subquery? Something like:

SELECT integer_test.lat, integer_test.lon
FROM integer_test
JOIN (SELECT id FROM integer_test WHERE lon>$lon1 AND lon<$lon2) AS i2
  ON integer_test.id = i2.id
WHERE integer_test.lat>$lat1 AND integer_test.lat<$lat2

assuming that you have independent indices on lat and lon. I didn't try this, so the syntax may be wrong. You could also dispense with the id idea, but if so you should probably declare (lat,lon) as your primary key.

On Apr 23, Nick Hill wrote:
> Hello
>
> I have been looking at planning the database strategy for openstreetmap
> (http://www.openstreetmap.org).
>
> There are several data types stored in tables with longitude and latitude
> columns. Select statements work by selecting
>
> where lat>$lat1 and lat<$lat2 and lon>$lon1 and lon<$lon2
>
> I have made many empirical tests and have concluded:
>
> 1) I can improve performance by a factor of 2-2.5 by changing the double
>    lat/lon to an integer, then selecting on an integer.
>
> 2) For each 10-fold increase in the number of records, select queries take
>    twice as long. For each doubling of the number of returned records,
>    there is a sqrt(2) increase in select query time.
>
> All this is assuming all relevant database information is in memory.
>
> As the database grows, it would likely improve database performance to
> split an individual table into several thousand tables, using the file
> system directory btree algorithm to effectively pre-select the data before
> the query is handed to the MySQL engine. This is not a neat solution.
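The scaling claims above can be sanity-checked against the raw timings quoted later in the message. A minimal Python sketch (all timings are copied from the post — representative ~25600-row figures for each dataset size):

```python
# Representative query times (s) for ~25600 returned rows, copied from the
# raw data quoted in the post, keyed by dataset size.
t_25600 = {10**5: 0.118, 10**6: 0.224, 10**7: 0.551, 10**8: 2.446}

# Claim: each 10-fold increase in records roughly doubles query time.
sizes = sorted(t_25600)
for small, big in zip(sizes, sizes[1:]):
    print(f"{small:>9} -> {big:>9} records: x{t_25600[big] / t_25600[small]:.2f}")
# The factor is ~1.9-2.5 up to 10M points, but ~4.4 at 100M, so the
# 2x-per-decade rule degrades on the largest plan.

# Claim: doubling the returned rows costs ~sqrt(2); within the 10M-point
# plan, 25600 vs 12800 returned rows took 0.551 s vs 0.376 s.
print(f"25600/12800 rows: x{0.551 / 0.376:.2f} (sqrt(2) ~ 1.41)")
```

So the quoted rules of thumb fit the smaller plans well but understate the cost of the 100M-point plan, which is worth keeping in mind when extrapolating.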
> A much better way would be to improve the MySQL index performance on very
> large numbers of records.
>
> Given that there is such a strong relationship between the number of records
> returned and query time, I conclude that the whole index tree is matched for
> every given number of root x records returned. If all the records we are
> matching are under a single node, or under a small number of nodes in the
> index tree, perhaps there is some way of telling the database engine to
> ignore the rest of the index tree.
>
> Could this work, or am I misunderstanding how the index tree works? Are
> there existing optimisations which can de-couple the relationship between
> the number of records and query time where the records I am selecting are
> within a small range?
>
> Background information:
>
> We can boil all this down to a mathematical relationship where
> query1 selects s records from a dataset of r records, and
> query2 selects b records from a dataset of c records.
>
> Tquery1 is the time to execute query1 and Tquery2 is the time to execute
> query2.
> Tquery2 = Tquery1 * sqrt(b/s) * 2^log10(c/r) + (b - s*CONST/15000) + CONST
>
> where, for my processor, CONST is 0.03.
>
> This can be simplified (losing some accuracy) to:
>
> Tquery2 = Tquery1 * sqrt(b/s) * 2^log10(c/r)
>
> Raw data for selects:
>
> Creating a plan with 100000 points and averaging over 25 queries
> Points_per_tile  Query_Time
> 25600            0.118
> 25600            0.119
> 25600            0.119
> 25600            0.119
> 12800            0.069
> 6400             0.042
> 3200             0.026
> 1600             0.017
> 800              0.011
> 400              0.008
> 200              0.005
> 100              0.004
> 50               0.003
>
> Creating a plan with 1000000 points and averaging over 25 queries
> Points_per_tile  Query_Time
> 25600            0.224
> 25600            0.223
> 25600            0.222
> 25600            0.223
> 12800            0.145
> 6400             0.093
> 3200             0.062
> 1600             0.043
> 800              0.029
> 400              0.020
> 200              0.015
> 100              0.011
> 50               0.008
>
> Creating a plan with 10000000 points and averaging over 25 queries
> Points_per_tile  Query_Time
> 25600            0.558
> 25600            0.548
> 25600            0.551
> 25600            0.551
> 12800            0.376
> 6400             0.257
> 3200             0.181
> 1600             0.125
> 800              0.087
> 400              0.062
> 200              0.044
> 100              0.031
>
> Creating a plan with 100000000 points and averaging over 25 queries
> Points_per_tile  Query_Time
> 25600            2.422
> 25600            2.332
> 25600            2.493
> 25600            2.446
> 12800            1.769
> 6400             1.295
> 3200             0.866
> 1600             0.657
> 800              0.456
> 400              0.328
> 200              0.233
> 100              0.159
> 50               0.118
>
> Source code for the above test:
>
> #!/usr/bin/perl -w
>
> # Program creates random point fields equivalent to bitfieldtest.pl, except
> # the data is stored as regular signed integers. To represent the globe as
> # closely as possible, extents between -180 and +179.999999 will be used.
> # Therefore, adding 180 normalises for international date line 0,
> # Prime Meridian 180.
> # 111111**0.000001
>
> use DBI;
> use Time::HiRes qw( usleep ualarm gettimeofday tv_interval );
>
> $DBHOST = "localhost";
> $DBNAME = "nickh";
> $DBUSER = "nickh";
> $DBPASS = "xxxxxx";
>
> # initialise database
> $driver = "mysql";
> $dsn = "DBI:$driver:database=$DBNAME;host=$DBHOST";
> $dbh = DBI->connect($dsn, $DBUSER, $DBPASS);
>
> [EMAIL PROTECTED](100000000);
> @plane_densities=(100000,1000000,10000000,100000000);
> @tile_points=(25600,25600,25600,25600,12800,6400,3200,1600,800,400,200,100,50);
> $query_iterations=25;
> $debug=0;
>
> sub create_bitfield;
> sub run_tests;
>
> foreach $density (@plane_densities){
>     print "Creating a plan with $density points and averaging over $query_iterations queries\nPoints_per_tile Query_Time\n";
>     create_bitfield($density);
>     foreach $tilepoints (@tile_points){
>         my $testtime=run_tests($density,$tilepoints);
>         printf '%-14d %.3f',$tilepoints,$testtime;
>         print "\n";
>     }
> }
>
> $dbh->disconnect;
> exit 0;
>
> sub create_bitfield{
>     # Takes the number of points in the point field as argument.
>     # We first create the table without an index, as the index slows populating the table.
>     my $prep=q~
>     DROP TABLE IF EXISTS integer_test;
>
>     CREATE TABLE `integer_test` (
>       `lat` integer NOT NULL default '0',
>       `lon` integer NOT NULL default '0'
>     ) TYPE=MyISAM
>     ~;
>
>     # drop/create tables without index
>     foreach $aprepare (split(/;/,$prep)) {
>         $dbh->do($aprepare);
>     }
>
>     # populate table
>     for ($i=0;$i<$_[0];$i+=100){
>         # create 100-element batched inserts: (value,value),(value,value) etc.
>         my $insert='';
>         for ($j=0;$j<100;$j++){
>             if ($j>0) {$insert .= ','; }
>             $lat=int(rand 3599999999)-1800000000;
>             $lon=int(rand 3599999999)-1800000000;
>             $insert .= "($lat,$lon)";
>         }
>
>         my $sql="INSERT INTO `integer_test` VALUES $insert;";
>         $dbh->do($sql);
>     }
>
>     # After populating the table, we create the index.
>     #print "Creating index... This may take some time\n";
>     my $sql='CREATE INDEX index1 ON integer_test (lat,lon);';
>     $dbh->do($sql);
> }
>
> sub run_tests{
>     # Parameters: points in field; size of tile in average number of points
>     my $number_of_points=$_[0];
>     my $target_query_return=$_[1];
>     my $proportion_of_each_axis=1/(sqrt($number_of_points/$target_query_return));
>     my $max_extent=1-$proportion_of_each_axis; # maximum extent for a query that stays within bounds
>     my $returnedrows=0;
>     $query1total=0;
>
>     for($i=0;$i<$query_iterations;$i++){
>         $lat1=int(rand ($max_extent*3599999999))-1800000000;
>         $lon1=int(rand ($max_extent*3599999999))-1800000000;
>
>         $lat2=int($lat1+($proportion_of_each_axis*3599999999));
>         $lon2=int($lon1+($proportion_of_each_axis*3599999999));
>
>         if ($debug){print "querying bounds $lat1 $lon1 $lat2 $lon2 with queries \n";}
>
>         $query1="SELECT lat,lon from integer_test WHERE lat>$lat1 and lat<$lat2 and lon>$lon1 and lon<$lon2";
>
>         $t0 = Time::HiRes::time();
>         my $sth = $dbh->prepare($query1);
>         $sth->execute();
>         $returnedrows+=$sth->rows();
>         $elapsed=Time::HiRes::time()-$t0;
>         $query1total+=$elapsed;
>     }
>
>     if ($debug){print "Each of the $query_iterations queries returned an average " . $returnedrows/$query_iterations . " records\n";}
>     return ($query1total/$query_iterations);
> }

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]