[sqlite] Equal distribution from random rows

2007-05-30 Thread Alex Teslik
Hello,

   I'm working on a project that requires random images to be chosen from a
live database for a dynamic homepage. I found this link from Dr. Hipp that
details a very fast approach to selecting random rows:

http://www.mail-archive.com/sqlite-users@sqlite.org/msg14652.html

   Unfortunately, it heavily weights the distribution of the randomly chosen
rows - to the extreme that the feeling of randomness is almost lost due to the
same images appearing so frequently.

   After some research I concluded that it must be caused by breaks in the
ROWID values. However, after VACUUMing the database to re-order the ROWIDs I
find I am still having the same problem - weighted distribution.

   Is there a way I can modify my query to attain a much more equal
distribution? It doesn't have to be perfect, but right now it is too
noticiably weighted.

   I have posted a stripped down version of the database, and a small snippet
of perl to display the problem here:

http://www.acatysmoof.com/posting/problems/sqlite/dist.tgz

   Many thanks to anyone with a moment to share some ideas on how to equalize
this distribution.

Thanks,
Alex

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Equal distribution from random rows

2007-05-30 Thread Alex Teslik
On Wed, 30 May 2007 15:18:18 -0400, John Elrick wrote
snip
 After running a simple test, I confirmed a suspicion.  VACUUM 
 doesn't reorder the ROWIDs, so you still have breaks.

My tests show otherwise:

[alex]# cat 01_vacuum_table_test.sql
CREATE TABLE foo (
  string varchar(1) not null
);
INSERT INTO foo (string) VALUES ('a');
INSERT INTO foo (string) VALUES ('b');
INSERT INTO foo (string) VALUES ('c');
INSERT INTO foo (string) VALUES ('d');
[alex]# sqlite3 foo.db  01_vacuum_table_test.sql
[alex]# sqlite3 --header --column foo.db SELECT rowid,* FROM foo
rowid   string
--  --
1   a
2   b
3   c
4   d
[alex]# sqlite3 --header --column foo.db DELETE FROM foo WHERE rowid = 2
[alex]# sqlite3 --header --column foo.db DELETE FROM foo WHERE rowid = 4
[alex]# sqlite3 --header --column foo.db SELECT rowid,* FROM foo
rowid   string
--  --
1   a
3   c
[alex]# sqlite3 --header --column foo.db VACUUM
[alex]# sqlite3 --header --column foo.db SELECT rowid,* FROM foo
rowid   string
--  --
1   a
2   c

did I do something incorrectly?

 
 Two suggestions:
 
 1.  If the image data are fairly stable, copy the table to a new 
 table so as to eliminate the breaks in the ROWIDs.
 2.  If the table is relatively small, load all the potential ids 
 into memory as a list and select randomly from the list.

The table is small now, but expected to grow considerably to over a million
rows. I coded a random selector from an array at the application level, and it
works, but it will degrade dramatically over time.

Thanks,
Alex


-
To unsubscribe, send email to [EMAIL PROTECTED]
-