On 06/01/18 04:41 , Abroży Nieprzełoży wrote:
> 2018-06-01 2:12 GMT+02:00, Torsten Curdt <tcu...@vafer.org>:
>> I need to get some random rows from a large(ish) table.
>>
>> The following seems to be the most straight forward - but not the fastest.
>>
>>   SELECT * FROM table ORDER BY random() limit 200
>>
>> Is there a faster/better approach?
> hmm... i think:
>
> begin transaction;
> query 'select count(*) from tab' to get total number of records, name it N
> randomly select 200 integers from the range 0 to N-1
> for each number query 'select * from tab limit 1 offset ?' with '?'
> bound to selected number
> end transaction;

Most likely, no. OFFSET is O(n), so this is O(n*m).
Sort is O(n*log(n)).

I'd probably consider this:

SELECT * FROM (
  SELECT * FROM foo
   WHERE RANDOM() < (
     SELECT (200.0/COUNT(*) - 0.5)*(16.0*1024*1024*1024*1024*1024*1024)
       FROM foo))
ORDER BY RANDOM();

But, obviously, it won't always return 200 rows (it takes random rows
with certain probability).

sqlite> CREATE TABLE foo (a,b blob,c,d);
sqlite> INSERT INTO foo VALUES (1,1,1,1);
sqlite> INSERT INTO foo
  SELECT RANDOM(), RANDOMBLOB(RANDOM() % 512), RANDOM(), RANDOM() FROM foo;
-- ... repeat several times ...
.timer on
sqlite> SELECT COUNT(*) FROM foo;
262144
Run Time: real 0.007 user 0.004000 sys 0.000000

sqlite> SELECT COUNT(*) FROM (
  SELECT * FROM foo ORDER BY RANDOM() LIMIT 200);
200
Run Time: real 1.381 user 1.368000 sys 0.012000

sqlite> SELECT COUNT(*) FROM (
  SELECT * FROM foo WHERE rowid IN (
    SELECT rowid FROM foo ORDER BY RANDOM() LIMIT 200));
200
Run Time: real 0.975 user 0.964000 sys 0.008000

sqlite> SELECT COUNT(*) FROM (
  SELECT * FROM (
    SELECT * FROM foo WHERE RANDOM() < (
      SELECT (200.0/COUNT(*) - 0.5)*(16.0*1024*1024*1024*1024*1024*1024)
        FROM foo))
  ORDER BY RANDOM());
198
Run Time: real 0.266 user 0.260000 sys 0.000000

(measured on sqlite3-3.19.3 on x86/32bit, YMMV; beware of query flattening).

Choose your poison.

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to