[sqlite] Efficient random sampling in a large table using builtin functions.
I have a table with millions of records. When I run a query with ORDER BY random() LIMIT N; the RANDOM() function is evaluated against all rows of my result set, then sorting occurs, and as a result the query is slow. In this case the query could be rewritten as: * Generate N, random, row numbers between 1 and (SELECT COUNT(*) FROM the_table_name). Maybe using remainder operator % and builtin ABS() and RANDOM() functions can help (see below) * SELECT FROM the_table WHERE rowid IN (those random numbers) For the moment the most simple query I can think of is: SELECT * FROM my_table WHERE rowid IN ( SELECT 1 + (ABS(r.x) % c.num_rows) FROM (SELECT COUNT(*) AS num_rows FROM my_table ) AS c CROSS JOIN (SELECT random() x FROM my_table LIMIT N ) AS r ); This can however return less than N rows if by chance two random numbers have the same remainder modulo COUNT(*) FROM my_table. Note that the generation of N random numbers is quick because there is no ORDER BY involved that would require computation of as many random numbers as there are rows in the original table. This could maybe be optimized inside SQLite if those conditions are met: * random() hasn't been overridden with sqlite_create_function() nor sqlite_create_function_v2() * the user SELECTs FROM a table without joins and w/o WHERE conditions. -- Benoit Mortgat ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Efficient random sampling in a large table using builtinfunctions.
On Thu, Mar 8, 2012 at 15:02, Igor Tandetnik itandet...@mvps.org wrote: Benoit Mortgat mort...@gmail.com wrote: * Generate N, random, row numbers between 1 and (SELECT COUNT(*) FROM the_table_name). Maybe using remainder operator % and builtin ABS() and RANDOM() functions can help (see below) * SELECT FROM the_table WHERE rowid IN (those random numbers) That is not equivalent. First, duplicate random numbers may be generated. Second, rowid values are not necessarily sequential. Thus, you may end up with fewer than N rows. Oh, I assumed rowids were sequential. If that's not the case, then you're right and my query is bad. Thanks for pointing that out. You can implement this algorithm in your application's code, if you are so inclined. Still, what I told is that I use that query only to get a random sample for displaying. I do that from inside the SQLite shell, for vizualization purposes, and as soon as I get approximately N results, this is satisfactory to me. I am not integrating anything into an application. -- Benoit Mortgat ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] compressing BLOB
SQLite does not compress your blob and you will have to do that programatically. However you can define your own with sqlite_create_function_v2(): the prototype of your function would be void compress(sqlite3_context *context, int argc, sqlite3_value **argv) { assert(argc==1); void *data = sqlite3_value_blob(argv[0]); int nBytes = sqlite3_value_bytes(argv[0]); // allocate memory for result . sqlite3_result_blob(...); } On Fri, Mar 2, 2012 at 10:42, Christoph P.U. Kukulies k...@kukulies.org wrote: Since I'm inserting large files into the DB I'm wondering whether Sqlite can do compression on the data BLOB by itself or whether I should do that by programming when creating the BLOB? -- Christoph Kukulies ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- Benoit Mortgat 20, avenue Marcel Paul 69200 Vénissieux, France +33 6 17 15 41 58 +33 4 27 11 61 23 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SQLite shell dot commands
On Fri, Feb 24, 2012 at 13:28, Don V Nielsen donvniel...@gmail.com wrote: Please, I don't mean this to be offensive. I'm not. Thanks for the answer, I did not feel offended. It was suggested that the syntax [Ben's table] is cumbersome. What is really cumbersome, in my opinion, is the table name itself. The table name includes an white space (space) and a delimiting character (apostrophe.) The simple table name has lots of junk in it that will throw a left hook to many language parsers. To simplify everything, just name the table BensTable. Camel case the words, drop the spaces. Everything is still very readable and makes sense. I fully agree that it's not really advisable to name a table like this. Still, since SQLite supports non-\w+ table names, I felt that the SQLite shell should also support them in meta commands. If you re-read the original question, there is still the problem when it's file names that contain spaces, apostrophes or other challenging characters (commands affected could be: .import, .backup, .restore, .load, .log, .output). Those cases are not that uncommon. If it's not planned to rewrite the argument tokenizer for meta-commands could it be possible at least to specify in the output of .help how those arguments are tokenized? (Note: another workaround to import into Ben's table is using the octal digits escaping with backslashes:) .import my_file \042Ben's\040table\042 -- Benoit Mortgat 20, avenue Marcel Paul 69200 Vénissieux, France +33 6 17 15 41 58 +33 4 27 11 61 23 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SQLite shell dot commands
On Fri, Feb 24, 2012 at 16:22, Simon Slavin slav...@bigfraud.org wrote: It is faster, simpler, and would introduce far fewer ambiguities and opportunities for bugs, simply to remove the ability to create tables with whacky names. There are no real restrictions on table names in the SQL specs. You can even theoretically create a table name with a '' or a ']' character in, if you can make the parser accepting one. So every implementation of SQL has its own peculiarities. Sorry but the question is not about what an acceptable table name is. It is about how dot-commands in the SQLite shell could accept arguments containing special characters (I recall that those arguments can be table names, file names or various others: boolean, fixed hardcoded values: `.help` in the SQLite shell lists them, as well as reading shell.c.) Please do not convert my thread to a debate about what characters table names should include. -- Benoit Mortgat ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] SQLite shell dot commands
I have a database with a table name containing a quote and a space. Let's say it's called “Ben's table” I have created it with: CREATE TABLE Ben's table ([column_spec]); I tried the following in the SQLite shell: .import 'file_name.txt' Ben's table But that does not work. So I had to dig a bit into shell.c to understand, and I found the following: * inside the do_meta_command function, the arguments to dot-commands are tokenized with the following rules: * The tokenizer skips all whitespace character when finding the start for the next token. * When ' or is found where a token should start, the token will be the portion of text between that delimiter and its next occurrence. There is no possibility to escape the delimiter. The two delimiters are discarded. * If any other character is found where a token should start, the token ends at whitespace. * Backslashes get special processing except for tokens that were delimited by the single quote. * inside the same do_meta_command method, when processing the import command, the name of the table is appended to some queries this way: sqlite3_snprintf(nByte+20, zSql, INSERT INTO %s VALUES(?, zTable); So, my .import command was tokenized that way: Token 1: file_name.txt Token 2: Ben's table --- note: double quotes gone. And the SQL query that was forged was incorrect: INSERT INTO Ben's table VALUES(?... The only way I found in order to have my import succeed was: .import 'file_name.txt' [Ben's table] which I find cumbersome. Are there any plans to improve the tokenizer to enable escaping the delimiter (doubling the single quote or backslash-escaping the double quote), and to escape the table name in the dot-commands? Potential problem: how to use .backup or .restore with a file name that contains both ' and space? Thanks. -- Benoit Mortgat ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] double_quote(text_field)
The same, escaping double quotes inside the double quotes: SELECT '' || replace(replace(text_field,'\','\\'),'','\') || '' On Thu, Feb 23, 2012 at 15:16, Bill McCormick wpmccorm...@gmail.com wrote: Yes, that should work. Thanks!! Dave Watkinson wrote, On 2/22/2012 5:33 PM: Did you mean something like SELECT ''||text_field||'' FROM table; so that your column's data is quoted? -- Benoit Mortgat ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SQLite 3.7.5 : same query written using NOT IN and EXCEPT does not give out same results
On Tue, Feb 22, 2011 at 14:17, Richard Hipp d...@sqlite.org wrote: The query is really more like this: SELECT DISTINCT COALESCE(a.xxx, b.value) value FROM tbl1 a LEFT OUTER JOIN tbl2 b ON a.zzz = b.ttt WHERE value NOT IN ( SELECT DISTINCT ggg FROM tbl3 ); The value on the left-hand side of the NOT IN operator is ambiguous: Does it refer to the first column of output or to the value column of the b table? SQLite chooses the latter. Thank you for your answer. Benoit ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] SQLite 3.7.5 : same query written using NOT IN and EXCEPT does not give out same results
Hello, I have come across a strange behaviour of SQLite 3.7.5. The following query: SELECT DISTINCT COALESCE(a.xxx, b.yyy) value FROM tbl1 a LEFT OUTER JOIN tbl2 b ON a.zzz = b.ttt EXCEPT SELECT DISTINCT ggg value FROM tbl3; will not return any results (which seems to be correct). However, when I write it this way: SELECT DISTINCT COALESCE(a.xxx, b.yyy) value FROM tbl1 a LEFT OUTER JOIN tbl2 b ON a.zzz = b.ttt WHERE value NOT IN ( SELECT DISTINCT ggg FROM tbl3 ); I get results (which are wrong). I could send a samble database with full query to a developer if needed in order to reproduce that. Is this known bug? Thank you, Benoit ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] DB file locked when no other process using it
Do you have SQLite Manager extension for Firefox installed? I had problems with it having the database open until Firefox is restarted. On Tue, Sep 14, 2010 at 15:45, Andrew Wood ajw...@theiet.org wrote: Im getting an error saying the database is locked even though no other process is accessing the file. I think it was probably caused by a process crashing part way through. Is there a way to force release the lock? Andrew ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- Benoit Mortgat 20, avenue Marcel Paul 69200 Vénissieux, France +33 6 17 15 41 58 +33 4 27 11 61 23 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] what could be the reason that natural join stops working ?
On Tue, Sep 14, 2010 at 23:41, Stef Mientki stef.mien...@gmail.com wrote: until a few moments ago, this worked perfectly select Header from vraag natural join vraaglist where Nr = 0 and vraaglist.Name = 'eortc_br23' but now it returns an empty string (while the string shouldn't be empty). I test that by doing an explicit join select Header from vraag join vraaglist on vraag.vlid = vraaglist.vlid where Nr = 0 and vraaglist.Name = 'eortc_br23' Are you sure that after altering your tables adding columns, natural join still only joins on vlid? -- Benoit Mortgat 20, avenue Marcel Paul 69200 Vénissieux, France +33 6 17 15 41 58 +33 4 27 11 61 23 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Upon table creation, multiple CHECK constraints do not work with FTS3
The 1 expression was here only for simplification, but it also fails to run the statement whatever expressions are specified. Thank you for your answer. I now am aware that checks must be done by the upper-level application. Benoit On Mon, Aug 30, 2010 at 08:19, Dan Kennedy danielk1...@gmail.com wrote: On Aug 30, 2010, at 12:22 PM, Benoit Mortgat wrote: Hello, The following table creation fails under latest release : CREATE VIRTUAL TABLE foo USING FTS3 ( bar TEXT, othercolumns TEXT, CHECK(1), CHECK(1) ); Error message: vtable constructor failed Maybe it thinks the two CHECK(1) identifiers represent a duplicate column name. CHECK constraints will not work with either fts3 or rtree virtual tables. If they are parsed at all, they will probably not do what you are expecting. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- Benoit Mortgat 20, avenue Marcel Paul 69200 Vénissieux, France +33 6 17 15 41 58 +33 4 27 11 61 23 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Upon table creation, multiple CHECK constraints do not work with FTS3
Hello, The following table creation fails under latest release : CREATE VIRTUAL TABLE foo USING FTS3 ( bar TEXT, othercolumns TEXT, CHECK(1), CHECK(1) ); Error message: vtable constructor failed It is possible to bypass that error by concatenating the CHECKS with AND logic, which is what SQLite normally does according to http://osdir.com/ml/sqlite-users/2009-04/msg00279.html using the R*Tree module this syntax works, so the bug should be in the FTS3 module. Thank you, Benoit -- Benoit Mortgat 20, avenue Marcel Paul 69200 Vénissieux, France +33 6 17 15 41 58 +33 4 27 11 61 23 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to update many rows efficiently?
On Thu, Aug 5, 2010 at 01:17, Igor Tandetnik itandet...@mvps.org wrote: Or, if t1.ID is a primary key or otherwise has a unique constraint: insert or replace into t1(ID, name) select ID, name from t2; this one is different because it would cause INSERTs into t1 if some ID exists in t2 and not t1. -- Benoit Mortgat 20, avenue Marcel Paul 69200 Vénissieux, France +33 6 17 15 41 58 +33 4 27 11 61 23 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to update many rows efficiently?
2010/8/5 Dominique Pellé dominique.pe...@gmail.com: Using information in previous reply, I can do it with 2 UPDATE queries as follows (but I suspect that there is a better solution). UPDATE t1 SET l_nm = (SELECT l_nm FROM t2 WHERE t1.ID_PK = t2.ID_FK) WHERE ID_PK IN (SELECT ID_FK FROM t2); UPDATE t1 SET r_nm = (SELECT r_nm FROM t2 WHERE t1.ID_PK = t2.ID_FK) WHERE ID_PK IN (SELECT ID_FK FROM t2); For this kind of statements you can use either: UPDATE t1 SET l_nm = (SELECT l_nm FROM t2 WHERE t1.ID_PK = t2.ID_FK), r_nm = (SELECT r_nm FROM t2 WHERE t1.ID_PK = t2.ID_FK) WHERE ID_PK IN (SELECT ID_FK FROM t2); or INSERT OR REPLACE INTO t1(ID_PK, l_nm, r_nm) SELECT t1.ID_PK, t2.l_nm, t2.r_nm FROM t1 INNER JOIN t2 ON ( t2.ID_FK = t1.ID_PK ) Including t1 in the select statement above is necessary in order not to add rows, but only keep existing ones, and id_pk must be declared as primary key. Regards, -- Benoit Mortgat 20, avenue Marcel Paul 69200 Vénissieux, France +33 6 17 15 41 58 +33 4 27 11 61 23 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] count distinct group by
On Thu, Jul 29, 2010 at 00:47, Igor Tandetnik itandet...@mvps.org wrote: Peng Yu pengyu...@gmail.com wrote: I want to select only the rows where if S column are the same, R1 column is the same and R2 column is the same. select * from mytable where s in (select s from mytable group by s having min(r1)=max(r1) and min(r2)=max(r2) ); Another possibility: having count(distinct r1) = 1 and count(distinct r2) = 1 -- Benoit Mortgat 20, avenue Marcel Paul 69200 Vénissieux, France +33 6 17 15 41 58 +33 4 27 11 61 23 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query critique
On Fri, Jul 9, 2010 at 11:08, Ian Hardingham i...@omroth.com wrote: Hey guys. I have a query which is very slow, and was wondering if there was any advice you guys had on it. Here are two table definitions: CREATE TABLE IF NOT EXISTS globalRankingTable (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL UNIQUE, ranking TEXT, score REAL, record TEXT); CREATE TABLE IF NOT EXISTS friendTable (id INTEGER PRIMARY KEY AUTOINCREMENT, simpleId TEXT NOT NULL UNIQUE, player TEXT, friend TEXT); And here is my query (written in a script language): db.query(SELECT * FROM (SELECT DISTINCT * FROM globalRankingTable WHERE upper(name) = upper('?') OR id ? union all SELECT a.* FROM globalRankingTable a, friendTable b WHERE upper(b.player) = upper('?') AND upper(b.friend) = upper(a.name)) ORDER BY score DESC, 0, %client.username, %globId, %client.username); Create an index either on player column or friend column in your second table. CREATE INDEX i_friendTable ON friendTable (player ASC); -- Benoit Mortgat 20, avenue Marcel Paul 69200 Vénissieux, France +33 6 17 15 41 58 +33 4 27 11 61 23 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] SQLite lets selecting a column that is not in the Group by clause
Hello I am running the last version on SQLite (precompiled 3.6.23.1 binary release for Windows as found on http://www.sqlite.org/sqlite-3_6_23_1.zip) command line. Here is the input: create temporary table foo(col1 text, col2 text); insert into foo values('a','b'); insert into foo values('a','c'); insert into foo values('b','d'); insert into foo values('b','e'); .mode select col1, col2 from foo group by col1; a|c b|e As you can see, that last query does not result in any error, however col2 should not be selectable if not in the group by clause? Is this a feature, a known bug or an unknown one? If this is a feature, is there a pragma to forbid such a syntax? Have a nice day -- Benoit Mortgat 20, avenue Marcel Paul 69200 Vénissieux, France +33 6 17 15 41 58 +33 4 27 11 61 23 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users