[sqlite] Efficient random sampling in a large table using builtin functions.

2012-03-08 Thread Benoit Mortgat
I have a table with millions of records.

When I run a query with

ORDER BY random() LIMIT N;

the RANDOM() function is evaluated against all rows of my result set,
then sorting occurs, and as a result the query is slow.

In this case the query could be rewritten as:

 * Generate N, random, row numbers between 1 and (SELECT COUNT(*) FROM
   the_table_name). Maybe using remainder operator % and builtin ABS()
   and RANDOM() functions can help (see below)
 * SELECT FROM the_table WHERE rowid IN (those random numbers)

For the moment the most simple query I can think of is:

   SELECT *
 FROM my_table
WHERE rowid IN
( SELECT 1 + (ABS(r.x) % c.num_rows)
FROM (SELECT COUNT(*) AS num_rows
FROM my_table
 ) AS c
   CROSS JOIN
 (SELECT random() x
FROM my_table
   LIMIT N
 ) AS r
);

This can however return less than N rows if by chance two random numbers
have the same remainder modulo COUNT(*) FROM my_table.

Note that the generation of N random numbers is quick because there is
no ORDER BY involved that would require computation of as many random
numbers as there are rows in the original table.

This could maybe be optimized inside SQLite if those conditions are met:

 * random() hasn't been overridden with sqlite_create_function() nor
   sqlite_create_function_v2()
 * the user SELECTs FROM a table without joins and w/o WHERE conditions.

-- 
Benoit Mortgat
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficient random sampling in a large table using builtinfunctions.

2012-03-08 Thread Benoit Mortgat
On Thu, Mar 8, 2012 at 15:02, Igor Tandetnik itandet...@mvps.org
wrote:
 Benoit Mortgat mort...@gmail.com wrote:
 * Generate N, random, row numbers between 1 and (SELECT COUNT(*) FROM
   the_table_name). Maybe using remainder operator % and builtin ABS()
   and RANDOM() functions can help (see below) * SELECT FROM the_table
 WHERE rowid IN (those random numbers)

 That is not equivalent. First, duplicate random numbers may be
 generated. Second, rowid values are not necessarily sequential. Thus,
 you may end up with fewer than N rows.

Oh, I assumed rowids were sequential. If that's not the case, then
you're right and my query is bad. Thanks for pointing that out.

 You can implement this algorithm in your application's code, if you
 are so inclined.

Still, what I told is that I use that query only to get a random sample
for displaying. I do that from inside the SQLite shell, for
vizualization purposes, and as soon as I get approximately N results,
this is satisfactory to me. I am not integrating anything into an
application.

-- 
Benoit Mortgat
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] compressing BLOB

2012-03-02 Thread Benoit Mortgat
SQLite does not compress your blob and you will have to do that
programatically.  However you can define your own with
sqlite_create_function_v2(): the prototype of your function would be

void compress(sqlite3_context *context, int argc, sqlite3_value **argv)
{
   assert(argc==1);
   void *data = sqlite3_value_blob(argv[0]);
   int nBytes = sqlite3_value_bytes(argv[0]);

   // allocate memory for result
   .
   sqlite3_result_blob(...);
}

On Fri, Mar 2, 2012 at 10:42, Christoph P.U. Kukulies
k...@kukulies.org wrote:
 Since I'm inserting large files into the DB I'm wondering whether
 Sqlite can do compression on the data BLOB by itself or whether I
 should do that by programming when creating the BLOB?

 -- Christoph Kukulies ___
 sqlite-users mailing list sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

-- 
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite shell dot commands

2012-02-24 Thread Benoit Mortgat
On Fri, Feb 24, 2012 at 13:28, Don V Nielsen donvniel...@gmail.com wrote:
 Please, I don't mean this to be offensive.  I'm not.

Thanks for the answer, I did not feel offended.

 It was suggested that the syntax [Ben's table] is cumbersome.  What
 is really cumbersome, in my opinion, is the table name itself.  The
 table name includes an white space (space) and a delimiting character
 (apostrophe.)  The simple table name has lots of junk in it that will
 throw a left hook to many language parsers.
 To simplify everything, just name the table BensTable.  Camel case the
 words, drop the spaces.  Everything is still very readable and makes
 sense.

I fully agree that it's not really advisable to name a table like this.

Still, since SQLite supports non-\w+ table names, I felt that the
SQLite shell should also support them in meta commands.

If you re-read the original question, there is still the problem when
it's file names that contain spaces, apostrophes or other challenging
characters (commands affected could be: .import, .backup, .restore,
.load, .log, .output). Those cases are not that uncommon.

If it's not planned to rewrite the argument tokenizer for meta-commands
could it be possible at least to specify in the output of .help how
those arguments are tokenized?

(Note: another workaround to import into Ben's table is using the octal
digits escaping with backslashes:)

.import my_file \042Ben's\040table\042

-- 
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite shell dot commands

2012-02-24 Thread Benoit Mortgat
On Fri, Feb 24, 2012 at 16:22, Simon Slavin slav...@bigfraud.org wrote:

 It is faster, simpler, and would introduce far fewer ambiguities and
 opportunities for bugs, simply to remove the ability to create tables
 with whacky names.  There are no real restrictions on table names in
 the SQL specs.  You can even theoretically create a table name with a
 '' or a ']' character in, if you can make the parser accepting one.
  So every implementation of SQL has its own peculiarities.

Sorry but the question is not about what an acceptable table name is. It
is about how dot-commands in the SQLite shell could accept arguments
containing special characters (I recall that those arguments can be
table names, file names or various others: boolean, fixed hardcoded
values: `.help` in the SQLite shell lists them, as well as reading
shell.c.)

Please do not convert my thread to a debate about what characters table
names should include.

-- 
Benoit Mortgat
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] SQLite shell dot commands

2012-02-23 Thread Benoit Mortgat
I have a database with a table name containing a quote and a space.
Let's say it's called “Ben's table”

I have created it with:

 CREATE TABLE Ben's table ([column_spec]);

I tried the following in the SQLite shell:

.import 'file_name.txt' Ben's table

But that does not work. So I had to dig a bit into shell.c to
understand, and I found the following:

 * inside the do_meta_command function, the arguments to dot-commands
   are tokenized with the following rules:

   * The tokenizer skips all whitespace character when finding the
 start for the next token.

   * When ' or  is found where a token should start, the token will be
 the portion of text between that delimiter and its next occurrence.
 There is no possibility to escape the delimiter.
 The two delimiters are discarded.

   * If any other character is found where a token should start, the
 token ends at whitespace.

   * Backslashes get special processing except for tokens that were
 delimited by the single quote.

 * inside the same do_meta_command method, when processing the import
   command, the name of the table is appended to some queries this way:

   sqlite3_snprintf(nByte+20, zSql, INSERT INTO %s VALUES(?, zTable);

So, my .import command was tokenized that way:

Token 1: file_name.txt
Token 2: Ben's table  --- note: double quotes gone.

And the SQL query that was forged was incorrect:

INSERT INTO Ben's table VALUES(?...

The only way I found in order to have my import succeed was:

.import 'file_name.txt' [Ben's table]

which I find cumbersome.

Are there any plans to improve the tokenizer to enable escaping the
delimiter (doubling the single quote or backslash-escaping the double
quote), and to escape the table name in the dot-commands?

Potential problem: how to use .backup or .restore with a file name that
contains both '  and space?

Thanks.

--
Benoit Mortgat
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] double_quote(text_field)

2012-02-23 Thread Benoit Mortgat
The same, escaping double quotes inside the double quotes:

SELECT '' || replace(replace(text_field,'\','\\'),'','\') || ''

On Thu, Feb 23, 2012 at 15:16, Bill McCormick wpmccorm...@gmail.com wrote:
 Yes, that should work.

 Thanks!!
 Dave Watkinson wrote, On 2/22/2012 5:33 PM:

 Did you mean something like

 SELECT ''||text_field||'' FROM table;

 so that your column's data is quoted?

-- 
Benoit Mortgat
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite 3.7.5 : same query written using NOT IN and EXCEPT does not give out same results

2011-02-23 Thread Benoit Mortgat
On Tue, Feb 22, 2011 at 14:17, Richard Hipp d...@sqlite.org wrote:

 The query is really more like this:

 SELECT DISTINCT COALESCE(a.xxx, b.value) value
  FROM tbl1 a
  LEFT OUTER JOIN tbl2 b
    ON a.zzz = b.ttt
  WHERE value NOT IN (
        SELECT DISTINCT ggg
          FROM tbl3
       );

 The value on the left-hand side of the NOT IN operator is ambiguous:  Does
 it refer to the first column of output or to the value column of the b
 table?  SQLite chooses the latter.

Thank you for your answer.
Benoit
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] SQLite 3.7.5 : same query written using NOT IN and EXCEPT does not give out same results

2011-02-22 Thread Benoit Mortgat
Hello,

I have come across a strange behaviour of SQLite 3.7.5.

The following query:

SELECT DISTINCT COALESCE(a.xxx, b.yyy) value
  FROM tbl1 a
  LEFT OUTER JOIN tbl2 b
ON a.zzz = b.ttt
EXCEPT
SELECT DISTINCT ggg value
  FROM tbl3;

will not return any results (which seems to be correct).

However, when I write it this way:

SELECT DISTINCT COALESCE(a.xxx, b.yyy) value
  FROM tbl1 a
  LEFT OUTER JOIN tbl2 b
ON a.zzz = b.ttt
 WHERE value NOT IN (
SELECT DISTINCT ggg
  FROM tbl3
   );

I get results (which are wrong).

I could send a samble database with full query to a developer if
needed in order to reproduce that.
Is this known bug?

Thank you,
Benoit
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] DB file locked when no other process using it

2010-09-15 Thread Benoit Mortgat
Do you have SQLite Manager extension for Firefox installed? I had problems
with it having the database open until Firefox is restarted.

On Tue, Sep 14, 2010 at 15:45, Andrew Wood ajw...@theiet.org wrote:

 Im getting an error saying the database is locked even though no other
 process is accessing the file. I think it was probably caused by a
 process crashing part way through. Is there a way to force release the
 lock?

 Andrew

 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users




-- 
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] what could be the reason that natural join stops working ?

2010-09-15 Thread Benoit Mortgat
On Tue, Sep 14, 2010 at 23:41, Stef Mientki stef.mien...@gmail.com wrote:


 until a few moments ago, this worked perfectly

 select Header from vraag
  natural join vraaglist
  where Nr = 0 and vraaglist.Name = 'eortc_br23'

 but now it returns an empty string (while the string shouldn't be empty).

 I test that by doing an explicit join

 select Header from vraag
  join vraaglist on vraag.vlid = vraaglist.vlid
  where Nr = 0 and vraaglist.Name = 'eortc_br23'


Are you sure that after altering your tables adding columns, natural join
still only joins on vlid?

-- 
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Upon table creation, multiple CHECK constraints do not work with FTS3

2010-08-30 Thread Benoit Mortgat
The 1 expression was here only for simplification, but it also fails to run
the statement whatever expressions are specified.

Thank you for your answer. I now am aware that checks must be done by the
upper-level application.

Benoit

On Mon, Aug 30, 2010 at 08:19, Dan Kennedy danielk1...@gmail.com wrote:


 On Aug 30, 2010, at 12:22 PM, Benoit Mortgat wrote:

  Hello,
 
  The following table creation fails under latest release :
 
  CREATE VIRTUAL TABLE foo USING FTS3
  (
bar TEXT, othercolumns TEXT,
CHECK(1), CHECK(1)
  );
 
  Error message: vtable constructor failed

 Maybe it thinks the two CHECK(1) identifiers represent a
 duplicate column name. CHECK constraints will not work with
 either fts3 or rtree virtual tables. If they are parsed at
 all, they will probably not do what you are expecting.

 Dan.

 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users




-- 
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Upon table creation, multiple CHECK constraints do not work with FTS3

2010-08-29 Thread Benoit Mortgat
Hello,

The following table creation fails under latest release :

CREATE VIRTUAL TABLE foo USING FTS3
(
   bar TEXT, othercolumns TEXT,
   CHECK(1), CHECK(1)
);

Error message: vtable constructor failed

It is possible to bypass that error by concatenating the CHECKS with AND
logic, which is what SQLite normally does according to
http://osdir.com/ml/sqlite-users/2009-04/msg00279.html

using the R*Tree module this syntax works, so the bug should be in the FTS3
module.

Thank you,
Benoit

-- 
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to update many rows efficiently?

2010-08-05 Thread Benoit Mortgat
On Thu, Aug 5, 2010 at 01:17, Igor Tandetnik itandet...@mvps.org wrote:

 Or, if t1.ID is a primary key or otherwise has a unique constraint:

 insert or replace into t1(ID, name)
 select ID, name from t2;

this one is different because it would cause INSERTs into t1 if some
ID exists in t2 and not t1.


-- 
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to update many rows efficiently?

2010-08-05 Thread Benoit Mortgat
2010/8/5 Dominique Pellé dominique.pe...@gmail.com:
 Using information in previous reply, I can do it with 2 UPDATE queries
 as follows (but I suspect that there is a better solution).

  UPDATE t1 SET l_nm = (SELECT l_nm FROM t2 WHERE t1.ID_PK = t2.ID_FK)
  WHERE ID_PK IN (SELECT ID_FK FROM t2);

  UPDATE t1 SET r_nm = (SELECT r_nm FROM t2 WHERE t1.ID_PK = t2.ID_FK)
  WHERE ID_PK IN (SELECT ID_FK FROM t2);

For this kind of statements you can use either:

UPDATE t1 SET l_nm = (SELECT l_nm FROM t2 WHERE t1.ID_PK = t2.ID_FK),
 r_nm = (SELECT r_nm FROM t2 WHERE t1.ID_PK = t2.ID_FK)
WHERE ID_PK IN (SELECT ID_FK FROM t2);

or

INSERT OR REPLACE INTO t1(ID_PK, l_nm, r_nm)
SELECT t1.ID_PK, t2.l_nm, t2.r_nm
FROM t1 INNER JOIN t2 ON ( t2.ID_FK = t1.ID_PK )

Including t1 in the select statement above is necessary in order not
to add rows, but only keep existing ones, and id_pk must be declared
as primary key.

Regards,

-- 
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] count distinct group by

2010-07-30 Thread Benoit Mortgat
On Thu, Jul 29, 2010 at 00:47, Igor Tandetnik itandet...@mvps.org wrote:
 Peng Yu pengyu...@gmail.com wrote:
 I want to select only the rows where if S column are the same, R1
 column is the same and R2 column is the same.

 select * from mytable where s in
 (select s from mytable
  group by s
  having min(r1)=max(r1) and min(r2)=max(r2)
 );

Another possibility:
having count(distinct r1) = 1 and count(distinct r2) = 1

-- 
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query critique

2010-07-09 Thread Benoit Mortgat
On Fri, Jul 9, 2010 at 11:08, Ian Hardingham i...@omroth.com wrote:
 Hey guys.

 I have a query which is very slow, and was wondering if there was any
 advice you guys had on it.

 Here are two table definitions:

 CREATE TABLE IF NOT EXISTS globalRankingTable (id INTEGER PRIMARY KEY
 AUTOINCREMENT, name TEXT NOT NULL UNIQUE, ranking TEXT, score REAL,
 record TEXT);

 CREATE TABLE IF NOT EXISTS friendTable (id INTEGER PRIMARY KEY
 AUTOINCREMENT, simpleId TEXT NOT NULL UNIQUE, player TEXT, friend TEXT);

 And here is my query (written in a script language):

 db.query(SELECT * FROM (SELECT DISTINCT * FROM globalRankingTable WHERE
 upper(name) = upper('?') OR id  ? union all SELECT a.* FROM
 globalRankingTable a, friendTable b WHERE upper(b.player) = upper('?')
 AND upper(b.friend) = upper(a.name)) ORDER BY score DESC, 0,
 %client.username, %globId, %client.username);


Create an index either on player column or friend column in your second table.
CREATE INDEX i_friendTable ON friendTable (player ASC);


-- 
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] SQLite lets selecting a column that is not in the Group by clause

2010-06-11 Thread Benoit Mortgat
Hello

I am running the last version on SQLite (precompiled 3.6.23.1 binary
release for Windows as found on
http://www.sqlite.org/sqlite-3_6_23_1.zip) command line. Here is the
input:

create temporary table foo(col1 text, col2 text);
insert into foo values('a','b');
insert into foo values('a','c');
insert into foo values('b','d');
insert into foo values('b','e');
.mode
select col1, col2 from foo group by col1;
   a|c
   b|e

As you can see, that last query does not result in any error, however
col2 should not be selectable if not in the group by clause?

Is this a feature, a known bug or an unknown one? If this is a
feature, is there a pragma to forbid such a syntax?

Have a nice day

--
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users