[sqlite] Multi-table index ersatz?

2015-03-03 Thread Dan Kennedy
On 03/03/2015 06:10 PM, Eric Grange wrote:
> Hi,
>
> I have problem where I need a "multi-table index" ersatz, or maybe a better
> data structure :-)
>
> The problem is as follow:
>
> - Table A : some fields plus fields A1 & A2
> - Table B : some fields plus fields B1 & B2
>
> Both tables have several dozen millions of rows, and both are accessed
> independently of each others by some queries, their current structure has
> no performance issues for those queries.
>
> However I have a new query which is like
>
> select ...some fields of A & B...
> from A join B on A.A2 = B.B2
> where A.A1 = ?1
> order by B.B1
> limit 100
>
>
> Without the limit, there can be tens of thousandths resulting rows, without
> the A1 condition, there can be millions of resulting rows.
>
> With indexes on A & B, the performance of the above is not very good, as
> indexing A1 is not enough, and indexing B1 is not enough either, so no
> query plan is satisfying.
>
> I can make the query instantaneous by duplicating the A1 & B1 fields in a
> dedicated C table (along with the primary keys of A & B), index that table,
> and then join back the A & B table to get the other fields.
>
> However this results in a fairly large table of duplicated data, whose sole
> purpose is to allow the creation of a fairly large index, which gets me the
> performance.

You might be able to make the new table a WITHOUT ROWID table and set 
its PRIMARY KEY up with the same (or a superset of the) fields of your 
"fairly large index" in order to save a bit of space.




>
> Note that if the fields A1 & B1 are removed from their tables and kept only
> in C, this has massive performance implication on other queries running
> only against A & B, as those fields are leveraged in other composite
> indexes.
>
> Is there a better way that would not involve duplicating the data?
>
> Eric
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] static malloc_zone_t* _sqliteZone_

2015-03-03 Thread Andy Rahn
Hi SQLite users;

I have a question about _sqliteZone_ in mem1.c.  I notice that the
address of this static variable is used in a call to
OSAtomicCompareAndSwapPtrBarrier on MacOS and iOS.  That system call
is declared in OSAtomic.h, which includes a note about the pointer
alignment of its arguments:

> * WARNING: all addresses passed to these functions must be "naturally 
> aligned",
> * i.e.  * int32_t pointers must be 32-bit aligned (low 2 bits of
> * address are zeroes), and int64_t pointers must be 64-bit 
> aligned
> * (low 3 bits of address are zeroes.)

I wonder, therefore, if it might be prudent to declare _sqliteZone_
with the alignment attribute, so that the compiler is sure to put it
at a 32 / 64 bit aligned address space? e .g.

static __attribute__((aligned(8))) malloc_zone_t* _sqliteZone_;

and also, because this local variable is used in that same function:

__attribute__((aligned(8))) malloc_zone_t* newzone =
malloc_create_zone(4096, 0);

I see that attribute is used one other place, so this may be an
important nuance.  On a 32-bit architecture, it would be safe to use
aligned(4) instead of aligned(8) but I'm not sure anyone will care
about the (possible) 4-byte savings.

 - Andy


[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread R.Smith


On 2015-03-03 02:43 PM, Richard Hipp wrote:
> On 3/3/15, Jan Asselman  wrote:
>> Most of my queries are in the form
>> "SELECT * FROM test WHERE a == ? AND b < ?;"
>> and use the primary key index so that the rows are returned in the expected
>> order without using the ORDER BY statement.
> Do not rely on this behavior!  It might change at any moment!
>
> If you omit the ORDER BY clause, the database engine is free to return
> rows in any order it chooses.  SQLite sometimes uses this freedom to
> choose non-intuitive query plans that run faster.  It might use this
> freedom even more in the future, thus breaking your application if you
> omit the ORDER BY clause.
>

That is of course very important, but also something else - Please do 
not try to "Help" the query planner with all manner of omissions or 
query semantics or relying on PK orders etc, in stead, ask exactly what 
you want of it, and specify the order.  Mostly, the query planner will 
find the fastest possible way, and if you ask it a good question and it 
takes long to compute, please notify us on here because chances are we 
all would like that performance issue fixed.




[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread Hick Gunter
Maybe an implicit ORDER BY random() ;)

-Urspr?ngliche Nachricht-
Von: Mohit Sindhwani [mailto:ml3p at onghu.com]
Gesendet: Dienstag, 03. M?rz 2015 18:22
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] full table scan ignores PK sort order?

On 3/3/2015 6:59 PM, Jean-Christophe Deschamps wrote:
> At 11:27 03/03/2015, you wrote:
> 
>> - the full table scan returns rows in rowID order, which is the order
>> in which the rows were added to the table
> `---
>
> No and no.
>
> An SQL engine doesn't guarantee any row "order" unless you explicitely
> force an ORDER BY clause. Think of row order as random, where rowid
> order is just a possibility among zillions others. Of course neither
> SQLite nor other engines willingly use random() to foil your
> expectations but you should never rely on such an implementation detail.
>
> Also rowids are technically independant of insertion order: you may
> feed any valid random literal rowids at insert time.
>

If it wasn't a performance issue, I wish that SQLite would sometimes actually 
return values that are not in the order of insertion or by rowid just so that 
people would learn this lesson earlier :)

For sure, it took me a while... actually, till I read a book about SQlite.

Cheers,
Mohit.


___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.




[sqlite] Regarding ALTER TABLE doc page

2015-03-03 Thread Paul
In the footer of  it is said:

It is important that both of the above procedures be run from within a 
transaction to prevent other processes from accessing the database file while 
the schema change is only partially complete. 


But this statement is not true for the first procedure, where it is
suggested to disable and later re-enable foreign keys.

It is not possible to change foreign_keys once within a transaction.
So the suggestion in the footer is, at very least, misleading and can
potantially cause some very serious troubles. For example, if user
is not aware of the fact that 'PRAGMA foreign_keys = 0;' inside
transaction is a no-op.


[sqlite] No diagnostic given for missing column (since 3.7.10 up to 3.8.8.3)

2015-03-03 Thread Dan Kennedy
On 03/03/2015 03:57 PM, Andy Gibbs wrote:
> Hi,
>
> The following is a reduced test-case of a problem I have observed:
>
> CREATE TABLE a(tid,pid);
> CREATE TABLE b(tid,pname);
> CREATE TEMP TABLE pidmap(newpid INTEGER,oldpid INTEGER);
> CREATE TEMP TABLE pnamemap(pid INTEGER,pname TEXT COLLATE BINARY);
>
> (Please note that the tables are usually populated with data.)
>
> SELECT a.ROWID,b.ROWID FROM a
>  INNER JOIN b ON a.tid=b.tid
>  AND a.pid=(SELECT pid FROM pidmap WHERE pname=b.pname);
>
> The same problem occurs with a similar query:
>
> SELECT a.ROWID,b.ROWID FROM a
>  INNER JOIN b ON a.tid=b.tid
>  WHERE a.pid=(SELECT pid FROM pidmap WHERE pname=b.pname);
>
> The problem comes from an unintentional mistake: using the table
> pidmap in the sub-query, rather than pnamemap.
>
> Running the sub-query on its own gives (as expected):
>
> SELECT pid FROM pidmap;
> Error: no such column: pid
>
> This also results in an error:
>
> SELECT a.ROWID,b.ROWID FROM a
>  INNER JOIN b ON a.tid=b.tid
>  AND a.pid=(SELECT xyz FROM pidmap WHERE pname=b.pname);
> Error: no such column: xyz
>
> And, of course, correcting the original query works as expected:
>
> SELECT a.ROWID,b.ROWID FROM a
>  INNER JOIN b ON a.tid=b.tid
>  AND a.pid=(SELECT pid FROM pnamemap WHERE pname=b.pname);
>
> Seems to me that sqlite is resolving the column name from the incorrect
> scope in the sub-query?
>
> I have observed this in v3.7.10, but have also tested it in v3.8.8.3.

I think it's correct, no? Since there is no column "pid" to match 
against in the sub-query, the "pid" within the sub-query refers to 
"a.pid" from the outer query.

   http://en.wikipedia.org/wiki/Correlated_subquery




[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread Hick Gunter
YES, you can.

Create analyze data:
- load your database
- analyze
- export the sqlite_statn tables

Build a new database:
- create new database
- analyze
- import the saved sqlite_statn tables



6.2 Manual Control Of Query Plans Using SQLITE_STAT Tables

SQLite provides the ability for advanced programmers to exercise control over 
the query plan chosen by the optimizer. One method for doing this is to fudge 
the ANALYZE results in the sqlite_stat1, sqlite_stat3, and/or sqlite_stat4 
tables. That approach is not recommended except for the one scenario described 
in the next paragraph.

For a program that uses an SQLite database as its application file-format, when 
a new database instance is first created the ANALYZE command is ineffective 
because the database contain no data from which to gather statistics. In that 
case, one could construct a large prototype database containing typical data 
during development and run the ANALYZE command on this prototype database to 
gather statistics, then save the prototype statistics as part of the 
application. After deployment, when the application goes to create a new 
database file, it can run the ANALYZE command in order to create the statistics 
tables, then copy the precomputed statistics obtained from the prototype 
database into these new statistics tables. In that way, statistics from large 
working data sets can be preloaded into newly created application files.

-Urspr?ngliche Nachricht-
Von: Jan Asselman [mailto:jan.asselman at iba-benelux.com]
Gesendet: Dienstag, 03. M?rz 2015 15:27
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] full table scan ignores PK sort order?

Thanks for answering both of my questions.

I guess this is similar to the 'Skip-Scan Optimization' mentioned in the 
documentation. That is what I assumed and explains the difference in query 
performance. Scanning the table once is faster than scanning the table for each 
and every value of column 'a' (and the table is too large for any sort of cache 
to be useful)...

I can try to see if the ANALYZE statement makes a difference. But it would only 
be useful if I can copy the 'impact' of this statement to other database files 
with equal table definitions. Because in my application, I am constantly 
creating new database file, filling them, and eventually deleting them.

Can I copy the "statistics tables" from one database file to another?

-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Hick Gunter
Sent: dinsdag 3 maart 2015 13:08
To: 'General Discussion of SQLite Database'
Subject: Re: [sqlite] full table scan ignores PK sort order?

The subquery is the index access (partial table scan), which is performed once 
for each and every value in your IN list for the column a (in effect, the IN 
list is transformed into an ephemeral table and joined to your test table).

Since you did not declare an index for your primary key constraint, SQLite has 
to invent one.

If you insert a representative data set and run ANALYZE then the query plan may 
well change to something that suits the shape of your data better.

-Urspr?ngliche Nachricht-
Von: Jan Asselman [mailto:jan.asselman at iba-benelux.com]
Gesendet: Dienstag, 03. M?rz 2015 11:27
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] full table scan ignores PK sort order?

Thank you! I think I get it:

- primary key is nothing but a UNIQUE constraint (in my case comparable to a 
separate index == table with columns a, b and rowID)
- the full table scan returns rows in rowID order, which is the order in which 
the rows were added to the table

There is just one more thing I would like to understand:

Most of my queries are in the form
"SELECT * FROM test WHERE a == ? AND b < ?;"
and use the primary key index so that the rows are returned in the expected 
order without using the ORDER BY statement.

It's only a special case where the query "SELECT * FROM test WHERE b < ? AND c 
> ?;"
is used. That's why I decided not to add a separate index to column b.

What I was trying, in order to prevent a full table scan, is force the use of 
the primary key index for this query and have sqlite allow all possible values 
for column a:
"SELECT * FROM test WHERE a IN (<<256 different values>>) AND b < ? AND c > ?;"

The query plan confirms the use of the primary key index:

0, 0, 0, SEARCH TABLE data USING INDEX sqlite_autoindex_test_1 (a=? AND bhttp://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally 

[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread Jan Asselman
Thanks for answering both of my questions.

I guess this is similar to the 'Skip-Scan Optimization' mentioned in the 
documentation. That is what I assumed and explains the difference in query 
performance. Scanning the table once is faster than scanning the table for each 
and every value of column 'a' (and the table is too large for any sort of cache 
to be useful)...

I can try to see if the ANALYZE statement makes a difference. But it would only 
be useful if I can copy the 'impact' of this statement to other database files 
with equal table definitions. Because in my application, I am constantly 
creating new database file, filling them, and eventually deleting them.

Can I copy the "statistics tables" from one database file to another?

-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Hick Gunter
Sent: dinsdag 3 maart 2015 13:08
To: 'General Discussion of SQLite Database'
Subject: Re: [sqlite] full table scan ignores PK sort order?

The subquery is the index access (partial table scan), which is performed once 
for each and every value in your IN list for the column a (in effect, the IN 
list is transformed into an ephemeral table and joined to your test table).

Since you did not declare an index for your primary key constraint, SQLite has 
to invent one.

If you insert a representative data set and run ANALYZE then the query plan may 
well change to something that suits the shape of your data better.

-Urspr?ngliche Nachricht-
Von: Jan Asselman [mailto:jan.asselman at iba-benelux.com]
Gesendet: Dienstag, 03. M?rz 2015 11:27
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] full table scan ignores PK sort order?

Thank you! I think I get it:

- primary key is nothing but a UNIQUE constraint (in my case comparable to a 
separate index == table with columns a, b and rowID)
- the full table scan returns rows in rowID order, which is the order in which 
the rows were added to the table

There is just one more thing I would like to understand:

Most of my queries are in the form
"SELECT * FROM test WHERE a == ? AND b < ?;"
and use the primary key index so that the rows are returned in the expected 
order without using the ORDER BY statement.

It's only a special case where the query "SELECT * FROM test WHERE b < ? AND c 
> ?;"
is used. That's why I decided not to add a separate index to column b.

What I was trying, in order to prevent a full table scan, is force the use of 
the primary key index for this query and have sqlite allow all possible values 
for column a:
"SELECT * FROM test WHERE a IN (<<256 different values>>) AND b < ? AND c > ?;"

The query plan confirms the use of the primary key index:

0, 0, 0, SEARCH TABLE data USING INDEX sqlite_autoindex_test_1 (a=? AND b

[sqlite] No diagnostic given for missing column (since 3.7.10 up to 3.8.8.3)

2015-03-03 Thread Andy Gibbs
On Tuesday, March 03, 2015 10:50 AM, Dan Kennedy wrote:
> On 03/03/2015 03:57 PM, Andy Gibbs wrote:
>> Hi,
>>
>> The following is a reduced test-case of a problem I have observed:
>>
>> CREATE TABLE a(tid,pid);
>> CREATE TABLE b(tid,pname);
>> CREATE TEMP TABLE pidmap(newpid INTEGER,oldpid INTEGER);
>> CREATE TEMP TABLE pnamemap(pid INTEGER,pname TEXT COLLATE BINARY);
>>
>> (Please note that the tables are usually populated with data.)
>>
>> SELECT a.ROWID,b.ROWID FROM a
>>  INNER JOIN b ON a.tid=b.tid
>>  AND a.pid=(SELECT pid FROM pidmap WHERE pname=b.pname);
>>
>> The same problem occurs with a similar query:
>>
>> SELECT a.ROWID,b.ROWID FROM a
>>  INNER JOIN b ON a.tid=b.tid
>>  WHERE a.pid=(SELECT pid FROM pidmap WHERE pname=b.pname);
>>
>> The problem comes from an unintentional mistake: using the table
>> pidmap in the sub-query, rather than pnamemap.
>>
>> Running the sub-query on its own gives (as expected):
>>
>> SELECT pid FROM pidmap;
>> Error: no such column: pid
>>
>> This also results in an error:
>>
>> SELECT a.ROWID,b.ROWID FROM a
>>  INNER JOIN b ON a.tid=b.tid
>>  AND a.pid=(SELECT xyz FROM pidmap WHERE pname=b.pname);
>> Error: no such column: xyz
>>
>> And, of course, correcting the original query works as expected:
>>
>> SELECT a.ROWID,b.ROWID FROM a
>>  INNER JOIN b ON a.tid=b.tid
>>  AND a.pid=(SELECT pid FROM pnamemap WHERE pname=b.pname);
>>
>> Seems to me that sqlite is resolving the column name from the incorrect
>> scope in the sub-query?
>>
>> I have observed this in v3.7.10, but have also tested it in v3.8.8.3.
> 
> I think it's correct, no? Since there is no column "pid" to match 
> against in the sub-query, the "pid" within the sub-query refers to 
> "a.pid" from the outer query.
> 
>   http://en.wikipedia.org/wiki/Correlated_subquery

I am certainly no SQL language guru, but I should have thought that the
logic should work this way... (taking just the sub-query here)

SELECT pid FROM pidmap WHERE pname=b.pname

  -- the referenced table in "FROM" is unambiguously pidmap since there
 is no further join, etc. in this sub-query
  -- the column list should (IMHO) apply to this table alone, i.e. "pid"
 here, unless fully qualified, causing an error if the column doesn't
 exist in pidmap
  -- in the "WHERE" clause, a more relaxed lookup can be performed and
 include references to the outer query.

In the same way that SQL allows the disambiguation "b.pname" in the
"WHERE" clause, if the user wished to do something like this...

SELECT a.pid FROM pidmap WHERE pname=b.pname

... then this should work - and does even though I can't think of a use-
case for this!

Of course, by extension, this statement gives the expected error:

SELECT a.ROWID,b.ROWID FROM a
  INNER JOIN b ON a.tid=b.tid
  AND a.pid=(SELECT c.pid FROM pidmap AS c
   WHERE c.pname=b.pname);

but to me this is counter-intuitive and needlessly verbose -- it is
"clear" from the original query that "SELECT pid FROM pidmap" expects
pid to be found in pidmap.  (Clear is in quotes since it may not be
clear to the database engine although it may be clear to user!)

Even if it is legal SQL (again I'm not an expert!) then I think it is
at best ambiguous, and perhaps there could be an advantage here for a
shell or pragma option that can highlight likely typos such as that
detailed here, i.e. to provide a warning message if not an error.

Andy



[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread Jan Asselman
Thanks for the detailed explanation.

I understand that I should not make assumptions about SQL engine internals for 
typical database usage.

I now also understand that by not using the ORDER BY clause, SQLite might one 
day decide to execute query plans that disrupt the order that I induce from the 
index that is traversed.

But, basically, I use SQLite as a file format that allows me to access (ranges 
of) persistent BLOBs in O(log N) disk time. I require the B-tree functionality, 
and very little of the SQL functionality. 

The reason why I am reluctant to use the ORDER BY clause is that I absolutely 
want to exclude the possibility that all rows must be read into memory before 
they can be sorted and iterated.

But I will add the ORDER BY clause, as it doesn't make a performance impact for 
me in the current version, and hope the query planner in future versions 
behaves the same :)


-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin
Sent: dinsdag 3 maart 2015 12:20
To: General Discussion of SQLite Database
Subject: Re: [sqlite] full table scan ignores PK sort order?


On 3 Mar 2015, at 10:59am, Jean-Christophe Deschamps  
wrote:

> An SQL engine doesn't guarantee any row "order" unless you explicitely force 
> an ORDER BY clause. Think of row order as random, where rowid order is just a 
> possibility among zillions others. Of course neither SQLite nor other engines 
> willingly use random() to foil your expectations but you should never rely on 
> such an implementation detail.
> 
> Also rowids are technically independant of insertion order: you may feed any 
> valid random literal rowids at insert time.

Just to formalise this ... SQL defines a table as a set of rows.  There is no 
order to a set: it's just a jumble of things like Scrabble tiles in a bag.  You 
can't tell what order rows were added in.

The most frequent error SQL users make is to assume that table rows are 
inherently ordered in primary key order.  They're not.  The primary key is just 
another unique index.

To help users avoid incorrect assumptions about an inherent 'order of rows' 
SQLite provides

PRAGMA reverse_unordered_selects = ON

which can be useful for testing code which was hacked up in a hurry.

Simon.
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] No diagnostic given for missing column (since 3.7.10 up to 3.8.8.3)

2015-03-03 Thread Hick Gunter
Your expectation is wrong. What follows SELECT is the "result column list", 
which may contain arbitrary expressions. SQL will attempt to identify all the 
objects mentioned in the result column list by searching the defined objects of 
the statement (which includes the definitions of all the data sources give in 
the FROM/JOIN  clauses) and only complain if a name matches zero (=undefined) 
or more than one (=ambiguous) objects. A subselect may reference (indeed, it is 
often pointless without doing so) the "current value" of an outer select, even 
if that means comparing a field to itself.


-Urspr?ngliche Nachricht-
Von: Andy Gibbs [mailto:andyg1001 at hotmail.co.uk]
Gesendet: Dienstag, 03. M?rz 2015 14:15
An: sqlite-users at mailinglists.sqlite.org
Betreff: Re: [sqlite] No diagnostic given for missing column (since 3.7.10 up 
to 3.8.8.3)

On Tuesday, March 03, 2015 10:50 AM, Dan Kennedy wrote:
> On 03/03/2015 03:57 PM, Andy Gibbs wrote:
>> Hi,
>>
>> The following is a reduced test-case of a problem I have observed:
>>
>> CREATE TABLE a(tid,pid);
>> CREATE TABLE b(tid,pname);
>> CREATE TEMP TABLE pidmap(newpid INTEGER,oldpid INTEGER); CREATE TEMP
>> TABLE pnamemap(pid INTEGER,pname TEXT COLLATE BINARY);
>>
>> (Please note that the tables are usually populated with data.)
>>
>> SELECT a.ROWID,b.ROWID FROM a
>>  INNER JOIN b ON a.tid=b.tid
>>  AND a.pid=(SELECT pid FROM pidmap WHERE pname=b.pname);
>>
>> The same problem occurs with a similar query:
>>
>> SELECT a.ROWID,b.ROWID FROM a
>>  INNER JOIN b ON a.tid=b.tid
>>  WHERE a.pid=(SELECT pid FROM pidmap WHERE pname=b.pname);
>>
>> The problem comes from an unintentional mistake: using the table
>> pidmap in the sub-query, rather than pnamemap.
>>
>> Running the sub-query on its own gives (as expected):
>>
>> SELECT pid FROM pidmap;
>> Error: no such column: pid
>>
>> This also results in an error:
>>
>> SELECT a.ROWID,b.ROWID FROM a
>>  INNER JOIN b ON a.tid=b.tid
>>  AND a.pid=(SELECT xyz FROM pidmap WHERE pname=b.pname);
>> Error: no such column: xyz
>>
>> And, of course, correcting the original query works as expected:
>>
>> SELECT a.ROWID,b.ROWID FROM a
>>  INNER JOIN b ON a.tid=b.tid
>>  AND a.pid=(SELECT pid FROM pnamemap WHERE
>> pname=b.pname);
>>
>> Seems to me that sqlite is resolving the column name from the
>> incorrect scope in the sub-query?
>>
>> I have observed this in v3.7.10, but have also tested it in v3.8.8.3.
>
> I think it's correct, no? Since there is no column "pid" to match
> against in the sub-query, the "pid" within the sub-query refers to
> "a.pid" from the outer query.
>
>   http://en.wikipedia.org/wiki/Correlated_subquery

I am certainly no SQL language guru, but I should have thought that the logic 
should work this way... (taking just the sub-query here)

SELECT pid FROM pidmap WHERE pname=b.pname

  -- the referenced table in "FROM" is unambiguously pidmap since there
 is no further join, etc. in this sub-query
  -- the column list should (IMHO) apply to this table alone, i.e. "pid"
 here, unless fully qualified, causing an error if the column doesn't
 exist in pidmap
  -- in the "WHERE" clause, a more relaxed lookup can be performed and
 include references to the outer query.

In the same way that SQL allows the disambiguation "b.pname" in the "WHERE" 
clause, if the user wished to do something like this...

SELECT a.pid FROM pidmap WHERE pname=b.pname

... then this should work - and does even though I can't think of a use- case 
for this!

Of course, by extension, this statement gives the expected error:

SELECT a.ROWID,b.ROWID FROM a
  INNER JOIN b ON a.tid=b.tid
  AND a.pid=(SELECT c.pid FROM pidmap AS c
   WHERE c.pname=b.pname);

but to me this is counter-intuitive and needlessly verbose -- it is "clear" 
from the original query that "SELECT pid FROM pidmap" expects pid to be found 
in pidmap.  (Clear is in quotes since it may not be clear to the database 
engine although it may be clear to user!)

Even if it is legal SQL (again I'm not an expert!) then I think it is at best 
ambiguous, and perhaps there could be an advantage here for a shell or pragma 
option that can highlight likely typos such as that detailed here, i.e. to 
provide a warning message if not an error.

Andy

___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 

[sqlite] doc puzzle, on-conflict clause in insert

2015-03-03 Thread Larry Brasfield
I was looking at http://www.sqlite.org/lang_insert.html for details on the
'INSERT' statement. In the text, it says: "The optional conflict-clause
allows the specification of an alternative constraint conflict resolution
algorithm to use during this one INSERT command. See the section titled ON
CONFLICT for additional information."

I puzzled over that text for awhile because there is no obvious conflict
option shown in the railroad syntax chart. Although I was able to sort this
out, after going to the linked page regarding "ON CONFLICT", I would like
to propose a much clearer way to document that option for the INSERT
statement.

The railroad chart would have the whole set of INSERT paths replaced with
this:
INSERT -
\-- OR --\ /
 |\-- REPLACE /|
 |\-- ROLLBACK ---/|
 |\-- ABORT --/|
 |\-- FAIL ---/|
  \-- IGNORE -/

(Best viewed with a monospaced font.)

The lead sentence referring to the confict resolution options would read:
"The optional, 'OR'-prefixed conflict-clause allows the specification of an
alternative constraint conflict resolution algorithm to use during this one
INSERT command."

Lest this be perceived as nit-picking, I justify it thusly: The SQLite
syntax documentation has been so clear and useful to me that my puzzling
experience with this issue really stood out. (For awhile, I thought it was
a plain doc bug, until I  went to a separate page to relearn how the 'OR'
syntax variant works.) The other is that the railroad diagram structure
does not clearly indicate the optionality except by recognizing the
duplication of the 'INSERT' keyword. (I'll admit to stupidity for not
seeing it sooner.)


-- 
Larry Brasfield


[sqlite] PRAGMA Synchronous safety

2015-03-03 Thread Simon Slavin

On 2 Mar 2015, at 10:32pm, Doug Nebeker  wrote:

>> Are you using any PRAGMAs apart from "PRAGMA synchronous" ?
> 
> PRAGMA temp_store=1 (file)
> PRAGMA cache_size=2000
> PRAGMA page_size=4096
> sqlite3_soft_heap_limit( 1GB )
> 
> Those are the only non-default settings.

I see nothing in there (or the rest of your posts) that looks dangerous.  So I 
still don't know why you're seeing corruption.  I hope one of the other posts 
here will help you.

>> Are you testing the result codes of /all/ your sqlite3_ calls
> 
> Yes.  But I usually don't see the full log to see if something has happened 
> earlier.  I'll have to track down a log and see if the corruption error is 
> the first one that happens.

Might be helpful.  Also to perhaps make it clear to your users that something 
is wrong the first time you get a result that isn't SQLITE_OK.

Simon.


[sqlite] Multi-table index ersatz?

2015-03-03 Thread Hick Gunter
Let's construct this example from the reverse.

The object is to avoid the sort at the end (sorting "millions" to return 100 is 
a bad tradeoff), so the B table needs to be visited in B1 order.

-> outer loop = B
-> inner loop = A
-> index B on (B1,...)

The join is on A2 = B2

->index A on (A2,...)

The WHERE clause specifies to quickly determine the subset of rows that have a 
specific value of A1

->index A on (A1,...)

To minimize the number of rows visited in A, the more specific field should 
come first

-> index A on (A1,A2,...) if card(A1) > card(A2)

Which results in

CREATE INDEX B_B1 on B (B1);
CREATE INDEX A_A1_A2 on A (A1, A2);

SELECT ... FROM B CROSS JOIN A ON (A.A1 = ? AND A.A2 = B.B2) ORDER BY B.B1 
LIMIT 100;

For reversed cardinality, only the index on A needs to switch field order.


Rowids will be faster than primary keys.

CREATE TABE C AS SELECT ( A.A1, B.B1, A.rowid AS IDA, B.rowid AS IDB FROM A 
JOIN B ON A.A2 = B.B2);
CREATE INDEX ON C (A1,B1);

SELECT  FROM C JOIN A ON A.ROWID = C.IDA JOIN B ON B.ROWID = C.IDB WHERE 
C.A1 = ? ORDER BY C.B1 LIMIT 100;


-Urspr?ngliche Nachricht-
Von: Eric Grange [mailto:zarglu at gmail.com]
Gesendet: Dienstag, 03. M?rz 2015 12:10
An: General Discussion of SQLite Database
Betreff: [sqlite] Multi-table index ersatz?

Hi,

I have problem where I need a "multi-table index" ersatz, or maybe a better 
data structure :-)

The problem is as follow:

   - Table A : some fields plus fields A1 & A2
   - Table B : some fields plus fields B1 & B2

Both tables have several dozen millions of rows, and both are accessed 
independently of each others by some queries, their current structure has no 
performance issues for those queries.

However I have a new query which is like

select ...some fields of A & B...
from A join B on A.A2 = B.B2
where A.A1 = ?1
order by B.B1
limit 100


Without the limit, there can be tens of thousandths resulting rows, without the 
A1 condition, there can be millions of resulting rows.

With indexes on A & B, the performance of the above is not very good, as 
indexing A1 is not enough, and indexing B1 is not enough either, so no query 
plan is satisfying.

I can make the query instantaneous by duplicating the A1 & B1 fields in a 
dedicated C table (along with the primary keys of A & B), index that table, and 
then join back the A & B table to get the other fields.

However this results in a fairly large table of duplicated data, whose sole 
purpose is to allow the creation of a fairly large index, which gets me the 
performance.

Note that if the fields A1 & B1 are removed from their tables and kept only in 
C, this has massive performance implication on other queries running only 
against A & B, as those fields are leveraged in other composite indexes.

Is there a better way that would not involve duplicating the data?

Eric
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.




[sqlite] Multi-table index ersatz?

2015-03-03 Thread Clemens Ladisch
Eric Grange wrote:
> select ...some fields of A & B...
> from A join B on A.A2 = B.B2
> where A.A1 = ?1
> order by B.B1
> limit 100
>
> Without the limit, there can be tens of thousandths resulting rows,

Even with the limit, all the tens of thousands rows must be sorted.

> without the A1 condition, there can be millions of resulting rows.
>
> With indexes on A & B, the performance of the above is not very good, as
> indexing A1 is not enough, and indexing B1 is not enough either, so no
> query plan is satisfying.

According to your numbers, the index on A1 is more important, which
implies that the sorting must be done without the help of an index.

> I can make the query instantaneous by duplicating the A1 & B1 fields in a
> dedicated C table (along with the primary keys of A & B), index that table,
> and then join back the A & B table to get the other fields.
> [...]
> Is there a better way that would not involve duplicating the data?

An index _is_ somthing like a table containing duplicated data; the
difference is that it is maintained automatically.  You can get the
same effect with triggers to maintain the C table.

SQLite does not have multi-table indexes.

It might be possible to write a virtual table module that does the same
as your index on C, but with C being a view.


Regards,
Clemens


[sqlite] Multi-table index ersatz?

2015-03-03 Thread Eric Grange
Yes A2 & B2 are already indexed (individually and in composite indexes)
The problem is that this indexing is not selective enough when taken in
isolation.
Le 3 mars 2015 12:36, "Simon Davies"  a ?crit
:

> On 3 March 2015 at 11:10, Eric Grange  wrote:
> >
> > Hi,
> >
> > I have problem where I need a "multi-table index" ersatz, or maybe a
> better
> > data structure :-)
> >
> > The problem is as follow:
> >
> >- Table A : some fields plus fields A1 & A2
> >- Table B : some fields plus fields B1 & B2
> >
> > Both tables have several dozen millions of rows, and both are accessed
> > independently of each others by some queries, their current structure has
> > no performance issues for those queries.
> >
> > However I have a new query which is like
> >
> > select ...some fields of A & B...
> > from A join B on A.A2 = B.B2
> > where A.A1 = ?1
> > order by B.B1
> > limit 100
> >
> > Without the limit, there can be tens of thousandths resulting rows,
> without
> > the A1 condition, there can be millions of resulting rows.
> >
> > With indexes on A & B, the performance of the above is not very good, as
> > indexing A1 is not enough, and indexing B1 is not enough either, so no
> > query plan is satisfying.
>
> Have you tried indexing on A2?
>
> .
> .
> .
> > Is there a better way that would not involve duplicating the data?
> >
> > Eric
>
> Regards,
> Simon
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Why bind indexes start from 1 and column indexes start from 0?

2015-03-03 Thread Paolo Bolzoni
> I can't confirm that 100% off the top of my head but I'm uncoordinated
> enough to repeatedly confuse the bind and column value API calls and use
> 0-based indices for both and haven't noticed any really untoward behaviour
> (beyond my code not working and requiring fixing).

I know the feeling, I made it wrong so many times that I almost wanted
to "fix" the index in my thin C++ wrapper...


[sqlite] Multi-table index ersatz?

2015-03-03 Thread Simon Slavin

On 3 Mar 2015, at 11:10am, Eric Grange  wrote:

> With indexes on A & B, the performance of the above is not very good, as
> indexing A1 is not enough, and indexing B1 is not enough either, so no
> query plan is satisfying.

The B1 index isn't going to be used.  Here is your query:

select ...some fields of A & B...
from A join B on A.A2 = B.B2
where A.A1 = ?1
order by B.B1
limit 100

An index on A.A1 is good, since it satisfies the "WHERE" clause.  So do that as 
you did.

Once SQLite has found relevant rows of A it's processing the "JOIN".  This 
means going through table B.B2 looking for rows which match existing values of 
A.A2.  You need an index on B.B2.  But having done that the "ORDER BY" clause 
means it needs to order the resulting rows by B.B1.  So one index which allowed 
both might provide the best advantage.  Try

CREATE INDEX B_2_1 ON B (B2, B1)

then do an ANALYZE, and see if that helps matters.  Another approach is to look 
at your "ORDER BY" clause and see that what you're doing is based on ordering 
by B.B1, sp another way to see your query is

select ...some fields of A & B...
from B join A on A.A2 = B.B2
where A.A1 = ?1
order by B.B1
limit 100

This would be best helped with an index on B.B1 and another on A.A1 and A.A2.  
I suggest you do

CREATE INDEX B_1_2 ON B (B1, B2)
CREATE INDEX B_2_1 ON B (B2, B1)
CREATE INDEX A_1_2 ON A (A1, A2)
CREATE INDEX A_2_1 ON A (A2, A1)

then do an ANALYZE, then execute the query as rephrased above.  If this turns 
out to be faster you can use EXPLAIN QUERY PLAN to find out which indexes 
SQLite is doing and delete the unused ones.

Simon.


[sqlite] Multi-table index ersatz?

2015-03-03 Thread Eric Grange
Hi,

I have problem where I need a "multi-table index" ersatz, or maybe a better
data structure :-)

The problem is as follow:

   - Table A : some fields plus fields A1 & A2
   - Table B : some fields plus fields B1 & B2

Both tables have several dozen millions of rows, and both are accessed
independently of each others by some queries, their current structure has
no performance issues for those queries.

However I have a new query which is like

select ...some fields of A & B...
from A join B on A.A2 = B.B2
where A.A1 = ?1
order by B.B1
limit 100


Without the limit, there can be tens of thousandths resulting rows, without
the A1 condition, there can be millions of resulting rows.

With indexes on A & B, the performance of the above is not very good, as
indexing A1 is not enough, and indexing B1 is not enough either, so no
query plan is satisfying.

I can make the query instantaneous by duplicating the A1 & B1 fields in a
dedicated C table (along with the primary keys of A & B), index that table,
and then join back the A & B table to get the other fields.

However this results in a fairly large table of duplicated data, whose sole
purpose is to allow the creation of a fairly large index, which gets me the
performance.

Note that if the fields A1 & B1 are removed from their tables and kept only
in C, this has massive performance implication on other queries running
only against A & B, as those fields are leveraged in other composite
indexes.

Is there a better way that would not involve duplicating the data?

Eric


[sqlite] With recursive question

2015-03-03 Thread Jean-Christophe Deschamps
At 04:05 01/03/2015, you wrote:

>On 2/28/2015 7:22 PM, Jean-Christophe Deschamps wrote:
>>Once again thank you very much Igor. I was making my life miserable
>>trying to scan the "tree" from the other end.
>
>That, too, could be arranged. Something along these lines:
>
>with recursive FileDirs as (
>   select FileId, FileDirID ancestor, '' as path from Files
>union all
>   select FileId, ParentID, '/' || DirName || path
>   from FileDirs join Dirs on (ancestor = DirId)
>   where DirID != 0
>)
>select FileId, FileName, path
>from FileDirs join Files using (FileId)
>where ancestor = 0;

Thanks again Igor for turning on the light!

--
jcd at antichoc.net  



[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread Hick Gunter
The subquery is the index access (partial table scan), which is performed once 
for each and every value in your IN list for the column a (in effect, the IN 
list is transformed into an ephemeral table and joined to your test table).

Since you did not declare an index for your primary key constraint, SQLite has 
to invent one.

If you insert a representative data set and run ANALYZE then the query plan may 
well change to something that suits the shape of your data better.

-Urspr?ngliche Nachricht-
Von: Jan Asselman [mailto:jan.asselman at iba-benelux.com]
Gesendet: Dienstag, 03. M?rz 2015 11:27
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] full table scan ignores PK sort order?

Thank you! I think I get it:

- primary key is nothing but a UNIQUE constraint (in my case comparable to a 
separate index == table with columns a, b and rowID)
- the full table scan returns rows in rowID order, which is the order in which 
the rows were added to the table

There is just one more thing I would like to understand:

Most of my queries are in the form
"SELECT * FROM test WHERE a == ? AND b < ?;"
and use the primary key index so that the rows are returned in the expected 
order without using the ORDER BY statement.

It's only a special case where the query "SELECT * FROM test WHERE b < ? AND c 
> ?;"
is used. That's why I decided not to add a separate index to column b.

What I was trying, in order to prevent a full table scan, is force the use of 
the primary key index for this query and have sqlite allow all possible values 
for column a:
"SELECT * FROM test WHERE a IN (<<256 different values>>) AND b < ? AND c > ?;"

The query plan confirms the use of the primary key index:

0, 0, 0, SEARCH TABLE data USING INDEX sqlite_autoindex_test_1 (a=? AND bmailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Igor 
Tandetnik
Sent: maandag 2 maart 2015 22:52
To: sqlite-users at sqlite.org
Subject: Re: [sqlite] full table scan ignores PK sort order?

On 3/2/2015 4:48 AM, Jan Asselman wrote:
> But when I step over the rows they are not returned in primary key sort 
> order. Why is this?

Because you didn't add an ORDER BY clause. If you need a particular sort order, 
specify it with ORDER BY.

> If I look at the images at the query planning document 
> (https://www.sqlite.org/queryplanner.html) I get the idea that the primary 
> key B-tree should be used to traverse the table when a full table scan is 
> executed.

Not the primary key, but the ROWID column. It may optionally be aliased by a 
column declared as INTEGER PRIMARY KEY (must be spelled exactly this way); your 
table doesn't have such an alias.

This changes for tables created with WITHOUT ROWID clause; but this, too, 
doesn't apply in your case.

> And since the 'DESC' keyword is used on column b in the primary key, I would 
> expect that, as the rowId increases, the values retuned for column b would 
> decrease. But this is not the case.

How can this be the case? You can update the value of b in an existing row - do 
you expect all the rows to be physically moved and renumbered when this happens?

> Does a full table scan then ignore the PK B-tree?

What you think of as "PK B-tree" doesn't exist.

> If I explicitly order using an 'ORDER BY' statement then - looking at the 
> query plan - sqlite seems to perform a full table scan and store the result 
> in a temporary table which is then sorted. I'd like to avoid the memory 
> consumption produced by this query plan...

If you "ORDER BY a, b desc" SQLite should be using index scan without an 
explicit sort step. Does this not happen?
--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.




[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread Jean-Christophe Deschamps
At 11:27 03/03/2015, you wrote:

>- the full table scan returns rows in rowID order, which is the order 
>in which the rows were added to the table
`---

No and no.

An SQL engine doesn't guarantee any row "order" unless you explicitely 
force an ORDER BY clause. Think of row order as random, where rowid 
order is just a possibility among zillions others. Of course neither 
SQLite nor other engines willingly use random() to foil your 
expectations but you should never rely on such an implementation detail.

Also rowids are technically independant of insertion order: you may 
feed any valid random literal rowids at insert time.


--
jcd at antichoc.net  



[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread Dominique Devienne
On Tue, Mar 3, 2015 at 11:27 AM, Jan Asselman 
wrote:

> - the full table scan returns rows in rowID order, which is the order in
> which the rows were added to the table
>

You cannot rely on that (the "the order in which rows were added" part). At
best it's an implementation detail.

If you want to guarantee rows are in rowid order, then explicitly ORDER BY
ROWID, and often this will be free indeed, but again that's an impl detail.

And there's no "meaning" that the order of rowids IMHO. --DD


[sqlite] Multi-table index ersatz?

2015-03-03 Thread Simon Davies
On 3 March 2015 at 11:10, Eric Grange  wrote:
>
> Hi,
>
> I have problem where I need a "multi-table index" ersatz, or maybe a better
> data structure :-)
>
> The problem is as follow:
>
>- Table A : some fields plus fields A1 & A2
>- Table B : some fields plus fields B1 & B2
>
> Both tables have several dozen millions of rows, and both are accessed
> independently of each others by some queries, their current structure has
> no performance issues for those queries.
>
> However I have a new query which is like
>
> select ...some fields of A & B...
> from A join B on A.A2 = B.B2
> where A.A1 = ?1
> order by B.B1
> limit 100
>
> Without the limit, there can be tens of thousandths resulting rows, without
> the A1 condition, there can be millions of resulting rows.
>
> With indexes on A & B, the performance of the above is not very good, as
> indexing A1 is not enough, and indexing B1 is not enough either, so no
> query plan is satisfying.

Have you tried indexing on A2?

. 
. 
. 
> Is there a better way that would not involve duplicating the data?
>
> Eric

Regards,
Simon


[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread Simon Slavin

On 3 Mar 2015, at 10:59am, Jean-Christophe Deschamps  
wrote:

> An SQL engine doesn't guarantee any row "order" unless you explicitely force 
> an ORDER BY clause. Think of row order as random, where rowid order is just a 
> possibility among zillions others. Of course neither SQLite nor other engines 
> willingly use random() to foil your expectations but you should never rely on 
> such an implementation detail.
> 
> Also rowids are technically independant of insertion order: you may feed any 
> valid random literal rowids at insert time.

Just to formalise this ... SQL defines a table as a set of rows.  There is no 
order to a set: it's just a jumble of things like Scrabble tiles in a bag.  You 
can't tell what order rows were added in.

The most frequent error SQL users make is to assume that table rows are 
inherently ordered in primary key order.  They're not.  The primary key is just 
another unique index.

To help users avoid incorrect assumptions about an inherent 'order of rows' 
SQLite provides

PRAGMA reverse_unordered_selects = ON

which can be useful for testing code which was hacked up in a hurry.

Simon.


[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread Jan Asselman
Thank you! I think I get it:

- primary key is nothing but a UNIQUE constraint (in my case comparable to a 
separate index == table with columns a, b and rowID)
- the full table scan returns rows in rowID order, which is the order in which 
the rows were added to the table

There is just one more thing I would like to understand:

Most of my queries are in the form
"SELECT * FROM test WHERE a == ? AND b < ?;"
and use the primary key index so that the rows are returned in the expected 
order without using the ORDER BY statement.

It's only a special case where the query
"SELECT * FROM test WHERE b < ? AND c > ?;"
is used. That's why I decided not to add a separate index to column b.

What I was trying, in order to prevent a full table scan, is force the use of 
the primary key index for this query and have sqlite allow all possible values 
for column a:
"SELECT * FROM test WHERE a IN (<<256 different values>>) AND b < ? AND c > ?;"

The query plan confirms the use of the primary key index:

0, 0, 0, SEARCH TABLE data USING INDEX sqlite_autoindex_test_1 (a=? AND bmailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Igor 
Tandetnik
Sent: maandag 2 maart 2015 22:52
To: sqlite-users at sqlite.org
Subject: Re: [sqlite] full table scan ignores PK sort order?

On 3/2/2015 4:48 AM, Jan Asselman wrote:
> But when I step over the rows they are not returned in primary key sort 
> order. Why is this?

Because you didn't add an ORDER BY clause. If you need a particular sort order, 
specify it with ORDER BY.

> If I look at the images at the query planning document 
> (https://www.sqlite.org/queryplanner.html) I get the idea that the primary 
> key B-tree should be used to traverse the table when a full table scan is 
> executed.

Not the primary key, but the ROWID column. It may optionally be aliased by a 
column declared as INTEGER PRIMARY KEY (must be spelled exactly this way); your 
table doesn't have such an alias.

This changes for tables created with WITHOUT ROWID clause; but this, too, 
doesn't apply in your case.

> And since the 'DESC' keyword is used on column b in the primary key, I would 
> expect that, as the rowId increases, the values retuned for column b would 
> decrease. But this is not the case.

How can this be the case? You can update the value of b in an existing row - do 
you expect all the rows to be physically moved and renumbered when this happens?

> Does a full table scan then ignore the PK B-tree?

What you think of as "PK B-tree" doesn't exist.

> If I explicitly order using an 'ORDER BY' statement then - looking at the 
> query plan - sqlite seems to perform a full table scan and store the result 
> in a temporary table which is then sorted. I'd like to avoid the memory 
> consumption produced by this query plan...

If you "ORDER BY a, b desc" SQLite should be using index scan without an 
explicit sort step. Does this not happen?
--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Why bind indexes start from 1 and column indexes start from 0?

2015-03-03 Thread Scott Hess
On Tue, Mar 3, 2015 at 3:33 AM, Paolo Bolzoni
 wrote:
>> I can't confirm that 100% off the top of my head but I'm uncoordinated
>> enough to repeatedly confuse the bind and column value API calls and use
>> 0-based indices for both and haven't noticed any really untoward behaviour
>> (beyond my code not working and requiring fixing).
>
> I know the feeling, I made it wrong so many times that I almost wanted
> to "fix" the index in my thin C++ wrapper...

Instead, you could consider having your wrapper throw an assertion if
someone doesn't bind all parameters.  Asserting if someone doesn't
consume all result columns is a little more dubious, because you'd
probably need an API call to explicitly ignore a column, but I'd not
consider such an assertion insane, either.  For either case using
indices leads to kind of brittle code in any case, regardless of
whether it's 0-based or 1-based.

-scott


[sqlite] List duplication

2015-03-03 Thread R.Smith


On 2015-03-03 06:14 AM, Mike Owens wrote:
> Okay, I blocked the sqlite-users at sqlite.org address in the to address
> so if it is sent alone, it will be blocked.

This is working perfectly, thanks!




[sqlite] No diagnostic given for missing column (since 3.7.10 up to 3.8.8.3)

2015-03-03 Thread Andy Gibbs
Hi,

The following is a reduced test-case of a problem I have observed:

CREATE TABLE a(tid,pid);
CREATE TABLE b(tid,pname);
CREATE TEMP TABLE pidmap(newpid INTEGER,oldpid INTEGER);
CREATE TEMP TABLE pnamemap(pid INTEGER,pname TEXT COLLATE BINARY);

(Please note that the tables are usually populated with data.)

SELECT a.ROWID,b.ROWID FROM a
  INNER JOIN b ON a.tid=b.tid
  AND a.pid=(SELECT pid FROM pidmap WHERE pname=b.pname);

The same problem occurs with a similar query:

SELECT a.ROWID,b.ROWID FROM a
  INNER JOIN b ON a.tid=b.tid
  WHERE a.pid=(SELECT pid FROM pidmap WHERE pname=b.pname);

The problem comes from an unintentional mistake: using the table
pidmap in the sub-query, rather than pnamemap.

Running the sub-query on its own gives (as expected):

SELECT pid FROM pidmap;
Error: no such column: pid

This also results in an error:

SELECT a.ROWID,b.ROWID FROM a
  INNER JOIN b ON a.tid=b.tid
  AND a.pid=(SELECT xyz FROM pidmap WHERE pname=b.pname);
Error: no such column: xyz

And, of course, correcting the original query works as expected:

SELECT a.ROWID,b.ROWID FROM a
  INNER JOIN b ON a.tid=b.tid
  AND a.pid=(SELECT pid FROM pnamemap WHERE pname=b.pname);

Seems to me that sqlite is resolving the column name from the incorrect
scope in the sub-query?

I have observed this in v3.7.10, but have also tested it in v3.8.8.3.

Andy



[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread Richard Hipp
On 3/3/15, Jan Asselman  wrote:
>
> Can I copy the "statistics tables" from one database file to another?
>

Yes.  You have to run "ANALYZE sqlite_master;" first to actually
create the tables, but then you can populate the tables with data
copied from a different database.

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] PRAGMA Synchronous safety

2015-03-03 Thread Richard Hipp
On 3/2/15, Doug Nebeker  wrote:
>
>> Depends on whether you are using WAL mode or not.
>
> I am not using WAL.
>

Then there is little performance impact from using PRAGMA
synchronous=FULL versus PRAGMA synchronous=NORMAL.  Both should work.
I do not understand why you are seeing errors.  Have you looked at
https://www.sqlite.org/howtocorrupt.html to see if that offers any
clues?
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Sqlite and threads/multiple applications

2015-03-03 Thread Olivier Vidal

ok, thank you all!
> Will Fong 
> 2 mars 2015 01:23
> Hi Olivier,
>
> As other people have already mentioned, the operating system doesn't
> matter. Pick the one you're most comfortable or interest in using. The
> most reliable or secure OS is worthless if you don't care or know how
> to use it.
>
> What I believe is even more important (in terms of reliability) than
> the OS is the underlying hardware. Namely because no one ever
> considers hardware. This is true for all databases and even all
> software. If the hardware says "yeah, I fysnc'ed!", it's not the OS's
> fault if there was a file corruption due to power failure. These days,
> a lot of people are using cloud infrastructure where they have
> absolutely no control and no idea how things work under the hood.
>
> -w
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> Olivier Vidal 
> 1 mars 2015 10:22
> ok, thank you Simon and Keith!
> other people have experience with multiple applications/one sqlite 
> database?
>
> looking at this list , I 
> wonder if there is an operating system to favour with Sqlite. I saw 
> that there could be problems with older versions of Windows or Linux. 
> But take the example of a web server, with as the latest version of 
> Windows Server 2012 or the latest version of Ubuntu. What would be the 
> most reliable operating system for Sqlite(last version)?
>
> olivier
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] System.Data.SQLite.DLL version 1.0.95.0 Bug

2015-03-03 Thread Joe Mistachkin

Michal wrote:
>
> Thank you for looking at the issue.  I am not sure what to do with
> the link supplied below, I think that is way above my skill level.
>

The fix will be included in the next release of System.Data.SQLite.

--
Joe Mistachkin



[sqlite] full table scan ignores PK sort order?

2015-03-03 Thread Richard Hipp
On 3/3/15, Jan Asselman  wrote:
>
> Most of my queries are in the form
> "SELECT * FROM test WHERE a == ? AND b < ?;"
> and use the primary key index so that the rows are returned in the expected
> order without using the ORDER BY statement.

Do not rely on this behavior!  It might change at any moment!

If you omit the ORDER BY clause, the database engine is free to return
rows in any order it chooses.  SQLite sometimes uses this freedom to
choose non-intuitive query plans that run faster.  It might use this
freedom even more in the future, thus breaking your application if you
omit the ORDER BY clause.

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] System.Data.SQLite.DLL version 1.0.95.0 Bug

2015-03-03 Thread Michal
Hi,

I use SQLite to drive a racing database since 2012 coded in VB.net. In that 
time I have always used the sqlite-netFx40-binary-bundle-Win32-2010-x.x.xx.x 
zip and then used the System.Data.SQLite.dll. I have used this process from 
version 1.0.81.0 to 1.0.94.0. (Im not sure if there is anything inherently 
incorrect with using that DLL so I am mentioning it)

I downloaded the System.Data.SQLite (1.0.95.0) yesterday. Upon implementation 
of this DLL my program fails, with error ?Table doesn't have a Primary Key? 
when I try finding a row on the table. (using this code Dim Findrow as datarow 
= Inforows.Rows.Find(?Sample1?)

When I revert to 1.0.94.0 there is no issue and the program works as expected 
and as it has been these last 3 years using the different versions outlined 
above. Nothing else is changed other then the program reference of the 
different version of the DLL. I can go back and forth; from 94 up to 95 fails 
with the error. Going Back down, from 95 to 94 is OK. The table does have a 
Primary Key. The data IS being filled into the table and has correct schema 
types. 

My intent is to be able to add temporary columns to a Data Table ( a blank 
Alias column) within the program without changing the actual Database structure.

Below is the code used to fill the data table.

Dim TodaysHorses as New DataTable

Using oMainQueryR As New SQLite.SQLiteCommand
   oMainQueryR.CommandText = ?SELECT HorseID, RaceID, Null as 
'ISP', Comments FROM InfoRows WHERE RDate BETWEEN DATE(:StartDate) AND 
DATE(:EndDate) ORDER BY DATETIME([Jump]) Asc, RaceID, TABno?

   oMainQueryR.Parameters.AddWithValue(":StartDate", 
CDate(StartDate).ToString("-MM-dd"))
   oMainQueryR.Parameters.AddWithValue(":EndDate", 
CDate(EndDate).ToString("-MM-dd"))
   Using connection As New 
SQLite.SQLiteConnection(R2W_conectionString)
   Using oDataSQL As New SQLite.SQLiteDataAdapter
   oMainQueryR.Connection = connection
   oDataSQL.SelectCommand = oMainQueryR
   connection.Open()
   oDataSQL.FillSchema(TodaysHorses, SchemaType.Source)
   oDataSQL.Fill(TodaysHorses)
   connection.Close()
   End Using
   End Using
   End Using

Here is the code that casts the error

Dim Findrow as datarow = Inforows.Rows.Find(?Sample1?)


The actual SQL Query created by the above code is

SELECT HorseID, RaceID, Null as 'ISP', Comments FROM InfoRows WHERE RDate 
BETWEEN DATE(:StartDate) AND DATE(:EndDate) ORDER BY DATETIME([Jump]) Asc, 
RaceID, TABno

The program fails even with this SQL query
SELECT HorseID, RaceID, Null as 'ISP', Comments FROM InfoRows 


If the code is changed to use the following SQL Query the program works OK with 
both versions of the dll.

SELECT HorseID, RaceID, Comments FROM InfoRows 


The issue is in the Query itself, where the use of  Null AS ?Alias_Name? 
delivers a table without the Primary Key being designated. As I said before the 
data is filled correctly even into the Primary Key column. When the Alias 
column is removed the program works as normal. If the alias column is created 
using an existing column that also works OK.
Example : Select HorseID, RaceID, Comments AS ?NewCol?, Comments . This SQL 
string works.

Kind Regards

Michal


[sqlite] List duplication

2015-03-03 Thread David Woodhouse
On Mon, 2015-03-02 at 21:10 -0600, Mike Owens wrote:
> The problem is that this is the very bone of contention in the reply-to
> religious war.

Religious as in there are strongly-held beliefs on both sides, but only
one is really based in logic and common sense? :)

>  Is it not? I may be wrong, but I thought this is the very
> setting that people get so defensive about changing.

Yes, it's Reply-To: that people get defensive about changing, but...

>  As we have it now,
> people have a suitable default pointing back to the (correct) list but also
> the freedom to change the reply-to header should they want to. If we strip
> the reply-to header in order to correct for the problematic MUA's, then the
> latter freedom is lost. And if I remember correctly, some people get very
> angry about this.

... that isn't really the point. People *do* talk about 'freedom', but
it's the freedom to have their mail client actually do as they ask it. 

A mail client has a private 'Reply' button, and a public 'Reply All'
button. Each has a clear and simple function.

If the list abuses the Reply-To: field, that overrides the behaviour of
the private Reply button, hijacking it to send a public reply to the
list. So the recipients' freedom is taken away from them ? it's like
breaking into their computer and hacking their mail client so that
*both* buttons do the same thing ? occasionally leading to a *highly*
embarrassing event when private emails are accidentally sent to the
list.

If the list *doesn't* abuse the Reply-To: field, then the mail client
does the right thing. The private Reply, and the public Reply All
buttons, both do precisely what they should.

The main reason people advocate for munging Reply-To: is usually that
"list members are too dim to press the right button".

Which aside from being fairly insulting, is kind of a self-fulfilling
prophecy ? because for those who genuinely are that unsophisticated, by
hacking their mail clients to behave inconsistently, you actually
*reduce* their chances of understanding it all. The private/public reply
button concept is *so* simple, that I'd suggest the *main* reason people
have problems with it is because of misguided Reply-To: headers changing
the behaviour and making it inconsistent.

It should also be noted that the failure mode when you do the hack can
sometimes be catastrophic, as public postings can never be undone. The
failure mode if someone accidentally hits the wrong button, as well as
being entirely their own fault, is very minor ? all they need to do is
resend the message.

And in the general case, if someone isn't *even* paying enough attention
to press the right button, sometimes there is *benefit* in having their
messages not reach the list until the coffee has kicked in and they're a
little more awake. They can edit the message before they resend it :)

There is little benefit in a Reply-To: header, and it is actually
*counter-productive* in the long run for the one case where it has any
dubious logic at all.

http://www.unicom.com/pw/reply-to-harmful.html
http://david.woodhou.se/reply-to-list.html

-- 
dwmw2



[sqlite] List duplication

2015-03-03 Thread R.Smith


On 2015-03-03 01:15 AM, Mike Owens wrote:
> For clarity, here is the currently policy for Reply-to as it is set in
> Mailman:
>
>
> - Should any existing Reply-To: header found in the original message be
> stripped? If so, this will be done regardless of whether an explict
> Reply-To: header is added by Mailman or not. : *NO*
> - Where are replies to list messages directed? *This list*
> - Explicit Reply-To: header : *None*

Ah, thank you, all makes sense now. If you change the first option to 
YES then nobody else's quirky reply-to headers will get into the list, 
and the second option remains as is (it should be setting the standard 
@mailinglists reply-to field) - this should solve the duplication issue, 
but if it is disagreeable to anyone, more consideration is needed.

Thanks again Mike for the list maintenance and the quick replies!
Ryan


>
>
> On Mon, Mar 2, 2015 at 5:04 PM, R.Smith  wrote:
>
>>
>> On 2015-03-03 12:42 AM, Darren Duncan wrote:
>>
>>> I think that what needs to be done is for each foo at sqlite.org to return
>>> an error/undeliverable message if someone sends a message to it, citing
>>> that all messages must be explicitly sent to the corresponding
>>> foo at mailinglists.sqlite.org.  That should handily solve the problem. --
>>> Darren Duncan
>>>
>> I see where you are coming from, but if the Reply-To field contains 2
>> email addresses and then the server penalizes you for using one of them,
>> that might go down in history as the most-evil mailing-list quirk of all
>> time.
>>
>> As to Mike's post - the dual mailing-list's reason for being is very clear
>> and welcomed, no qualms there, just the Reply-to duplication that is
>> quirky. I read all forum emails, I do not recall any multi-person decision
>> to add this dual Reply-To thing, however memory is not my strength so I'm
>> happy with the explanation - but I am wondering - is this done and dusted?
>> Is there any chance we might re-open the discussion now that real-World
>> scenarios have set in?
>>
>> It's an extremely minor irritation and will cause a few extra mail-traffic
>> items at its worst - the only real casualty being my OCD, but I can't help
>> thinking there is not a single good reason to keep the situation (unless
>> someone can show the opposite).
>>
>>
>>
>>
>>> On 2015-03-02 10:37 AM, Mike Owens wrote:
>>>
 For what it is worth, the move to mailinglists.sqlite.org is a result of
 the Mailman web interface having to be hosted under the following two
 constraints:

 1. It must be on port 80
 2. It cannot be on sqlite.org port 80

 I explained this reasoning in a previous email. The short version is
 because we are using two web servers on the VM that hosts both the
 sqlite.org website and fossil repos (althttpd) and the Mailman web
 interface (Apache). We previously did this on a single IP where mailman
 was
 on port 8080. However, we had a significant number of complaints from
 people who could not reach the Mailman web interface via sqlite.org:8080
 due to firewall restrictions in their respective locations. So we did
 what
 we could to move it to port 80.

 So to satisfy these two constraints, mailinglists.sqlite.org was born.
 Unless somebody else knows better, Mailman does not allow one to use two
 domains for a given list. Either something will screw up with the mail
 routing or in the web interface if you try to use more than one. You have
 to pick one domain and stick with it. Thus I could not continue to
 support
 both the previous sqlite.org (:8080) domain and the new
 mailinglists.sqlite.org (:80) for the users list. So I made the move
 from
 the one to the other.

 Regarding the reply-to policy. I honestly don't remember the reasoning
 behind it. I know there was a big long discussion about it in the past
 (search the list) and after the dust settled we chose the current policy
 and that is the way it is configured today.  I do believe the policy was
 a
 result of the consensus of the mailing list users. I can say that we do
 everything we can to make most of the people happy most of the time. That
 is the very reason we made this change to begin with -- to make it
 possible
 for everyone to use the list. It would have been easier to just keep
 things
 the same and let the people who can't reach port 8080 deal with it, but
 we
 did what we had to to make it accessible for them as well. There are a
 lot
 of variables in the system and we juggle them as best we can.

 Any feedback or suggestions are always welcome.


 On Mon, Mar 2, 2015 at 5:18 AM, David Woodhouse 
 wrote:

   On Mon, 2015-03-02 at 12:45 +0200, R.Smith wrote:
>> Ok, I've found the source of the list duplications.
>>
>> Some emails (Such as the one by J.K. Lowden 2-March-2015 re: Characters

[sqlite] List duplication

2015-03-03 Thread R.Smith


On 2015-03-03 12:42 AM, Darren Duncan wrote:
> I think that what needs to be done is for each foo at sqlite.org to 
> return an error/undeliverable message if someone sends a message to 
> it, citing that all messages must be explicitly sent to the 
> corresponding foo at mailinglists.sqlite.org.  That should handily solve 
> the problem. -- Darren Duncan

I see where you are coming from, but if the Reply-To field contains 2 
email addresses and then the server penalizes you for using one of them, 
that might go down in history as the most-evil mailing-list quirk of all 
time.

As to Mike's post - the dual mailing-list's reason for being is very 
clear and welcomed, no qualms there, just the Reply-to duplication that 
is quirky. I read all forum emails, I do not recall any multi-person 
decision to add this dual Reply-To thing, however memory is not my 
strength so I'm happy with the explanation - but I am wondering - is 
this done and dusted? Is there any chance we might re-open the 
discussion now that real-World scenarios have set in?

It's an extremely minor irritation and will cause a few extra 
mail-traffic items at its worst - the only real casualty being my OCD, 
but I can't help thinking there is not a single good reason to keep the 
situation (unless someone can show the opposite).


>
> On 2015-03-02 10:37 AM, Mike Owens wrote:
>> For what it is worth, the move to mailinglists.sqlite.org is a result of
>> the Mailman web interface having to be hosted under the following two
>> constraints:
>>
>> 1. It must be on port 80
>> 2. It cannot be on sqlite.org port 80
>>
>> I explained this reasoning in a previous email. The short version is
>> because we are using two web servers on the VM that hosts both the
>> sqlite.org website and fossil repos (althttpd) and the Mailman web
>> interface (Apache). We previously did this on a single IP where 
>> mailman was
>> on port 8080. However, we had a significant number of complaints from
>> people who could not reach the Mailman web interface via sqlite.org:8080
>> due to firewall restrictions in their respective locations. So we did 
>> what
>> we could to move it to port 80.
>>
>> So to satisfy these two constraints, mailinglists.sqlite.org was born.
>> Unless somebody else knows better, Mailman does not allow one to use two
>> domains for a given list. Either something will screw up with the mail
>> routing or in the web interface if you try to use more than one. You 
>> have
>> to pick one domain and stick with it. Thus I could not continue to 
>> support
>> both the previous sqlite.org (:8080) domain and the new
>> mailinglists.sqlite.org (:80) for the users list. So I made the move 
>> from
>> the one to the other.
>>
>> Regarding the reply-to policy. I honestly don't remember the reasoning
>> behind it. I know there was a big long discussion about it in the past
>> (search the list) and after the dust settled we chose the current policy
>> and that is the way it is configured today.  I do believe the policy 
>> was a
>> result of the consensus of the mailing list users. I can say that we do
>> everything we can to make most of the people happy most of the time. 
>> That
>> is the very reason we made this change to begin with -- to make it 
>> possible
>> for everyone to use the list. It would have been easier to just keep 
>> things
>> the same and let the people who can't reach port 8080 deal with it, 
>> but we
>> did what we had to to make it accessible for them as well. There are 
>> a lot
>> of variables in the system and we juggle them as best we can.
>>
>> Any feedback or suggestions are always welcome.
>>
>>
>> On Mon, Mar 2, 2015 at 5:18 AM, David Woodhouse  
>> wrote:
>>
>>> On Mon, 2015-03-02 at 12:45 +0200, R.Smith wrote:
 Ok, I've found the source of the list duplications.

 Some emails (Such as the one by J.K. Lowden 2-March-2015 re: 
 Characters
 corrupt after importing...) contains a "Reply-To" field in the header
 with both list addresses which must have sneaked in there due to some
 automatic list feature.  (By "Both" I mean the old:
 sqlite-users at sqlite.org and the new:
>>> sqlite-users at mailinglists.sqlite.org)
>>>
>>> You don't need that, do you? Just hitting Reply All to a message which
>>> is:
>>>   To: sqlite-users at sqlite.org
>>>   Reply-To: sqlite-users at mailinglists.sqlite.org
>>>
>>> would generate a message which ends up going to both, wouldn't it?
>>>
>>> (I can't easily test; I've configured my mailer to ignore abusive
>>> Reply-To: headers from mailing lists where it can detect them, so my
>>> Reply and Reply All buttons actually do what I *ask* them to.)
>>>
>>> But looking at the first message in the 'PhD Student' thread, it 
>>> appears
>>> just as in my example above. And John KcKown's response of 26 Feb 2015
>>> 07:16:47 -0600 is indeed to both addresses, as if he'd done the correct
>>> thing and simply hit 'Reply All'.
>>>
 I usually use the "Reply to List" button