Re: [sqlite] Need advice on using nested selects in JOIN statements as a logical 'AND'

2013-11-27 Thread Tristan Van Berkom
On Thu, 2013-11-28 at 00:20 -0500, Igor Tandetnik wrote:
> On 11/27/2013 11:52 PM, Tristan Van Berkom wrote:
> > 
> > SELECT DISTINCT summary.uid, summary.vcard FROM 'folder_id' AS summary
> > LEFT OUTER JOIN 'folder_id_phone_list' AS phone_list
> >  ON phone_list.uid = summary.uid
> > LEFT OUTER JOIN 'folder_id_email_list' AS email_list
> >  ON email_list.uid = summary.uid
> > WHERE (phone_list.value IS NOT NULL AND phone_list.value LIKE '%0505')
> >AND (email_list.value IS NOT NULL AND email_list.value LIKE 'eddie%')
> 
> Why are you using outer joins when your WHERE clause discards unmatched 
> records anyway? If you replace LEFT OUTER with INNER, the end result 
> would be exactly the same.
> 
> I have a strong feeling that, once you replace outer joins with regular 
> joins, this statement would run just as fast as your convoluted one with 
> nested selects. By using outer joins, you prevent SQLite from reordering 
> the conditions and using the most efficient search strategy - it is 
> forced to perform those joins left to right, which results in bad 
> performance because, apparently, you don't have indexes on 
> phone_list.uid or email_list.uid.

Actually... I just replied to this email but forgot to mention, while
your input is really appreciated, it would be really great if you could
answer the actual question :)

I.e. is the statement logically the same ?

> 
> Try this straightforward query:
> 
> SELECT DISTINCT summary.uid, summary.vcard FROM folder_id AS summary
> JOIN folder_id_phone_list AS phone_list
>  ON phone_list.uid = summary.uid
> JOIN folder_id_email_list AS email_list
>  ON email_list.uid = summary.uid
> WHERE phone_list.value LIKE '%0505'  AND email_list.value LIKE 'eddie%';
> 
> > o Leverage the index which I've created on 'folder_id_email_list'
> >   (I am using case insensitive LIKE statements so the indexes
> >   work in that statement).
> 
> Normally, you need case-sensitive LIKE in order to use the index, unless 
> the index is created with COLLATE NOCASE. You could use EXPLAIN QUERY 
> PLAN to confirm that the index is indeed being utilized.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Need advice on using nested selects in JOIN statements as a logical 'AND'

2013-11-27 Thread Tristan Van Berkom
On Thu, 2013-11-28 at 00:20 -0500, Igor Tandetnik wrote:
> On 11/27/2013 11:52 PM, Tristan Van Berkom wrote:
> > 
> > SELECT DISTINCT summary.uid, summary.vcard FROM 'folder_id' AS summary
> > LEFT OUTER JOIN 'folder_id_phone_list' AS phone_list
> >  ON phone_list.uid = summary.uid
> > LEFT OUTER JOIN 'folder_id_email_list' AS email_list
> >  ON email_list.uid = summary.uid
> > WHERE (phone_list.value IS NOT NULL AND phone_list.value LIKE '%0505')
> >AND (email_list.value IS NOT NULL AND email_list.value LIKE 'eddie%')
> 
> Why are you using outer joins when your WHERE clause discards unmatched 
> records anyway? If you replace LEFT OUTER with INNER, the end result 
> would be exactly the same.

I was afraid someone might ask this, but I had already put a lot
of detail into this email so I left it out.

When using an INNER join, the engine does something like this:

  o Create a data set that is table_1 * table_2 * table_3 rows
large

  o Run the constraints on what might be multiple matching rows
in the resulting huge data set (even if I nest the selects,
there can be other constraints to sort out on the main table).

This is extremely slow for addressbooks with 200,000 contacts,
however the INNER join allows me to drop the nested select foolery
for smaller addressbooks as the indexes can still be leveraged
in a query similar to the first version above.

This bug comment has a good detailed description of the reason
why we shifted from regular joins to LEFT OUTER joins:
https://bugzilla.gnome.org/show_bug.cgi?id=699597#c6


> I have a strong feeling that, once you replace outer joins with regular 
> joins, this statement would run just as fast as your convoluted one with 
> nested selects. By using outer joins, you prevent SQLite from reordering 
> the conditions and using the most efficient search strategy - it is 
> forced to perform those joins left to right, which results in bad 
> performance because, apparently, you don't have indexes on 
> phone_list.uid or email_list.uid.

Indeed I avoided creating an index for phone_list.uid and
email_list.uid, and indeed I'm unsure of how that will effect
performance.

If I were to create indexes on the uid column of the auxiliary
tables, would that cause the INNER join to not create such a
huge dataset before checking the constraints ?

> Try this straightforward query:
> 
> SELECT DISTINCT summary.uid, summary.vcard FROM folder_id AS summary
> JOIN folder_id_phone_list AS phone_list
>  ON phone_list.uid = summary.uid
> JOIN folder_id_email_list AS email_list
>  ON email_list.uid = summary.uid
> WHERE phone_list.value LIKE '%0505'  AND email_list.value LIKE 'eddie%';

This looks stunningly similar to what we had originally, which indeed
returns results in less than 10ms for addressbooks up to around 800
contacts.

This graph shows performance of INNER joins actually:
http://blogs.gnome.org/tvb/files/2013/01/filter-by-long-phone-number-prefix.png

As you can see, things start to get bad with > 800 contacts (the red
line is what we're looking at here).

> 
> > o Leverage the index which I've created on 'folder_id_email_list'
> >   (I am using case insensitive LIKE statements so the indexes
> >   work in that statement).
> 
> Normally, you need case-sensitive LIKE in order to use the index, unless 
> the index is created with COLLATE NOCASE. You could use EXPLAIN QUERY 
> PLAN to confirm that the index is indeed being utilized.

Oh sorry, I totally mistyped that, LIKE is case insensitive by default
and we override that indeed, using "PRAGMA case_sensitive_like=ON" at
initialization time.

Thank you for taking the time to try and understand this with me :)

Cheers,
-Tristan


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Need advice on using nested selects in JOIN statements as a logical 'AND'

2013-11-27 Thread Igor Tandetnik

On 11/27/2013 11:52 PM, Tristan Van Berkom wrote:


SELECT DISTINCT summary.uid, summary.vcard FROM 'folder_id' AS summary
LEFT OUTER JOIN 'folder_id_phone_list' AS phone_list
 ON phone_list.uid = summary.uid
LEFT OUTER JOIN 'folder_id_email_list' AS email_list
 ON email_list.uid = summary.uid
WHERE (phone_list.value IS NOT NULL AND phone_list.value LIKE '%0505')
   AND (email_list.value IS NOT NULL AND email_list.value LIKE 'eddie%')


Why are you using outer joins when your WHERE clause discards unmatched 
records anyway? If you replace LEFT OUTER with INNER, the end result 
would be exactly the same.


I have a strong feeling that, once you replace outer joins with regular 
joins, this statement would run just as fast as your convoluted one with 
nested selects. By using outer joins, you prevent SQLite from reordering 
the conditions and using the most efficient search strategy - it is 
forced to perform those joins left to right, which results in bad 
performance because, apparently, you don't have indexes on 
phone_list.uid or email_list.uid.


Try this straightforward query:

SELECT DISTINCT summary.uid, summary.vcard FROM folder_id AS summary
JOIN folder_id_phone_list AS phone_list
ON phone_list.uid = summary.uid
JOIN folder_id_email_list AS email_list
ON email_list.uid = summary.uid
WHERE phone_list.value LIKE '%0505'  AND email_list.value LIKE 'eddie%';


o Leverage the index which I've created on 'folder_id_email_list'
  (I am using case insensitive LIKE statements so the indexes
  work in that statement).


Normally, you need case-sensitive LIKE in order to use the index, unless 
the index is created with COLLATE NOCASE. You could use EXPLAIN QUERY 
PLAN to confirm that the index is indeed being utilized.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Need advice on using nested selects in JOIN statements as a logical 'AND'

2013-11-27 Thread Tristan Van Berkom
Hi,
   I don't have many years experience with the SQL language
and I've cooked up some pretty complex stuff which will run
in production environments, I just want to confirm with you
that the assumptions I've made are true (I do have a lot of
unit tests which confirm that my code works as far as I can 
see).

Some minimum context is required first:

  o My software is a C library which stores an addressbook
of vCards

  o Most of the searchable contact fields are stored in
one table, with a PRIMARY KEY column 'uid'

  o As the vCard spec specifies many values can be
listed for a given key, for instance "TEL" or "EMAIL",
these are stored for the purpose of quick searches
on a separate table (one separate table per mult-valued
datatype).

Each of these tables has a 'uid' column which is
created with:
  'uid TEXT NOT NULL REFERENCES the_main_table (uid)'

So for the table which stores emails, I have one row
per email address for each contact (multiple rows can
exist with the same 'uid');

To speed up searches, I have indexes created on some
of these auxiliary tables.

  o I need to honor a predetermined contact search API,
searches can come in for various contact fields at
a time, for instance:

  "Give me all the contacts who's family name starts
   with J, and only if they have an email address
   ending with .com"

Is represented as:
(and (beginswith "family_name" "J")
 (endswith "email" ".com"))

Now the fun starts, I've recently written a query optimizer
of sorts which reorganizes how I will form my query before
generating the SQL to execute.

So let's consider the query input:

   (and (endswith "tel" "0505") (beginswith "email" "eddie"))

Note that in this case, the three tables in context are:
   o folder_id  The main table with unique 'uid'
   o folder_id_phone_list   The table containing all phone numbers
   o folder_id_email_list   The table containing all phone numbers

Before the query optimization, my query would be:

SELECT DISTINCT summary.uid, summary.vcard FROM 'folder_id' AS summary
LEFT OUTER JOIN 'folder_id_phone_list' AS phone_list 
ON phone_list.uid = summary.uid
LEFT OUTER JOIN 'folder_id_email_list' AS email_list 
ON email_list.uid = summary.uid
WHERE (phone_list.value IS NOT NULL AND phone_list.value LIKE '%0505')
  AND (email_list.value IS NOT NULL AND email_list.value LIKE 'eddie%')

After optimization, will be generated instead like so:

SELECT DISTINCT summary.uid, summary.vcard FROM (
 SELECT DISTINCT phone_list.uid FROM 'folder_id_phone_list' 
 AS phone_list
 WHERE (phone_list.value IS NOT NULL AND 
phone_list.value LIKE '%0505') 
) AS phone_results
LEFT OUTER JOIN (
 SELECT DISTINCT email_list.uid FROM 'folder_id_email_list'
 AS email_list
 WHERE (email_list.value IS NOT NULL AND
   email_list.value LIKE 'eddie%') 
) AS email_list_results ON email_results.uid = phone_results.uid 
LEFT OUTER JOIN 'folder_id' AS summary 
  ON summary.uid = email_results.uid WHERE summary.uid IS NOT NULL

Performing the nested selects achieves two things:

   o Reduce the number of rows considered in the JOIN statements

   o Leverage the index which I've created on 'folder_id_email_list'
 (I am using case insensitive LIKE statements so the indexes
 work in that statement).

The main assumption that I'm making, is the toplevel logical AND
statements (i.e.  AND ),
can be transformed into a nested SELECT with the LEFT OUTER JOIN
each time "ON this_select.uid = previous_select.uid", effectively
filtering out any rows which didn't match either of the
constraints.

As far as I can tell, and I've been trying to absorb these
JOIN concepts to the best of my ability... the above two
statements will always report the same results, i.e. they
are equal statements for all practical purposes, only
the second one is much faster than the first one.

And while my test cases tell me so far that I am correct
in my assumptions, I'd really like to have some insight
from a third party authority on the matter.

Can someone please either confirm that my logic is sound
in this assumption ? Or point out to me perhaps some case
where my logic might break down ?

Please excuse the long detailed email, I don't think I could
have explained this accurately without providing enough
context.

Kind Regards,

-Tristan


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] [ANNOUNCEMENT] Updated: sqlite3-3.8.2-1 (beta, as test package) for Cygwin/Cygwin64

2013-11-27 Thread Jan Nijtmans
SQLite is a software library that implements a self-contained,
serverless, zero-configuration, transactional SQL database engine

Changes since 3.8.1-1
=
* Updated to upstream 3.8.2 beta release. Main feature:
  * Added support for WITHOUT ROWID tables.
* The path handing in the win32/win32-longpath VFS is
   corrected w.r.t. UTF-8 and db handle sharing. Since
   "winnt" is the default VFS, no-one should be affected
   by that.

Since this is a test package, this beta release is not installed
automatically when running setup.exe. You must explicitly
specify version 3.8.2-1!

--
Jan Nijtmans
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Visual Studio 2013 Issue - GetVersionEx deprecated

2013-11-27 Thread Jan Nijtmans
2013/11/4 Jan Nijtmans:
> 2013/11/4 Simon Slavin:
>> On 4 Nov 2013, at 6:32am, David de Regt  wrote:
>>> In Visual Studio 2013, which uses the Windows 8.1 Platform SDK, they've 
>>> marked GetVersionEx as deprecated, trying to supercede it through to 
>>> VerifyVersionInfo and some other hardcoded macros based on that call that 
>>> the new SDK implements.  Looking at the SQLite source, it looks like it 
>>> only uses the GetVersionEx call to test for whether LockFileEx is supported 
>>> on the current OS (whether it's NT-based or not).
>>
>> Could you suggest equivalent code which would use VerifyVersionInfo instead 
>> ?  Even a rough guide on where to find documentation on it would be useful.

Simon, Your request has been granted:

(even though it just missed the beta)

Regards,
  Jan Nijtmans
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Failed test on aarch64

2013-11-27 Thread Dan Kennedy

On 11/27/2013 04:26 PM, Zsbán Ambrus wrote:

On Tue, Nov 26, 2013 at 5:01 PM, Richard Hipp  wrote:

The check-in above changes the behavior of REAL-to-INTEGER casts where the
REAL value is larger than the largest possible integer.  For example:

 SELECT CAST(9223372036854775808.0 to INTEGER);

The change causes the query to yield
9223372036854775807,

What is the scope of this change?
http://www.sqlite.org/draft/releaselog/current.html mentions only CAST
expressions, is CAST expressions the only case when this happens?  In
particular, will the behavior of the sqlite3_column_int64 function
when retrieving a float value result change?


Yes. It will.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Failed test on aarch64

2013-11-27 Thread Zsbán Ambrus
On Tue, Nov 26, 2013 at 5:01 PM, Richard Hipp  wrote:
> The check-in above changes the behavior of REAL-to-INTEGER casts where the
> REAL value is larger than the largest possible integer.  For example:
>
> SELECT CAST(9223372036854775808.0 to INTEGER);
>
> The change causes the query to yield
> 9223372036854775807,

What is the scope of this change?
http://www.sqlite.org/draft/releaselog/current.html mentions only CAST
expressions, is CAST expressions the only case when this happens?  In
particular, will the behavior of the sqlite3_column_int64 function
when retrieving a float value result change?

Ambrus
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-27 Thread Luuk

On 27-11-2013 03:55, James K. Lowden wrote:

On Mon, 25 Nov 2013 19:44:15 +0100
Luuk  wrote:


On 25-11-2013 13:41, Simon Slavin wrote:

I'm wondering whether there's an argument that it should be
evaluated just once for a transaction.


I'm still thinking about this question,

i can think of some benefits if its evaluated just once per
transaction.

but i hope anyone has some examples why its better to evalutate it
once per statement.


The easy answer is: that's how SQL is defined.

A more complicated answer is that, inside a user-defined transaction,
you have control over the meaning of "now".  You can insert "now" into
a table and re-use it as often as you like, without fear of it being
changed by another process.  If, on the other hand, you'd like to
execute several statements in a transaction, you might also like to
know when "now" is for each one.  The per-statement definition of "now"
supports that use.

There's actually nothing special about "now".  Other functions, e.g.
changes(), may vary between statements in a transaction.  (It happens
SQLite doesn't have many such.)  You really just want to preserve the
function's defintion as "returns correct output each time it's
called".

HTH.



ok, thanks for the answer.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] count (*) performance

2013-11-27 Thread Hick Gunter
It calls the xBestIndex method instead.

However, the virtual table module cannot determine which (if any) fields will 
be selected and must act solely upon the passed constraints. Unless you are 
willing to implement a virtual field and write something like "SELECT  
FROM vtable WHERE  AND __fieldlist='';" then the virtual 
table implementation is stuck with retrieving all fields of the base native 
table (or at least sufficient data to be able to generate all fields of the 
virtual table). Alternatively it is sometimes possible to implement __image 
(memory address of record) and __column (Entrypoint for calling the xColumn 
method) virtual fields to generate column values "as needed", but care must be 
taken to ensure that memory addresses remain valid (e.g. if the base virtual 
table has a single record buffer for reading from a file, then all addresses 
will be identical; but the contents reflect the last record read if the 
statement requires a sort operation).

-Ursprüngliche Nachricht-
Von: Dominique Devienne [mailto:ddevie...@gmail.com]
Gesendet: Mittwoch, 27. November 2013 09:45
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] count (*) performance

On Wed, Nov 27, 2013 at 9:35 AM, Clemens Ladisch  wrote:

> veeresh kumar wrote:
> > I see that in release history for 3.8.1 below item.
> > "Estimate the sizes of table and index rows and use the smallest
> applicable B-Tree for full scans and "count(*)" operations."
> >
> > Does it mean that performance of count(*) has been improved in 3.8.1
>
> Yes.
>
> > and if yes by how much?
>
> That depends on how much the rows of your smallest index are smaller
> than the rows of the table.
>

What about vtables?

How does SQLite estimate the size of table and index rows for vtables?

Sometimes vtables are based on native containers which know their size, thus 
it's an O(1) op, but the vtable has no way to communicate that size to the 
engine to bypass an O(N) full scan? Any chance the vtable API would allow such 
shortcuts?

Thanks, --DD
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


--
 Gunter Hick
Software Engineer
Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna, Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This e-mail is confidential and may well also be legally privileged. If you 
have received it in error, you are on notice as to its status and accordingly 
please notify us immediately by reply e-mail and then delete this message from 
your system. Please do not copy it or use it for any purposes, or disclose its 
contents to any person as to do so could be a breach of confidence. Thank you 
for your cooperation.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] count (*) performance

2013-11-27 Thread Dominique Devienne
On Wed, Nov 27, 2013 at 9:35 AM, Clemens Ladisch  wrote:

> veeresh kumar wrote:
> > I see that in release history for 3.8.1 below item.
> > "Estimate the sizes of table and index rows and use the smallest
> applicable B-Tree for full scans and "count(*)" operations."
> >
> > Does it mean that performance of count(*) has been improved in 3.8.1
>
> Yes.
>
> > and if yes by how much?
>
> That depends on how much the rows of your smallest index are smaller
> than the rows of the table.
>

What about vtables?

How does SQLite estimate the size of table and index rows for vtables?

Sometimes vtables are based on native containers which know their size,
thus it's an O(1) op, but the vtable has no way to communicate that size to
the engine to bypass an O(N) full scan? Any chance the vtable API would
allow such shortcuts?

Thanks, --DD
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] count (*) performance

2013-11-27 Thread Clemens Ladisch
veeresh kumar wrote:
> I see that in release history for 3.8.1 below item.
> "Estimate the sizes of table and index rows and use the smallest applicable 
> B-Tree for full scans and "count(*)" operations."
>
> Does it mean that performance of count(*) has been improved in 3.8.1

Yes.

> and if yes by how much?

That depends on how much the rows of your smallest index are smaller
than the rows of the table.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users