Re: [sqlite] Two equivalent queries with very different perf

2014-05-25 Thread RSmith


On 2014/05/26 00:22, Andrew Arnott wrote:

I'm using C# with SQLite.cs and SQLiteAsync.cs. I started with the slow
version and upon realizing perf was *really* bad I experimented with
different forms and discovered the fast version worked great. On Windows
Phone 8.1, the slow one took ~20 seconds to execute while the fast one was
well under a second. The "Attachment" table contains binary data that can
be over a MB in size.

It isn't clear why the slow one would be so much slower. In my naïve
opinion the query analyze in SQLite should recognize and treat these two as
the same but obviously the execution strategy is vastly different. Is there
an opportunity here to optimize SQLite for the slow query form?

var fast = await this.Database.QueryAsync(@"
SELECT DISTINCT m.Id FROM Message m
INNER JOIN Attachment a ON a.MessageId = m.Id
WHERE m.RemoteContactId = ?
",
new object[] { this.RemoteParty.Id });


var slow = await this.Database.QueryAsync(@"
SELECT m.Id FROM Message m
WHERE m.RemoteContactId = ? AND (
SELECT COUNT(Id) FROM Attachment a
WHERE a.MessageId = m.Id
) > 0", new object[] { this.RemoteParty.Id });



The slow one will be slow on any system. You are essentially having the Query Engine compile a list of Id's from a table and upon 
each iteration which satisfies the first (Left) part of the statement has to then perform a sub-Query by going through an entire 
second table every time and counting up all the instances of a.MessageId=m.Id and then comparing the results (which may be in the 
thousands for all we know) to zero. This can never be fast if the second table is really large and the looked-up field is not indexed.


Contrast the first (fast) query which requires compiling again a list of IDs, but this time by means of a simple Join which will 
have a simple lookup on the RemoteContactId (quite possibly a primary key or some other form of index) which would be very fast to 
get to. No iterating the entire list to get counts. A better second query would be one using EXIST in stead of COUNT()>0 which 
essentially does the same thing, but EXISTS would look only for 1 candidate and not need to count them all.


To be clear - this can only ever matter if the second table is huge, or at least contains very many messages (or records) with the 
same RemoteContactId.


To answer your specific question though - it isn't clear why the one is so much slower, the only plausible explanation is a really 
large second table dataset that is unindexed, or if indexed, contains very many similar Remote contact Ids. Then, IF this is the 
case, it has nothing much to do with the "query analyze" (by which I presume you mean the Query planner) has no way of "realizing" 
those are the same, nor any way of improving the lookup since the query itself forces the count, which will be time-consuming in 
this case.


The first one is a better Query in SQL terms too. As an interesting aside, I found that if you can find a way to use GROUP BY in 
stead of DISTINCT (having to tweak the other Selected fields a bit) I get much better query times in SQLite specifically. I think 
this is to do with the way grouping sorts results or looks up existing ones working different to the distinct method, but this is 
just my hypothesis, however the speed increase is real and consistent through versions (since 3.7.7 anyway - I did not test before 
then).



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Two equivalent queries with very different perf

2014-05-25 Thread Keith Medcalf

The queries are identical and you should have similar performance on each 
assuming that you have the correct indexes defined.

I presume you have indexes on Message(RemoteContactID, Id) and 
Attachment(MessageId)?

If you do not, then if you only have indexes on Message(Id), the first will be 
fast(er) for trivial table sizes because it will do a full table scan of the 
attachments table, look up the record (using the index) in Message, apply the 
filter, and return the results.  Performance will deteriorate when you get a 
significant number of records in Attachment.

Conversely, the second (slow) query can only be solved by (without appropriate 
indexes) scanning the Message table and for each hit scanning the Attachment 
table.  Table Scans are a slow process for non-trivial tables (and a 
particularly bad idea in inner-loops).


>-Original Message-
>From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
>boun...@sqlite.org] On Behalf Of Andrew Arnott
>Sent: Sunday, 25 May, 2014 16:22
>To: General Discussion of SQLite Database
>Subject: [sqlite] Two equivalent queries with very different perf
>
>I'm using C# with SQLite.cs and SQLiteAsync.cs. I started with the slow
>version and upon realizing perf was *really* bad I experimented with
>different forms and discovered the fast version worked great. On Windows
>Phone 8.1, the slow one took ~20 seconds to execute while the fast one
>was
>well under a second. The "Attachment" table contains binary data that can
>be over a MB in size.
>
>It isn't clear why the slow one would be so much slower. In my naïve
>opinion the query analyze in SQLite should recognize and treat these two
>as
>the same but obviously the execution strategy is vastly different. Is
>there
>an opportunity here to optimize SQLite for the slow query form?
>
>var fast = await this.Database.QueryAsync(@"
>   SELECT DISTINCT m.Id FROM Message m
>   INNER JOIN Attachment a ON a.MessageId = m.Id
>   WHERE m.RemoteContactId = ?
>   ",
>   new object[] { this.RemoteParty.Id });
>
>
>var slow = await this.Database.QueryAsync(@"
>   SELECT m.Id FROM Message m
>   WHERE m.RemoteContactId = ? AND (
>   SELECT COUNT(Id) FROM Attachment a
>   WHERE a.MessageId = m.Id
>   ) > 0", new object[] { this.RemoteParty.Id });
>
>
>--
>Andrew Arnott
>"I [may] not agree with what you have to say, but I'll defend to the
>death
>your right to say it." - S. G. Tallentyre
>___
>sqlite-users mailing list
>sqlite-users@sqlite.org
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Two equivalent queries with very different perf

2014-05-25 Thread Simon Slavin

On 25 May 2014, at 11:22pm, Andrew Arnott  wrote:

> It isn't clear why the slow one would be so much slower.

Compare the two query plans using

EXPLAIN QUERY PLAN SELECT ...

with each one:



> Is there
> an opportunity here to optimize SQLite for the slow query form?

First, do ANALYZE, then try EXPLAIN QUERY PLAN again in case the plan changed.

Then create some appropriate indexes and see whether SQLite decides to use one.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Two equivalent queries with very different perf

2014-05-25 Thread Scott Robison
On Sun, May 25, 2014 at 4:22 PM, Andrew Arnott wrote:

> It isn't clear why the slow one would be so much slower. In my naïve
> opinion the query analyze in SQLite should recognize and treat these two as
> the same but obviously the execution strategy is vastly different. Is there
> an opportunity here to optimize SQLite for the slow query form?
>
> var slow = await this.Database.QueryAsync(@"
> SELECT m.Id FROM Message m
> WHERE m.RemoteContactId = ? AND (
> SELECT COUNT(Id) FROM Attachment a
> WHERE a.MessageId = m.Id
> ) > 0", new object[] { this.RemoteParty.Id });
>

I suspect the problem with the slow one is that the subquery has to be
re-run and count the matching rows for every matching RemoteContactId. You
might be able to speed it up with a simple "LIMIT 1" in the sub-query, like
so:  (SELECT COUNT(Id) FROM Attachment a WHERE a.MessageId = m.Id LIMIT 1).

Since you only care if the number is greater than zero, 1 will do just as
well as any other non-zero number.

SDR
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Two equivalent queries with very different perf

2014-05-25 Thread Andrew Arnott
I'm using C# with SQLite.cs and SQLiteAsync.cs. I started with the slow
version and upon realizing perf was *really* bad I experimented with
different forms and discovered the fast version worked great. On Windows
Phone 8.1, the slow one took ~20 seconds to execute while the fast one was
well under a second. The "Attachment" table contains binary data that can
be over a MB in size.

It isn't clear why the slow one would be so much slower. In my naïve
opinion the query analyze in SQLite should recognize and treat these two as
the same but obviously the execution strategy is vastly different. Is there
an opportunity here to optimize SQLite for the slow query form?

var fast = await this.Database.QueryAsync(@"
SELECT DISTINCT m.Id FROM Message m
INNER JOIN Attachment a ON a.MessageId = m.Id
WHERE m.RemoteContactId = ?
",
new object[] { this.RemoteParty.Id });


var slow = await this.Database.QueryAsync(@"
SELECT m.Id FROM Message m
WHERE m.RemoteContactId = ? AND (
SELECT COUNT(Id) FROM Attachment a
WHERE a.MessageId = m.Id
) > 0", new object[] { this.RemoteParty.Id });


--
Andrew Arnott
"I [may] not agree with what you have to say, but I'll defend to the death
your right to say it." - S. G. Tallentyre
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simple Select from IN - from a newbie.

2014-05-25 Thread Simon Slavin

On 25 May 2014, at 9:23pm, RSmith  wrote:

> On the plus side, they are mostly free and even the most expensive ones 
> doesn't come near what Stephen paid (All I can say to him is: Dude, you wuz 
> robbed).

The rise of the internet has changed the way people learn computer languages.  
With people giving their time on mailing lists like this and writing web pages 
detailing stuff they know, everything is on the web now.  I just did a search 
on 

"SQL tutorial" sqlite

and got 23,000 hits.  Even if only 1% of those is any kind of 'learning SQL 
with sqlite' tutorial information, that's still 230 tutorials.

I used to be a contractor and one of the things I offered companies was 
teaching courses on whatever software they used or security measures and 
protocols they wanted enforced.  I could still do that -- for big companies.  
Most companies like sending their people on courses.  That's just the way big 
companies do things.  But for the twenty million people learning widely-used 
stuff like SQL ?  I couldn't charge 10 Euros each.  They all go to w3schools 
and get decent-quality material free.  Which is exactly what I did to learn it 
myself.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simple Select from IN - from a newbie.

2014-05-25 Thread RSmith


On 2014/05/25 19:37, Humblebee wrote:

Thanks for the tip about the online course with an instructor.  I will
look into online courses,  although the 5 digits is a bit outside of
my limited budget.   Learning is generally a very personal thing and
there are many ways to take in information.  I have to say that I
learned so much from the past few days from the amazing help on this
forum and from researching on the internet.  Also since I am only
working with Sqlite and no other databases, this forum has been
nothing less than a gift of grace from above for me.



The help is a pleasure - and Stephen is correct, if this is your work or in some way you are expected to be conversant and help 
people/companies do the SQL thing, you really need to look up a course on relational DB theory and maybe SQL as used by other 
engines too. On the plus side, they are mostly free and even the most expensive ones doesn't come near what Stephen paid (All I can 
say to him is: Dude, you wuz robbed).


PS: Thanks to Kees for spotting the SQL error in my previous post and providing 
working versions!


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] vtable vs. OR condition

2014-05-25 Thread András Kardos
Hello,

As an expoeriment I'd like to create various virtual tables (vtable) for
SQLite. Like filysystem, process list, mailbox - all local or remote. The
idea is to use SQLite as an unified API and query processor to access or
join data from various sources. I read an old old article in Dr. Dobbs (
http://goo.gl/khg1pN), and it mentions an issue with OR conditions vs.
virtual tables, that is if a query has theese, than the filter conditions
will not be passed to the xBestIndex call, and thus will not be passed to
xFilter, and thus access to the virtual table will always be done as table
scan, and that is quite inefficient. Rewriting queries is not an option as
the article suggest, as that would force users to an unusual SQL style just
to make things faster. Are there any advancements in this topic? Any
experiences with this issue? Or any suggestions for other query processing
engines I might use?

Thanks,
Andris
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simple Select from IN - from a newbie.

2014-05-25 Thread Humblebee
Thanks for the tip about the online course with an instructor.  I will
look into online courses,  although the 5 digits is a bit outside of
my limited budget.   Learning is generally a very personal thing and
there are many ways to take in information.  I have to say that I
learned so much from the past few days from the amazing help on this
forum and from researching on the internet.  Also since I am only
working with Sqlite and no other databases, this forum has been
nothing less than a gift of grace from above for me.

Big thanks to everyone.


On 5/25/14, Stephen Chrzanowski [via SQLite]
 wrote:
>
>
> @OP: This isn't meant as something to be meant as mean, but I'd recommend
> taking an online course for SQL in general.  There are some free sites,
> there are a lot more paid sites.  I did my SQL training back when SQL 2000
> was just coming out, and I paid 5 digits for the course.  Those five digits
> were to the left of the period, or coma (depending on your language).  The
> course wasn't all about SQL, but it was one of the courses I actually HAD
> to pay attention to.  Most of the rest I had already either known, or
> forgotten.
>
> You will get a LOT more from an online course with teacher led instruction
> rather than a mailing list.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
>
> ___
> If you reply to this email, your message will be added to the discussion
> below:
> http://sqlite.1065341.n5.nabble.com/Simple-Select-from-IN-from-a-newbie-tp75751p75826.html
>
> To unsubscribe from Simple Select from IN - from a newbie., visit
> http://sqlite.1065341.n5.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=75751&code=ZmFudGFzaWEuZG9zYUBnbWFpbC5jb218NzU3NTF8MTk4Njk4NTgwNw==




--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Simple-Select-from-IN-from-a-newbie-tp75751p75827.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simple Select from IN - from a newbie.

2014-05-25 Thread Stephen Chrzanowski
@OP: This isn't meant as something to be meant as mean, but I'd recommend
taking an online course for SQL in general.  There are some free sites,
there are a lot more paid sites.  I did my SQL training back when SQL 2000
was just coming out, and I paid 5 digits for the course.  Those five digits
were to the left of the period, or coma (depending on your language).  The
course wasn't all about SQL, but it was one of the courses I actually HAD
to pay attention to.  Most of the rest I had already either known, or
forgotten.

You will get a LOT more from an online course with teacher led instruction
rather than a mailing list.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simple Select from IN - from a newbie.

2014-05-25 Thread Kees Nuyt
On Sat, 24 May 2014 22:24:01 +0200, RSmith  wrote:

> INSERT OR UPDATE TeamPersonTable (tpId,teamId,personId,orderId) 
> VALUES (1,3,4,0),(2,3,5,1),(3,7,4,1),(4,7,5,0);

That will raise a syntax error.

Make it:
INSERT OR REPLACE INTO TeamPersonTable
 (tpId,teamId,personId,orderId)
VALUES
 (1,3,4,0),(2,3,5,1),(3,7,4,1),(4,7,5,0);

or even:
REPLACE INTO TeamPersonTable
 (tpId,teamId,personId,orderId)
VALUES
 (1,3,4,0),(2,3,5,1),(3,7,4,1),(4,7,5,0);

http://www.sqlite.org/lang_insert.html

-- 
Groet, Cordialement, Pozdrawiam, Regards,

Kees Nuyt

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users