[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-26 Thread shadowma...@yahoo.de
>>SELECT * FROM ART WHERE ID IN (SELECT docid FROM OCR WHERE FullText MATCH 
>>'mailing')

Thnak You Clemens that's excellent.

KR,
Marta

-Original Message- 
From: Clemens Ladisch
Sent: Monday, May 25, 2015 9:59 PM
To: sqlite-users at mailinglists.sqlite.org
Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem

ShadowMarta at yahoo.de wrote:
> ok, understand but I can't see why "NOT IN" would not force a full table 
> scan but "LEFT JOIN" would ?

In a query like "SELECT ... WHERE ID NOT IN (SELECT ...)", the subquery is 
executed
_once_, the results are put into a temporary table/index, and the remaining 
query
can be executed efficiently.

(If the subquery were a correlated subquery, it would need to be executed 
multiple
times, and the situation would be different.)

> SELECT ART.* FROM ART JOIN (SELECT docid FROM OCR WHERE FullText MATCH 
> 'mailing') AS ftstable WHERE ART.ID = docid;

This is correct, but more complex than needed:

  SELECT * FROM ART WHERE ID IN (SELECT docid FROM OCR WHERE FullText MATCH 
'mailing')

There is no need to try to force every query to use joins.  Some databases 
are
said to have optimizers that work much better with joins, but this is not 
true
for SQLite.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users 



[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-25 Thread Clemens Ladisch
ShadowMarta at yahoo.de wrote:
> ok, understand but I can't see why "NOT IN" would not force a full table scan 
> but "LEFT JOIN" would ?

In a query like "SELECT ... WHERE ID NOT IN (SELECT ...)", the subquery is 
executed
_once_, the results are put into a temporary table/index, and the remaining 
query
can be executed efficiently.

(If the subquery were a correlated subquery, it would need to be executed 
multiple
times, and the situation would be different.)

> SELECT ART.* FROM ART JOIN (SELECT docid FROM OCR WHERE FullText MATCH 
> 'mailing') AS ftstable WHERE ART.ID = docid;

This is correct, but more complex than needed:

  SELECT * FROM ART WHERE ID IN (SELECT docid FROM OCR WHERE FullText MATCH 
'mailing')

There is no need to try to force every query to use joins.  Some databases are
said to have optimizers that work much better with joins, but this is not true
for SQLite.


Regards,
Clemens


[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-25 Thread shadowma...@yahoo.de
Allright Simon,

finally I have figured it out.

SELECT ART.* FROM ART JOIN (SELECT docid FROM OCR WHERE FullText MATCH 
'mailing') AS ftstable WHERE ART.ID = docid;

Pity that the documentation is so minimal that it causes more confusion as 
it helps due to by reading it you have the false impression to getting data 
but all what you get is a few symbols.

KR,
Marta

-Original Message- 
From: Simon Slavin
Sent: Monday, May 25, 2015 3:17 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem


On 25 May 2015, at 2:09pm, ShadowMarta at yahoo.de wrote:

> What am I missing here ?

My guess is that you are expecting FTS tables to act the same as normal 
tables. They don't.  They're specially created and don't have the same 
indexing and other behaviour.  The questions you're asking make sense if 
you're just talking about normal SQLite tables.

It may be that you should split your data up.  For most of your relations 
you should be declaring normal tables.  Only the content you need to search 
quickly needs to be in an FTS table.

Simon.
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users 



[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-25 Thread shadowma...@yahoo.de
Thanks Simon,

>>Only the content you need to search quickly needs to be in an FTS table.

I have only one column with the full text OCR extracted from PDF files and 
it is what need to be searched.
I can't go lower than 1 column its obvious.

The "MATCH" gives me back the correct docid which corresponds to the ID in 
an other table (ART.ID).

There must be a way to connect this 2 IDs if not with "JOIN" than how ?
Can't believe it isn't possible.

KR,
Marta

-Original Message- 
From: Simon Slavin
Sent: Monday, May 25, 2015 3:17 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem


On 25 May 2015, at 2:09pm, ShadowMarta at yahoo.de wrote:

> What am I missing here ?

My guess is that you are expecting FTS tables to act the same as normal 
tables. They don't.  They're specially created and don't have the same 
indexing and other behaviour.  The questions you're asking make sense if 
you're just talking about normal SQLite tables.

It may be that you should split your data up.  For most of your relations 
you should be declaring normal tables.  Only the content you need to search 
quickly needs to be in an FTS table.

Simon.
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users 



[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-25 Thread shadowma...@yahoo.de
Hi Clemens,

ok, understand but I can't see why "NOT IN" would not force a full table 
scan but "LEFT JOIN" would ?

SELECT ART.ID FROM ART LEFT JOIN OCR ON OCR.ID = ART.ID WHERE OCR.ID IS 
NULL;
Is this query on "rowid" or isn't ?

Another thing:

I have converted my table like you proposed to "CREATE VIRTUAL TABLE "OCR" 
using fts4("FullText" varchar) and I can insert the rows just fine but can't 
make a functioning "JOIN" query.
I am just getting weird results.

"2 Rows returned from: SELECT docid FROM OCR WHERE FullText MATCH 'mailing'; 
(took 2ms)"
the rows 2 & 4.

But if I put this statement into a "JOIN" I get:
"9 Rows returned from: SELECT * FROM ART INNER JOIN OCR ON ART.ID = (SELECT 
docid FROM OCR WHERE FullText MATCH 'mailing'); (took 3ms)"

IDKundennummerRechnungsnummerRechnungsdatumPDFFullText
2.. 
.  .   .
2.. 
.  .   .
2.
2.
2.
2.
2.
2.

It should give back only two rows 2 & 4 have no other matching ART.ID = 
docid.
And why it is returning "FullText" as well  just adds to the mistery.

What am I missing here ?

KR,
Marta
-----Original Message----- 
From: Clemens Ladisch
Sent: Sunday, May 24, 2015 10:11 PM
To: sqlite-users at mailinglists.sqlite.org
Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem

ShadowMarta at yahoo.de wrote:
> Do you have some explanation of the terrible performace of "NOT EXIST" & 
> "LEFT JOIN" versus "NOT IN" as well ?

FTS tables can do two types of queries efficiently:
- lookups by rowid/docid;
- searches with MATCH.

Anything else (such as your "WHERE id = ?") ends up as a full table scan for 
each value.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users 



[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-25 Thread Simon Slavin

On 25 May 2015, at 2:09pm, ShadowMarta at yahoo.de wrote:

> What am I missing here ?

My guess is that you are expecting FTS tables to act the same as normal tables. 
They don't.  They're specially created and don't have the same indexing and 
other behaviour.  The questions you're asking make sense if you're just talking 
about normal SQLite tables.

It may be that you should split your data up.  For most of your relations you 
should be declaring normal tables.  Only the content you need to search quickly 
needs to be in an FTS table.

Simon.


[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-24 Thread Clemens Ladisch
ShadowMarta at yahoo.de wrote:
> Do you have some explanation of the terrible performace of "NOT EXIST" & 
> "LEFT JOIN" versus "NOT IN" as well ?

FTS tables can do two types of queries efficiently:
- lookups by rowid/docid;
- searches with MATCH.

Anything else (such as your "WHERE id = ?") ends up as a full table scan for 
each value.


Regards,
Clemens


[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-24 Thread shadowma...@yahoo.de
Thank You Clemens,

>>INSERT INTO OCR(docid, FullOCR) VALUES (42, 'xxx');
I understand now what you are trying to tell me I have misunderstood the 
documentation. - or lack of it -

Do you have some explanation of the terrible performace of "NOT EXIST" & 
"LEFT JOIN" versus "NOT IN" as well ?
Regardless of you would not use fts in this way.

KR,
Marta


-Original Message- 
From: Clemens Ladisch
Sent: Sunday, May 24, 2015 4:23 PM
To: sqlite-users at mailinglists.sqlite.org
Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem

ShadowMarta at yahoo.de wrote:
>> You should drop the ID column, and in your queries use the docid instead.
>
> Not possible.
> I fill the rows in a parallel loop, the IDs are in disorder and they are 
> the link to table ?ART.ID?.

INSERT INTO OCR(docid, FullOCR) VALUES (42, 'xxx');


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users 



[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-24 Thread Clemens Ladisch
ShadowMarta at yahoo.de wrote:
>> You should drop the ID column, and in your queries use the docid instead.
>
> Not possible.
> I fill the rows in a parallel loop, the IDs are in disorder and they are the 
> link to table ?ART.ID?.

INSERT INTO OCR(docid, FullOCR) VALUES (42, 'xxx');


Regards,
Clemens


[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-24 Thread shadowma...@yahoo.de
Hi Clemens,

I really appreciate your input but non of your origination have anything to 
do with the problem at hand.
I just want to make sure this thread not get deflected.

The problem is ?performance?.

34157ms, 47924ms vs. 103ms

>>SQLite ignores pretty much anything except the column names
All right then, it should not influence the performance either.

>>You should drop the ID column, and in your queries use the docid instead.
Not possible.
I fill the rows in a parallel loop, the IDs are in disorder and they are the 
link to table ?ART.ID?.

And even if I would do and let say get a better timing the problem(perhaps 
bug) would still exist.
I am not asking for a solution to my problem. I have it already. - By using 
?NOT IN? -
I am asking for an acknowledgment that it is a bug or an explanation if it 
is not.

KR,
Marta

-Original Message- 
From: Clemens Ladisch
Sent: Saturday, May 23, 2015 1:29 PM
To: sqlite-users at mailinglists.sqlite.org
Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem

ShadowMarta at yahoo.de wrote:
> CREATE VIRTUAL TABLE `OCR` using fts4 (
> `ID`integer primary key NOT NULL,

This is not how FTS tables work.  SQLite ignores pretty much anything
except the column names; it does not matter whether you write PRIMARY
KEY or NO KEY PLEASE.  All columns get full-text indexed.

All FTS tables have the usual internal rowid as primary key; it's also
available under the name "docid".

You should drop the ID column, and in your queries use the docid instead.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users 



[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-23 Thread Clemens Ladisch
ShadowMarta at yahoo.de wrote:
> CREATE VIRTUAL TABLE `OCR` using fts4 (
> `ID`integer primary key NOT NULL,

This is not how FTS tables work.  SQLite ignores pretty much anything
except the column names; it does not matter whether you write PRIMARY
KEY or NO KEY PLEASE.  All columns get full-text indexed.

All FTS tables have the usual internal rowid as primary key; it's also
available under the name "docid".

You should drop the ID column, and in your queries use the docid instead.


Regards,
Clemens


[sqlite] NOT EXISTS and LEFT JOIN Performance problem

2015-05-23 Thread shadowma...@yahoo.de
Hello!

Sorry if this report shows up as a duplicate but didn?t figure out jet why 
some of my emails not showing up or getting rejected.



This is my first "bug" report here so please bear with me for blunders.

Using:
?sqlite-amalgamation-3081002.zip?

Build as:
?cl 
sqlite3.c -O2 -DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_FTS4 
-DSQLITE_API=__declspec(dllexport) 
 -link -dll -SUBSYSTEM:WINDOWS,"5.01" -out:sqlite3.dll?


I have 2 Tables:

CREATE TABLE `ART` (
`ID`integer NOT NULL,
`Kundennummer`integer,
`Rechnungsnummer`varchar,
`Rechnungsdatum`datetime,
`PDF`varchar,
PRIMARY KEY(ID)
);

and

CREATE VIRTUAL TABLE `OCR` using fts4 (
 `ID`integer primary key NOT NULL,
 `FullOCR` varchar
);

"PRAGMA journal_mode = OFF"

Both table has ~10.000 records but for testing more than 1000 is not 
recommended  - one query can take up to 4 hours -

?FullOCR? has list of words in rows averaging 700/words * ~8 chars per row.

The problem:

Query (for 1000 records)

498 Rows returned from: SELECT ID FROM ART WHERE NOT EXISTS (SELECT ID FROM 
OCR WHERE OCR.ID = ART.ID); (took 34157ms)
(For 10.000 records it takes ~ 3.5 hours.)
- Result seems to be correct. -

498 Rows returned from: SELECT ART.ID FROM ART LEFT JOIN OCR ON OCR.ID = 
ART.ID WHERE OCR.ID IS NULL; (took 47924ms)
(For 10.000 records it takes ~ 4 hours.)
- Result seems to be correct. -

498 Rows returned from: SELECT ID FROM ART WHERE ID NOT IN (SELECT ID FROM 
OCR); (took 103ms)
(For 10.000 records it takes 1759ms)
Result seems to be correct as well.

I am not pretending to be an expert but it looks like that some serious 
optimization flub is going on
with ?NOT EXISTS? and ?LEFT JOIN?, the timings are "horrific".

?NOT IN? looks just fine.

On MySQL all 3 queries timing is nearly identical ~1-2 seconds for 10.000 
records.

MR,
Marta