[sqlite] NOT EXISTS and LEFT JOIN Performance problem
>>SELECT * FROM ART WHERE ID IN (SELECT docid FROM OCR WHERE FullText MATCH >>'mailing') Thnak You Clemens that's excellent. KR, Marta -Original Message- From: Clemens Ladisch Sent: Monday, May 25, 2015 9:59 PM To: sqlite-users at mailinglists.sqlite.org Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem ShadowMarta at yahoo.de wrote: > ok, understand but I can't see why "NOT IN" would not force a full table > scan but "LEFT JOIN" would ? In a query like "SELECT ... WHERE ID NOT IN (SELECT ...)", the subquery is executed _once_, the results are put into a temporary table/index, and the remaining query can be executed efficiently. (If the subquery were a correlated subquery, it would need to be executed multiple times, and the situation would be different.) > SELECT ART.* FROM ART JOIN (SELECT docid FROM OCR WHERE FullText MATCH > 'mailing') AS ftstable WHERE ART.ID = docid; This is correct, but more complex than needed: SELECT * FROM ART WHERE ID IN (SELECT docid FROM OCR WHERE FullText MATCH 'mailing') There is no need to try to force every query to use joins. Some databases are said to have optimizers that work much better with joins, but this is not true for SQLite. Regards, Clemens ___ sqlite-users mailing list sqlite-users at mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] NOT EXISTS and LEFT JOIN Performance problem
Allright Simon, finally I have figured it out. SELECT ART.* FROM ART JOIN (SELECT docid FROM OCR WHERE FullText MATCH 'mailing') AS ftstable WHERE ART.ID = docid; Pity that the documentation is so minimal that it causes more confusion as it helps due to by reading it you have the false impression to getting data but all what you get is a few symbols. KR, Marta -Original Message- From: Simon Slavin Sent: Monday, May 25, 2015 3:17 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem On 25 May 2015, at 2:09pm, ShadowMarta at yahoo.de wrote: > What am I missing here ? My guess is that you are expecting FTS tables to act the same as normal tables. They don't. They're specially created and don't have the same indexing and other behaviour. The questions you're asking make sense if you're just talking about normal SQLite tables. It may be that you should split your data up. For most of your relations you should be declaring normal tables. Only the content you need to search quickly needs to be in an FTS table. Simon. ___ sqlite-users mailing list sqlite-users at mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] NOT EXISTS and LEFT JOIN Performance problem
Thanks Simon, >>Only the content you need to search quickly needs to be in an FTS table. I have only one column with the full text OCR extracted from PDF files and it is what need to be searched. I can't go lower than 1 column its obvious. The "MATCH" gives me back the correct docid which corresponds to the ID in an other table (ART.ID). There must be a way to connect this 2 IDs if not with "JOIN" than how ? Can't believe it isn't possible. KR, Marta -Original Message- From: Simon Slavin Sent: Monday, May 25, 2015 3:17 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem On 25 May 2015, at 2:09pm, ShadowMarta at yahoo.de wrote: > What am I missing here ? My guess is that you are expecting FTS tables to act the same as normal tables. They don't. They're specially created and don't have the same indexing and other behaviour. The questions you're asking make sense if you're just talking about normal SQLite tables. It may be that you should split your data up. For most of your relations you should be declaring normal tables. Only the content you need to search quickly needs to be in an FTS table. Simon. ___ sqlite-users mailing list sqlite-users at mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] NOT EXISTS and LEFT JOIN Performance problem
Hi Clemens, ok, understand but I can't see why "NOT IN" would not force a full table scan but "LEFT JOIN" would ? SELECT ART.ID FROM ART LEFT JOIN OCR ON OCR.ID = ART.ID WHERE OCR.ID IS NULL; Is this query on "rowid" or isn't ? Another thing: I have converted my table like you proposed to "CREATE VIRTUAL TABLE "OCR" using fts4("FullText" varchar) and I can insert the rows just fine but can't make a functioning "JOIN" query. I am just getting weird results. "2 Rows returned from: SELECT docid FROM OCR WHERE FullText MATCH 'mailing'; (took 2ms)" the rows 2 & 4. But if I put this statement into a "JOIN" I get: "9 Rows returned from: SELECT * FROM ART INNER JOIN OCR ON ART.ID = (SELECT docid FROM OCR WHERE FullText MATCH 'mailing'); (took 3ms)" IDKundennummerRechnungsnummerRechnungsdatumPDFFullText 2.. . . . 2.. . . . 2. 2. 2. 2. 2. 2. It should give back only two rows 2 & 4 have no other matching ART.ID = docid. And why it is returning "FullText" as well just adds to the mistery. What am I missing here ? KR, Marta -Original Message- From: Clemens Ladisch Sent: Sunday, May 24, 2015 10:11 PM To: sqlite-users at mailinglists.sqlite.org Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem ShadowMarta at yahoo.de wrote: > Do you have some explanation of the terrible performace of "NOT EXIST" & > "LEFT JOIN" versus "NOT IN" as well ? FTS tables can do two types of queries efficiently: - lookups by rowid/docid; - searches with MATCH. Anything else (such as your "WHERE id = ?") ends up as a full table scan for each value. Regards, Clemens ___ sqlite-users mailing list sqlite-users at mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] NOT EXISTS and LEFT JOIN Performance problem
Thank You Clemens, >>INSERT INTO OCR(docid, FullOCR) VALUES (42, 'xxx'); I understand now what you are trying to tell me I have misunderstood the documentation. - or lack of it - Do you have some explanation of the terrible performace of "NOT EXIST" & "LEFT JOIN" versus "NOT IN" as well ? Regardless of you would not use fts in this way. KR, Marta -Original Message- From: Clemens Ladisch Sent: Sunday, May 24, 2015 4:23 PM To: sqlite-users at mailinglists.sqlite.org Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem ShadowMarta at yahoo.de wrote: >> You should drop the ID column, and in your queries use the docid instead. > > Not possible. > I fill the rows in a parallel loop, the IDs are in disorder and they are > the link to table ?ART.ID?. INSERT INTO OCR(docid, FullOCR) VALUES (42, 'xxx'); Regards, Clemens ___ sqlite-users mailing list sqlite-users at mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] NOT EXISTS and LEFT JOIN Performance problem
Hi Clemens, I really appreciate your input but non of your origination have anything to do with the problem at hand. I just want to make sure this thread not get deflected. The problem is ?performance?. 34157ms, 47924ms vs. 103ms >>SQLite ignores pretty much anything except the column names All right then, it should not influence the performance either. >>You should drop the ID column, and in your queries use the docid instead. Not possible. I fill the rows in a parallel loop, the IDs are in disorder and they are the link to table ?ART.ID?. And even if I would do and let say get a better timing the problem(perhaps bug) would still exist. I am not asking for a solution to my problem. I have it already. - By using ?NOT IN? - I am asking for an acknowledgment that it is a bug or an explanation if it is not. KR, Marta -Original Message- From: Clemens Ladisch Sent: Saturday, May 23, 2015 1:29 PM To: sqlite-users at mailinglists.sqlite.org Subject: Re: [sqlite] NOT EXISTS and LEFT JOIN Performance problem ShadowMarta at yahoo.de wrote: > CREATE VIRTUAL TABLE `OCR` using fts4 ( > `ID`integer primary key NOT NULL, This is not how FTS tables work. SQLite ignores pretty much anything except the column names; it does not matter whether you write PRIMARY KEY or NO KEY PLEASE. All columns get full-text indexed. All FTS tables have the usual internal rowid as primary key; it's also available under the name "docid". You should drop the ID column, and in your queries use the docid instead. Regards, Clemens ___ sqlite-users mailing list sqlite-users at mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Contentless FTS4 Tables
Hello Dan, sorry it is just like: CREATE VIRTUAL TABLE if not exists OCR USING fts4(content="", `FullOCR`); & CREATE TABLE `ART` ( `ID`integer NOT NULL, `Kundennummer`integer, `Rechnungsnummer`varchar, `Rechnungsdatum`datetime, `PDF`varchar, PRIMARY KEY(ID) ); And the "FullOCR" content is: Row1: "WORD01" "WORD02" . . Row2: "WORD01" "WORD02" "framework" . . I have inserted only 9 rows into "FullOCR" and maybe 4 into "ART". Have one matching ID = docid = 2 to test "JOIN". But you can use just any 2 tables and try to do a "JOIN" or "COUNT" operation. "SELECT COUNT(*) FROM OCR;" produces error so my workaround war using "OCR_docsize" in place of "OCR" "SELECT docid FROM OCR WHERE FullOCR MATCH 'framework';" gives correctly one row with the docid = 2. but when I try to use this in a "JOIN" statement produces error and no workaround with "OCR_docsize" gives correct result. BR, Marta -Original Message- From: Dan Kennedy Sent: Saturday, May 23, 2015 9:34 AM To: sqlite-users at mailinglists.sqlite.org Subject: Re: [sqlite] Contentless FTS4 Tables On 05/23/2015 04:33 AM, ShadowMarta at yahoo.de wrote: > Hi! > > I have made a Contentless FTS4 Table like: > > "CREATE VIRTUAL TABLE if not exists OCR USING fts4(content="", `FullOCR` > varchar;" > > And managed to insert some data into it. > > I have 2 questions: > > 1.) How to get the proper COUNT on the table ? > The only query seems to work is: > "SELECT COUNT(*) FROM OCR_docsize;" is this the right way to do it > ? > > 2.) How to perform a "JOIN" operation with it? > "SELECT docid FROM OCR WHERE FullOCR MATCH 'framework';" result is > "2". > > "SELECT * FROM ART INNER JOIN OCR ON ART.ID = (SELECT docid FROM OCR > WHERE FullOCR MATCH 'framework') ORDER BY ID;" > Gives "SQL logic error or missing database:" > > "SELECT * FROM ART INNER JOIN OCR_docsize ON ART.ID = (SELECT docid > FROM OCR WHERE FullOCR MATCH 'framework') ORDER BY ID;" > Gives me 9 results back - should be only 1 - > > "SELECT * FROM ART NATURAL JOIN (SELECT docid FROM OCR WHERE FullOCR > MATCH 'framework') ORDER BY ID;" > Gives me 9 results back - should be only 1 - > > What I am missing here? Are you able to make the database file available for download somewhere? Thanks, Dan. ___ sqlite-users mailing list sqlite-users at mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] NOT EXISTS and LEFT JOIN Performance problem
Hello! Sorry if this report shows up as a duplicate but didn?t figure out jet why some of my emails not showing up or getting rejected. This is my first "bug" report here so please bear with me for blunders. Using: ?sqlite-amalgamation-3081002.zip? Build as: ?cl sqlite3.c -O2 -DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_FTS4 -DSQLITE_API=__declspec(dllexport) -link -dll -SUBSYSTEM:WINDOWS,"5.01" -out:sqlite3.dll? I have 2 Tables: CREATE TABLE `ART` ( `ID`integer NOT NULL, `Kundennummer`integer, `Rechnungsnummer`varchar, `Rechnungsdatum`datetime, `PDF`varchar, PRIMARY KEY(ID) ); and CREATE VIRTUAL TABLE `OCR` using fts4 ( `ID`integer primary key NOT NULL, `FullOCR` varchar ); "PRAGMA journal_mode = OFF" Both table has ~10.000 records but for testing more than 1000 is not recommended - one query can take up to 4 hours - ?FullOCR? has list of words in rows averaging 700/words * ~8 chars per row. The problem: Query (for 1000 records) 498 Rows returned from: SELECT ID FROM ART WHERE NOT EXISTS (SELECT ID FROM OCR WHERE OCR.ID = ART.ID); (took 34157ms) (For 10.000 records it takes ~ 3.5 hours.) - Result seems to be correct. - 498 Rows returned from: SELECT ART.ID FROM ART LEFT JOIN OCR ON OCR.ID = ART.ID WHERE OCR.ID IS NULL; (took 47924ms) (For 10.000 records it takes ~ 4 hours.) - Result seems to be correct. - 498 Rows returned from: SELECT ID FROM ART WHERE ID NOT IN (SELECT ID FROM OCR); (took 103ms) (For 10.000 records it takes 1759ms) Result seems to be correct as well. I am not pretending to be an expert but it looks like that some serious optimization flub is going on with ?NOT EXISTS? and ?LEFT JOIN?, the timings are "horrific". ?NOT IN? looks just fine. On MySQL all 3 queries timing is nearly identical ~1-2 seconds for 10.000 records. MR, Marta
[sqlite] Contentless FTS4 Tables
Hi! I have made a Contentless FTS4 Table like: "CREATE VIRTUAL TABLE if not exists OCR USING fts4(content="", `FullOCR` varchar;" And managed to insert some data into it. I have 2 questions: 1.) How to get the proper COUNT on the table ? The only query seems to work is: "SELECT COUNT(*) FROM OCR_docsize;" is this the right way to do it ? 2.) How to perform a "JOIN" operation with it? "SELECT docid FROM OCR WHERE FullOCR MATCH 'framework';" result is "2". "SELECT * FROM ART INNER JOIN OCR ON ART.ID = (SELECT docid FROM OCR WHERE FullOCR MATCH 'framework') ORDER BY ID;" Gives "SQL logic error or missing database:" "SELECT * FROM ART INNER JOIN OCR_docsize ON ART.ID = (SELECT docid FROM OCR WHERE FullOCR MATCH 'framework') ORDER BY ID;" Gives me 9 results back - should be only 1 - "SELECT * FROM ART NATURAL JOIN (SELECT docid FROM OCR WHERE FullOCR MATCH 'framework') ORDER BY ID;" Gives me 9 results back - should be only 1 - What I am missing here? Any idea? Thanks, Marta