Re: [sqlite] small sqlite fts snippet function or Fts Bug!
On 01/27/2015 06:48 PM, boscowitch wrote: and the in an sqlite shell (SQLite version 3.8.8.1 2015-01-20 16:51:25) I get following for a select with snippet: EXAMPLE OUTPUT: sqlite select docid,*,snippet(test) from test where german match a; 1|[1] a b c|1] ba/b b c 2|[{[_.,:;[1] a b c|1] ba/b b c 3|1[1] a b c|1[1] ba/b b c 4|[1] a b c|1] a bb/b c 5|[1] a b c|[1] ba/b b c -As you can see for id 1 and 2 b is at the right position but all beginning non-alphanumerical [,{, etc. are just left out in the snippet. -ID 4 does not help and breaks the offsets so even worse Thanks for reporting this. The issue with (1) and (2) is now fixed here: http://www.sqlite.org/src/info/adc9283dd9b I think it is a bug in the input data causing the problem in (4). The values inserted into test and testdata are just slightly different. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] small sqlite fts snippet function or Fts Bug!
Yeah, -ID 4 was just a desperate experiment for a hack with longer data in the search to see if it would lead the snippet function to start grabbing the data from the start (or at least one word/char more). The offsets beeing wrong and therefore the b... was kinda expected of me, but in case it worked I would have manually substracted the offset and put the markers in myself... so it wasn't part of the bug report just a test how it behaves in that case. Good to know for the future that its already fixed, thx for taking care of it so fast! boscowitch Am Mittwoch, den 28.01.2015, 02:04 +0700 schrieb Dan Kennedy: On 01/27/2015 06:48 PM, boscowitch wrote: and the in an sqlite shell (SQLite version 3.8.8.1 2015-01-20 16:51:25) I get following for a select with snippet: EXAMPLE OUTPUT: sqlite select docid,*,snippet(test) from test where german match a; 1|[1] a b c|1] ba/b b c 2|[{[_.,:;[1] a b c|1] ba/b b c 3|1[1] a b c|1[1] ba/b b c 4|[1] a b c|1] a bb/b c 5|[1] a b c|[1] ba/b b c -As you can see for id 1 and 2 b is at the right position but all beginning non-alphanumerical [,{, etc. are just left out in the snippet. -ID 4 does not help and breaks the offsets so even worse Thanks for reporting this. The issue with (1) and (2) is now fixed here: http://www.sqlite.org/src/info/adc9283dd9b I think it is a bug in the input data causing the problem in (4). The values inserted into test and testdata are just slightly different. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] small sqlite fts snippet function or Fts Bug!
Hello since it this bug report (+ a dirty-fix) it might be useful for both users and devs. that's why I send a copy to both mailing lists! I hope I don't bother the diligent devs who read all of both list, sry to them, and thx for sqlite btw. ;)! recently I wanted to use the snippet function in sqlite for my small sqlite dictionary (running on android but the bug occurs also on my linux desktop). but it behaved strangely when my entry started with non-words character(s) (not alphanumeric and all Unicode (or chars128) in short simple tokenizer delimiters) the snippet never prints them if they are in the beginning of the first word here an examples to demonstrate: EXAMPLE SETUP SQL: create table testdata (german); create virtual table test using fts4(content=testdata,german); insert into testdata(german) VALUES ([1] a b c); insert into test(docid,german) VALUES(1,[1] a b c ); insert into testdata(german) VALUES ([{[_.,:;[1] a b c); insert into test(docid,german) VALUES(2,[{[_.,:;[1] a b c ); insert into testdata(german) VALUES (1[1] a b c); insert into test(docid,german) VALUES(3,1[1] a b c ); insert into testdata(german) VALUES ([1] a b c); insert into test(docid,german) VALUES(4,1[1] a b c ); insert into testdata(german) VALUES(char(8203,91,49,93,32,97,32,98,32,99)); insert into test(docid,german) VALUES(5,char(8203,91,49,93,32,97,32,98,32,99)); and the in an sqlite shell (SQLite version 3.8.8.1 2015-01-20 16:51:25) I get following for a select with snippet: EXAMPLE OUTPUT: sqlite select docid,*,snippet(test) from test where german match a; 1|[1] a b c|1] ba/b b c 2|[{[_.,:;[1] a b c|1] ba/b b c 3|1[1] a b c|1[1] ba/b b c 4|[1] a b c|1] a bb/b c 5|[1] a b c|[1] ba/b b c -As you can see for id 1 and 2 b is at the right position but all beginning non-alphanumerical [,{, etc. are just left out in the snippet. -ID 3 works but has an additional 1 that should not be there so no solution... -ID 4 does not help and breaks the offsets so even worse -ID 5 works BUT this is a dirty fix i found. it adds an Unicode character ('ZERO WIDTH SPACE' (U+200B)) in front which obviously cant be seen and doesnt break the offsets (just shifts them all +1) I didn't test it yet on android but I hope so, since it supports Unicode ... obviously this is not a nice solution or one for more simpler/embedded systems. (btw. the same bug occurs also with fts3 and also with no special content option) here a small example for normal fts4 with a more custom snippet call: create virtual table test using fts4(german); insert into test VALUES([1] a b c); sqlite select *,snippet(test,#,#,...,0,64) from test where german match 'a'; [1] a b c|1] #a# b c regards boscowitch PS: please excuse the german ;) and all English spelling errors ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS pagination
Just to be clear. It basically means that after MATCH records are returned it iterates through ALL the rowids of the returned set and removes them from the set and orders them accordingly. -- View this message in context: http://sqlite.1065341.n5.nabble.com/FTS-pagination-tp78754p78849.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS pagination
supermariobros wrote: Well, they all give exactly the same output. sqlite EXPLAIN QUERY PLAN SELECT rowid FROM activity_text_content WHERE activity_text_content MATCH 'x' ORDER BY rowid ASC LIMIT 100; 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4:ASC (~0 rows) sqlite EXPLAIN QUERY PLAN SELECT rowid FROM activity_text_content WHERE activity_text_content MATCH 'x' AND rowid1000 ORDER BY rowid ASC LIMIT 10; 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4:ASC (~0 rows) sqlite EXPLAIN QUERY PLAN SELECT rowid FROM activity_text_content WHERE activity_text_content MATCH 'x' LIMIT 100; 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4: (~0 rows) It almost looks like EXPLAIN ignores the second part where rowid is compared or sorted. It doesn't change the way in which the database accesses the table, which implies that the rowid comparison is not using any index, i.e., the FTS module first computes the results, and then the rows with small rowid values are thrown away. Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS pagination
Well, they all give exactly the same output. sqlite EXPLAIN QUERY PLAN SELECT rowid FROM activity_text_content WHERE activity_text_content MATCH 'x' ORDER BY rowid ASC LIMIT 100; 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4:ASC (~0 rows) sqlite EXPLAIN QUERY PLAN SELECT rowid FROM activity_text_content WHERE activity_text_content MATCH 'x' AND rowid1000 ORDER BY rowid ASC LIMIT 10; 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4:ASC (~0 rows) sqlite EXPLAIN QUERY PLAN SELECT rowid FROM activity_text_content WHERE activity_text_content MATCH 'x' LIMIT 100; 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4: (~0 rows) It almost looks like EXPLAIN ignores the second part where rowid is compared or sorted. how should I understand that? -- View this message in context: http://sqlite.1065341.n5.nabble.com/FTS-pagination-tp78754p78831.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS pagination
Hi Thanks For quick response Of course you are right that I can not use row id in the way I used it above. I guess I wrote it quicker than I thought about it. However If I use original rowid and LIMIT it should be fine, knowing that the submited rowid is the rowid of the last element of the previous set. Like this: SELECT rowid FROM text_content WHERE text_content MATCH 'x' AND rowid 1000 ORDER BY rowid LIMIT 10; of course it only works when last rowid of the previous set is available so I can not just skip to the N-th page, wchich is fine in my case. I just have to know first and last element to go back and forward; This is what explain query shows: EXPLAIN QUERY PLAN SELECT rowid FROM activity_text_content WHERE activity_text_content MATCH 'x' AND rowid1000 ORDER BY rowid ASC LIMIT 10; 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4: (~0 rows) If I understand it correctly it uses indexes properly on FTS but I do not know how the row id scanning and sorting is done. If you have some time could you shed some light on this. Also when I compare time of the querry of the one with rowid to the one with OFFSET query on 5000 records I do not see any difference in execution time. I read just like you wrote that OFFSET starts to be time consuming when number is very high but what is the high number case. I doubt I will ever deal with more than 1 records in this entire virtual table so should I even be concerned? Thanks -- View this message in context: http://sqlite.1065341.n5.nabble.com/FTS-pagination-tp78754p78772.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS pagination
supermariobros wrote: EXPLAIN QUERY PLAN SELECT rowid FROM activity_text_content WHERE activity_text_content MATCH 'x' AND rowid1000 ORDER BY rowid ASC LIMIT 10; 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4: (~0 rows) If I understand it correctly it uses indexes properly on FTS but I do not know how the row id scanning and sorting is done. Compare the EXPLAIN QUERY PLAN output of this query and of the same query without the rowid comparison. Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS pagination
Or maybe, if I am using android, it should be done at the cursor level? -- View this message in context: http://sqlite.1065341.n5.nabble.com/FTS-pagination-tp78754p78755.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS pagination
Quick question. If I am using FTQ that looks like this SELECT * FROM mail WHERE body MATCH 'sqlite' can I add to it WHERE rowid 5 AND rwoid 10 or it will significantly slow it down. If so what would be the best approach for pagination, For example if I get 500 rows with the matching term and obviously I do not need them all right away. Should I just use the first option and maybe select only rowids and then go through them? Any suggestions? Thanks -- View this message in context: http://sqlite.1065341.n5.nabble.com/FTS-pagination-tp78754.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS pagination
supermariobros wrote: If I am using FTQ that looks like this SELECT * FROM mail WHERE body MATCH 'sqlite' can I add to it WHERE rowid 5 AND rwoid 10 or it will significantly slow it down. How much did it slow down when you tested it? Anyway, without index: sqlite EXPLAIN QUERY PLAN SELECT * FROM t; 0|0|0|SCAN TABLE t VIRTUAL TABLE INDEX 0: with word search: sqlite EXPLAIN QUERY PLAN SELECT * FROM t WHERE t MATCH 'x'; 0|0|0|SCAN TABLE t VIRTUAL TABLE INDEX 3: your query: sqlite EXPLAIN QUERY PLAN SELECT * FROM t WHERE t MATCH 'x' AND rowid BETWEEN 5 AND 10; 0|0|0|SCAN TABLE t VIRTUAL TABLE INDEX 393219: So it uses some index (and the lowest two bits are still set, so it still uses the FTS index, but this is an implementation detail.) This might be different with a different SQLite version. If so what would be the best approach for pagination The rowid cannot be used for pagination because you get the numbers of the original rows. You would have to use OFFSET/LIMIT, which is inefficient for large offsets. Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS full-text query vs. query by rowid
Milan Kříž wrote: 3) A query which should use a linear scan according to the SQLite documentation (http://www.sqlite.org/fts3.html#section_1_4) SELECT docId FROM ftsTable WHERE docId BETWEEN 20 AND 23 - gets a following query plan: SCAN TABLE ftsTable VIRTUAL TABLE INDEX 393216: - the documentation does not say a word about another indexes on an FTS table, so where is the index 393216 come from? This is an undocumented optimization. In recent versions, FTS also optimizes docid searches with less/greater than operators. (The index number is an implementation detail.) 4) The I have a query with both 'match ?' sub-clause and 'rowid=?' sub-clause. It is not clear to me which variant will be used. But according to definition of Full-text query it should use full-text query at first. And then? Will it use index to rowid after full-text query is performed? SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 22, 23) - anyway from the query plan it seems that no full-text query is performed at all - or how to interpret it?: SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1: EXECUTE LIST SUBQUERY 1 INDEX 1 is the full-text search. The rowid values of the returned rows are then compared (by SQLite, outside of FTS) against the list. Could you please give me a clue how to guess whether a complex FTS query will use a full-text index and which one it will use? There is only one full-text index per table. The FTS module implements a search/lookup iff the EXPLAIN QUERY PLAN output shows VIRTUAL TABLE INDEX. Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS full-text query vs. query by rowid
Clemens Ladisch wrote: 4) The I have a query with both 'match ?' sub-clause and 'rowid=?' sub-clause. It is not clear to me which variant will be used. But according to definition of Full-text query it should use full-text query at first. And then? Will it use index to rowid after full-text query is performed? SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 22, 23) - anyway from the query plan it seems that no full-text query is performed at all - or how to interpret it?: SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1: EXECUTE LIST SUBQUERY 1 INDEX 1 is the full-text search. The rowid values of the returned rows are then compared (by SQLite, outside of FTS) against the list. Ok, it would be what I expect. But according to my first two queries 1) and 2), it looks like a full-text index is the *index 18*. 1) A*full-text query* SELECT docId FROM ftsTableWHERE ftsTable MATCH 'a*' - gets a following query plan: SCAN TABLE ftsTable VIRTUAL TABLE*INDEX**18*: 2) A*query by rowid* SELECT docId FROM ftsTable WHERE docid = 10 - gets a following query plan: SCAN TABLE ftsTable VIRTUAL TABLE*INDEX 1*: So maybe the query plan shows a wrong number? Or is the index number unrelated to a full-text index? I thought that INDEX 1 is an index to rowid, but maybe index numbers are somehow fuzzy? :-). Thanks, Milan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS full-text query vs. query by rowid
Milan Kříž wrote: Clemens Ladisch wrote: INDEX 1 is the full-text search. Sorry, that's wrong. The idxNum value is determined as follows: (see fts3Int.h) /* ** The Fts3Cursor.eSearch member is always set to one of the following. ** Actualy, Fts3Cursor.eSearch can be greater than or equal to ** FTS3_FULLTEXT_SEARCH. If so, then Fts3Cursor.eSearch - 2 is the index ** of the column to be searched. For example, in ** ** CREATE VIRTUAL TABLE ex1 USING fts3(a,b,c,d); ** SELECT docid FROM ex1 WHERE b MATCH 'one two three'; ** ** Because the LHS of the MATCH operator is 2nd column b, ** Fts3Cursor.eSearch will be set to FTS3_FULLTEXT_SEARCH+1. (+0 for a, ** +1 for b, +2 for c, +3 for d.) If the LHS of MATCH were ex1 ** indicating that all columns should be searched, ** then eSearch would be set to FTS3_FULLTEXT_SEARCH+4. */ #define FTS3_FULLSCAN_SEARCH 0/* Linear scan of %_content table */ #define FTS3_DOCID_SEARCH1/* Lookup by rowid on %_content table */ #define FTS3_FULLTEXT_SEARCH 2/* Full-text index search */ /* ** The lower 16-bits of the sqlite3_index_info.idxNum value set by ** the xBestIndex() method contains the Fts3Cursor.eSearch value described ** above. The upper 16-bits contain a combination of the following ** bits, used to describe extra constraints on full-text searches. */ #define FTS3_HAVE_LANGID0x0001 /* languageid=? */ #define FTS3_HAVE_DOCID_GE 0x0002 /* docid=? */ #define FTS3_HAVE_DOCID_LE 0x0004 /* docid=? */ Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS full-text query vs. query by rowid
Clemens Ladisch wrote: Milan Kříž wrote: Clemens Ladisch wrote: INDEX 1 is the full-text search. Sorry, that's wrong. So does it mean that the full-text search is not performed for the following query at all? And that only the docId index is used to get entries in the IN sub-clause and then a linear scan with a comparison to 'a*' is done? query: SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 22, 23) query plan: SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1: EXECUTE LIST SUBQUERY 1 Is it possible to force SQLite to use the full-text search instead of the rowid search? I can think only about something like this: select docId from ( select docId from ftsTable where ftsTable match 'a*' ) where docId in (21, 22, 23, 24) query plan: SCAN TABLE nameFtsTable VIRTUAL TABLE INDEX 18: EXECUTE LIST SUBQUERY 1 It looks much better. But what does the 'EXECUTE LIST SUBQUERY 1' mean? You wrote that returned values are compared by SQLite (outside of FTS). But does it use some index (rowid index) or is it impossible for SQLite to use an index on the same table (even if the first one - full-text index - was used in a subquery)? Of course, in my real use-case I have much more complex docId condition and I have a lot of entries in an FTS table (about million entries), so I would like the full-text search to prune the results first and then filter results using docId. FTS condition is also much more complex than just 'a*' so I expect that many results will be filtered out by the full-text query. Thanks, Milan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS full-text query vs. query by rowid
Milan Kříž wrote: So does it mean that the full-text search is not performed for the following query at all? SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 22, 23) SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1: EXECUTE LIST SUBQUERY 1 No, it means that you are using a different version. (In any case, it is not possible to execute MATCH without the FTS index.) what does the 'EXECUTE LIST SUBQUERY 1' mean? It's how the rowid IN (...) is implemented. Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS full-text query vs. query by rowid
Dne 28.8.2014 14:54, Clemens Ladisch napsal(a): Milan Kříž wrote: So does it mean that the full-text search is not performed for the following query at all? SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 22, 23) SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1: EXECUTE LIST SUBQUERY 1 No, it means that you are using a different version. (In any case, it is not possible to execute MATCH without the FTS index.) Thanks and sorry for bothering you with such details. But I still cannot understand that query plan. Since for a simple rowid query it says: explain query plan select * from ftsTable where docId = 100 SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1: It is clear to me that index 1 is definitely a rowId index. Then for complex query above, it says: SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1: EXECUTE LIST SUBQUERY 1 So where is a mention about using an FTS index (full-text query)? Or is it just an imperfection of 'explain query plan' that the usage of the full-text index is missing from the query plan? I use SQLite version 3.8.5 and for full table full-text search it gives a correct index according to comment in the fts3Int.h: FTS3_FULLTEXT_SEARCH+${NUMBER_OF_COLUMNS} = VIRTUAL TABLE INDEX 18 in my ftsTable Regards, Milan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS full-text query vs. query by rowid
Milan Kříž wrote: Dne 28.8.2014 14:54, Clemens Ladisch napsal(a): Milan Kříž wrote: So does it mean that the full-text search is not performed for the following query at all? No, it means that you are using a different version. But I still cannot understand that query plan. Then try with 3.8.6. Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS full-text query vs. query by rowid
Dne 28.8.2014 16:48, Clemens Ladisch napsal(a): Then try with 3.8.6. Ouuu . .sorry again. I have tested it with 3.8.6 and the query plan looks ok now. SCAN TABLE nameftsTable VIRTUAL TABLE INDEX 18: EXECUTE LIST SUBQUERY 1 But I also tested it with my version again and I'm getting the same (correct) query plan now. I cannot understand how it is possible :-). The only thing I changed is that in the meantime I've recreated my ftsTable. So it was either in some strange state I cannot reproduce now or I am simply overworked :-). Or maybe I accidentally used '=' or 'is' instead of 'match'. However, many thanks for your support, now it all seems much more clear. Milan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS full-text query vs. query by rowid
Hello, I would like to ask several questions regarding to SQLite FTS module and how it uses indexes. I have following queries: 1) A full-text query SELECT docId FROM ftsTableWHERE ftsTable MATCH 'a*' - gets a following query plan: SCAN TABLE ftsTable VIRTUAL TABLE INDEX 18: 2) A query by rowid SELECT docId FROM ftsTable WHERE docid = 10 - gets a following query plan: SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1: So far this is as expected. 3) A query which should use a linear scan according to the SQLite documentation (http://www.sqlite.org/fts3.html#section_1_4) SELECT docId FROM ftsTable WHERE docId BETWEEN 20 AND 23 - gets a following query plan: SCAN TABLE ftsTable VIRTUAL TABLE INDEX 393216: - the documentation does not say a word about another indexes on an FTS table, so where is the index 393216 come from? 4) The I have a query with both 'match ?' sub-clause and 'rowid=?' sub-clause. It is not clear to me which variant will be used. But according to definition of Full-text query it should use full-text query at first. And then? Will it use index to rowid after full-text query is performed? SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 22, 23) - anyway from the query plan it seems that no full-text query is performed at all - or how to interpret it?: SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1: EXECUTE LIST SUBQUERY 1 5) When between is used instead of in, SQLite should use a linear scan, but query plan says something else: select docId from ftsTable where ftsTable match 'a*' and docId between 20 and 23 - query plan SCAN TABLE ftsTable VIRTUAL TABLE INDEX 393234: - even another index (393234) seems to be used but again, where does it come from? Is there any undocumented optimizations in the FTS module or did I miss some note in the SQLite documentation? Could you please give me a clue how to guess whether a complex FTS query will use a full-text index and which one it will use? And could you explain what the different numbers of indexes means? Are all indexes really created in a DB? Thanks for you explanation, Milan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS module and DB close
Ok, Maybe the solution is: 1) try to close the connection: sqlite3_close 2) if error code is SQLITE_BUSY, a) use sqlite3_next_stmt to finalize dangling statements b) retry to close the connection Step (1) ensures that FTS related statements are finalized. Thanks. On Sat, Jun 7, 2014 at 7:49 PM, gwenn gwenn.k...@gmail.com wrote: Hello, How do you prevent double free/finalize of statements created by the FTS module ? I am using sqlite3_next_stmt to finalize all dangling statements before closing the connection but the program crashes because the FTS module finalizes them too when sqlite3_close is called... May be I should use sqlite3_close_v2 (not available on MacOS X: SQLite version 3.7.13) ? Thanks ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS module and DB close
Hello, How do you prevent double free/finalize of statements created by the FTS module ? I am using sqlite3_next_stmt to finalize all dangling statements before closing the connection but the program crashes because the FTS module finalizes them too when sqlite3_close is called... May be I should use sqlite3_close_v2 (not available on MacOS X: SQLite version 3.7.13) ? Thanks ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS performance (mobile devices)
be used for small changes only. In case of bulk actions, the triggers should be dropped and the FTS index rebuilt after the action completes. 2. Rebuilding FTS index takes a lot of memory, whereby the amount does not depend on the table size. Question: Would it be possible to reduce this number? Apparently FTS index can be built with less memory. 3. Don't use FTS index optimization. Small gains in terms of speed, risk of high memory consumption. 4. There is some problem in SQLite FTS code as the described crash proves. (Might relate to empty index only) 5. DB size: This is not described above, but it is important to realize, that SQLite DB does not shrink. FTS index takes compareble space to the data being indexed and it will be stored first in the WAL log, then in the DB itself. In the worst case the DB size may blow up 2-3x. Solution for the WAL log: PRAGMA journal_size_limit=10485760 (fast and easy) Solution for the DB: VACUUM (slow, that's way difficult) Any questions/comments are welcome. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS Find Tokens at Record Start
Paul Vercellotti wrote: using FTS, how do you match records that contain certain tokens beginning at the start of the record Apparently, this information is not stored in the FTS index. Search for the tokens, then manually check with LIKE or something like that. Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS Find Tokens at Record Start
You might want to check the following: SELECT word FROM fts WHERE fts MATCH '^token' Beginning with 3.7.9 this should only return records that have 'token' at the beginning of the record. See changelog of 3.7.9: If a search token (on the right-hand side of the MATCH operator) in FTS4 begins with ^ then that token must be the first in its field of the document. ** Potentially Incompatible Change ** -Ursprüngliche Nachricht- Von: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] Im Auftrag von Clemens Ladisch Gesendet: Freitag, 5. April 2013 15:41 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] FTS Find Tokens at Record Start Paul Vercellotti wrote: using FTS, how do you match records that contain certain tokens beginning at the start of the record Apparently, this information is not stored in the FTS index. Search for the tokens, then manually check with LIKE or something like that. Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS Find Tokens at Record Start
Hi there, I couldn't find this from the documentation: using FTS, how do you match records that contain certain tokens beginning at the start of the record (or any token position for that matter). For example, I want to match records that start with Four score and seven years ago but not match records that contain that phrase in the middle. This matches any document that contains the phrase: SELECT rowid FROM documents WHERE content MATCH 'Four score and seven years ago'; But I want only the results that start with that phrase, which would be a subset of those results. It looks like I could programmatically parse the output of the offsetsfunction to find this info and manually filter my results, but is there a way to set up the query so it does the filtering for me, and only returns results that start at byte offset 0 in the column (or token 0)? Thanks! -Paul ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS problem with 'NOT term' queries
Hello! And as result it's impossible to search docs in some situations: SELECT * FROM docs WHERE docs MATCH 'NOT sqlite'; Error: malformed MATCH expression: [NOT sqlite] As far as I can tell, in MATCH syntax NOT is a binary operator, denoting set difference. You are trying to use it as a unary operator. Well, and how to rewrite query MATCH 'NOT sqlite'? In simple FTS qery syntax is possible to use -sqlite but is the equal construction by extended syntax? Best regards, Alexey. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS problem with 'NOT term' queries
On 02/04/2013 12:18 AM, Alexey Pechnikov wrote: Hello! And as result it's impossible to search docs in some situations: SELECT * FROM docs WHERE docs MATCH 'NOT sqlite'; Error: malformed MATCH expression: [NOT sqlite] As far as I can tell, in MATCH syntax NOT is a binary operator, denoting set difference. You are trying to use it as a unary operator. Well, and how to rewrite query MATCH 'NOT sqlite'? In simple FTS qery syntax is possible to use -sqlite but is the equal construction by extended syntax? Not possible. And note that the query -sqlite doesn't actually work either. Always returns an empty set. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS SQLite optimize command creates very large BLOBs in *_segment table.
Hi I have a database using SQLite-3.7.14 with a FTS4 virtual table (Free Text Search). The FTS table contains several millions of small documents. The FTS DB is created on a server (where creating time does not matter) and then used on an embedded device as a read-only database for FTS queries (where speed of queries need to be as fast as possible and use as little memory as possible). Since the DB is used read-only on the device, I ran the the optimize command (see http://www.sqlite.org/fts3.html#optimize) in the FTS index after the last INSERT was done on the server. Unfortunately, I found that the optimize command is creating very large BLOBs in the database since it merges together all of inverted index b-trees into very large complete b-tree. For frequent terms, merged BLOBs can be up to ~50MB in my case in the ftsTable_segment table, as a result of running optimize: sqlite SELECT *, length(block) AS len_blob FROM ftsTable_segments ORDER BY len_blob DESC LIMIT 3; seblockid|block|len_blob 336808||51867353 -- 51 MB BLOB! 311724||19375541 -- 19 MB BLOB 334719||19223423 -- 19 MB BLOB Such large BLOBs (~50MB) are a problem in my case as they consume a large amount of memory on a embedded device, when doing FTS queries with several frequent terms. SQLite memory high watermark reported by sqlite3_memory_highwater() reaches ~200MB when query contains several frequent terms, which is too much for an embedded device, even though I set of soft memory limit of only 3MB using sqlite3_soft_heap_limit64(...). As a result, I have disabled running optimize on the FTS index after creating the FTS DB on the server. However, it got me thinking: it would be nice to be able to pass an optional parameter to the FTS optimize command in order to avoid merging b-trees when BLOBs reach a certain size? In other words, instead of doing... INSERT INTO ftsTable(ftsTable) VALUES('optimize'); ... I would like to be able to do something like this... INSERT INTO ftsTable(ftsTable) VALUES('optimize=1048576'); ... where optimize=1048576 indicates to *partially* optimize reverse index b-trees in such a way that BLOBs do not exceed 1MB (1048577 bytes) in this example. It's OK if it's a fuzzy soft limit. 1/ Wouldn't such partial optimization of FTS index be useful? 2/ I also suggest that the documentation at http://www.sqlite.org/fts3.html#optimize indicates that optimizing an FTS index can create very large BLOBs. This may be OK on a desktop or server, but it can be a problem on embedded devices with limited amount of memory. Regards Dominique ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS problem with 'NOT term' queries
Hello! From http://www.sqlite.org/fts3.html#section_3_1 we can see the query SELECT * FROM docs WHERE docs MATCH 'database NOT sqlite'; But the equal query doesn't works: SELECT * FROM docs WHERE docs MATCH 'NOT sqlite AND database'; Error: malformed MATCH expression: [NOT sqlite AND database] And as result it's impossible to search docs in some situations: SELECT * FROM docs WHERE docs MATCH 'NOT sqlite'; Error: malformed MATCH expression: [NOT sqlite] -- Best regards, Alexey Pechnikov. http://pechnikov.tel/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS problem with 'NOT term' queries
On 1/29/2013 11:30 PM, Alexey Pechnikov wrote: From http://www.sqlite.org/fts3.html#section_3_1 we can see the query SELECT * FROM docs WHERE docs MATCH 'database NOT sqlite'; But the equal query doesn't works: SELECT * FROM docs WHERE docs MATCH 'NOT sqlite AND database'; Error: malformed MATCH expression: [NOT sqlite AND database] And as result it's impossible to search docs in some situations: SELECT * FROM docs WHERE docs MATCH 'NOT sqlite'; Error: malformed MATCH expression: [NOT sqlite] As far as I can tell, in MATCH syntax NOT is a binary operator, denoting set difference. You are trying to use it as a unary operator. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS problem with 'NOT term' queries
The explanation right above that table of examples contains these important (I believe) phrases: ...BINARY SET operators... ...TWO operands to an operator... (emphasis mine) On Tue, Jan 29, 2013 at 11:30 PM, Alexey Pechnikov pechni...@mobigroup.ruwrote: SELECT * FROM docs WHERE docs MATCH 'NOT -- -- -- --Ô¿Ô-- K e V i N ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS questions
Hello all I'm new with sqlite3 and sql. I have data base that include path columns (file system path like c:\bla bla\myFiles\1.txt) On that columns I need to do 1) search for patterns in case the user want to find a file or directory 2) search for prefix path in case the user rename a file or directory the Database include about 1.5-2.5 million records and to use the LIKE is not possible because the result time. As an alternative I want to use FTS3 or FTS4 but I think I have a problems with what I read here: http://www.sqlite.org/fts3.html#section_1_4 And here: http://www.sqlite.org/fts3.html#section_6_3 I need to specify the language to FTS to use it as tokenize but the path can include multi languages how can I configure the FTS table to use all languages How can I tell to FTS to token the path only according to the character \ ? More than that when creating FTS table it creates with the TEMPORARY key word. My question is: do I need to create this table each time I run the data base (because the temporary word) or for each connections (in case of multiconnections) or this is a table like all tables I declared and it stay in the data base even if I restart my PC Thanks a lot ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS questions
Hello all I'm new with sqlite3 and sql. I have data base that include path columns (file system path like c:\bla bla\myFiles\1.txt) On that columns I need to do 1) search for patterns in case the user want to find a file or directory 2) search for prefix path in case the user rename a file or directory the Database include about 1.5-2.5 million records and to use the LIKE is not possible because the result time. As an alternative I want to use FTS3 or FTS4 but I think I have a problems with what I read here: http://www.sqlite.org/fts3.html#section_1_4 And here: http://www.sqlite.org/fts3.html#section_6_3 I need to specify the language to FTS to use it as tokenize but the path can include multi languages how can I configure the FTS table to use all languages How can I tell to FTS to token the path only according to the character \ ? More than that when creating FTS table it creates with the TEMPORARY key word. My question is: do I need to create this table each time I run the data base (because the temporary word) or for each connections (in case of multiconnections) or this is a table like all tables I declared and it stay in the data base even if I restart my PC Thanks a lot ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS questions
I'm not sure I understand your problem. Why do you care what language it is? Aren't you just wanting to tokenize on backslash? Simple way is to replace all spaces in the path with another char (e.g. '_') then replace all backslashes with a space. Then you can just use the default tokenizer and make the same changes on any user queries. So you map the user query to your storage format. Not sure why you would want to use some stem tokenizer on paths. And, of course, you're reinventing the wheel unless you have some special purpose in mind. http://locate32.cogit.net/ -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of moti lahiani Sent: Saturday, January 12, 2013 4:37 AM To: sqlite-users@sqlite.org Cc: Moti LAHIANI Subject: [sqlite] FTS questions Hello all I'm new with sqlite3 and sql. I have data base that include path columns (file system path like c:\bla bla\myFiles\1.txt) On that columns I need to do 1) search for patterns in case the user want to find a file or directory 2) search for prefix path in case the user rename a file or directory the Database include about 1.5-2.5 million records and to use the LIKE is not possible because the result time. As an alternative I want to use FTS3 or FTS4 but I think I have a problems with what I read here: http://www.sqlite.org/fts3.html#section_1_4 And here: http://www.sqlite.org/fts3.html#section_6_3 I need to specify the language to FTS to use it as tokenize but the path can include multi languages how can I configure the FTS table to use all languages How can I tell to FTS to token the path only according to the character \ ? More than that when creating FTS table it creates with the TEMPORARY key word. My question is: do I need to create this table each time I run the data base (because the temporary word) or for each connections (in case of multiconnections) or this is a table like all tables I declared and it stay in the data base even if I restart my PC Thanks a lot ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS questions
Thanks for your reply Why I care the language: according to the documentation: A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters and all characters with Unicode codepoint values greater than or equal to 128. All other characters are discarded when splitting a document into terms. Their only contribution is to separate adjacent terms. if the path include folder or file name in France or Hebrew or Arabic and the user what to search that word according to above the FTS will not found it. Am I wrong? about the backslash/slash if I have a path like this and the user search for עברית did the FTS will find it? c:\folders\video\עברית about the creation I was confuse it not temporary its virtual so ignore my question Thanks On Sat, Jan 12, 2013 at 5:58 PM, Michael Black mdblac...@yahoo.com wrote: I'm not sure I understand your problem. Why do you care what language it is? Aren't you just wanting to tokenize on backslash? Simple way is to replace all spaces in the path with another char (e.g. '_') then replace all backslashes with a space. Then you can just use the default tokenizer and make the same changes on any user queries. So you map the user query to your storage format. Not sure why you would want to use some stem tokenizer on paths. And, of course, you're reinventing the wheel unless you have some special purpose in mind. http://locate32.cogit.net/ -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of moti lahiani Sent: Saturday, January 12, 2013 4:37 AM To: sqlite-users@sqlite.org Cc: Moti LAHIANI Subject: [sqlite] FTS questions Hello all I'm new with sqlite3 and sql. I have data base that include path columns (file system path like c:\bla bla\myFiles\1.txt) On that columns I need to do 1) search for patterns in case the user want to find a file or directory 2) search for prefix path in case the user rename a file or directory the Database include about 1.5-2.5 million records and to use the LIKE is not possible because the result time. As an alternative I want to use FTS3 or FTS4 but I think I have a problems with what I read here: http://www.sqlite.org/fts3.html#section_1_4 And here: http://www.sqlite.org/fts3.html#section_6_3 I need to specify the language to FTS to use it as tokenize but the path can include multi languages how can I configure the FTS table to use all languages How can I tell to FTS to token the path only according to the character \ ? More than that when creating FTS table it creates with the TEMPORARY key word. My question is: do I need to create this table each time I run the data base (because the temporary word) or for each connections (in case of multiconnections) or this is a table like all tables I declared and it stay in the data base even if I restart my PC Thanks a lot ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS questions
Test it yourself: create virtual test using fts4(context text); insert into test values ('c:\folders\video\עברית'); select * from test where context match 'עברית'; If you want a partial match add a wildcard select * from test where context match 'עברית*'; I don't have the codepage running so I can't test it but it sure works for English. Don't see why it wouldn't work for other languages. -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of moti lahiani Sent: Saturday, January 12, 2013 10:56 AM To: General Discussion of SQLite Database Subject: Re: [sqlite] FTS questions Thanks for your reply Why I care the language: according to the documentation: A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters and all characters with Unicode codepoint values greater than or equal to 128. All other characters are discarded when splitting a document into terms. Their only contribution is to separate adjacent terms. if the path include folder or file name in France or Hebrew or Arabic and the user what to search that word according to above the FTS will not found it. Am I wrong? about the backslash/slash if I have a path like this and the user search for עברית did the FTS will find it? c:\folders\video\עברית about the creation I was confuse it not temporary its virtual so ignore my question Thanks On Sat, Jan 12, 2013 at 5:58 PM, Michael Black mdblac...@yahoo.com wrote: I'm not sure I understand your problem. Why do you care what language it is? Aren't you just wanting to tokenize on backslash? Simple way is to replace all spaces in the path with another char (e.g. '_') then replace all backslashes with a space. Then you can just use the default tokenizer and make the same changes on any user queries. So you map the user query to your storage format. Not sure why you would want to use some stem tokenizer on paths. And, of course, you're reinventing the wheel unless you have some special purpose in mind. http://locate32.cogit.net/ -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of moti lahiani Sent: Saturday, January 12, 2013 4:37 AM To: sqlite-users@sqlite.org Cc: Moti LAHIANI Subject: [sqlite] FTS questions Hello all I'm new with sqlite3 and sql. I have data base that include path columns (file system path like c:\bla bla\myFiles\1.txt) On that columns I need to do 1) search for patterns in case the user want to find a file or directory 2) search for prefix path in case the user rename a file or directory the Database include about 1.5-2.5 million records and to use the LIKE is not possible because the result time. As an alternative I want to use FTS3 or FTS4 but I think I have a problems with what I read here: http://www.sqlite.org/fts3.html#section_1_4 And here: http://www.sqlite.org/fts3.html#section_6_3 I need to specify the language to FTS to use it as tokenize but the path can include multi languages how can I configure the FTS table to use all languages How can I tell to FTS to token the path only according to the character \ ? More than that when creating FTS table it creates with the TEMPORARY key word. My question is: do I need to create this table each time I run the data base (because the temporary word) or for each connections (in case of multiconnections) or this is a table like all tables I declared and it stay in the data base even if I restart my PC Thanks a lot ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS substring behavior
Hi there, I wanted to clarify if FTS could provide any optimization for substring matches like '*ion*' or similar? That is, does it only scan the token index for matching tokens to locate the main table records that contain those tokens, or does it do a full table scan of the main table? The number of unique tokens we have is small compared to the total number of records, so if it only scanned the token index it would in theory help. Thanks Paul ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS substring behavior
On 11/09/2012 01:49 AM, Paul Vercellotti wrote: Hi there, I wanted to clarify if FTS could provide any optimization for substring matches like '*ion*' or similar? No. I think it will actually search for tokens that start with the 4 ASCII characters *ion if you try that. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS substring behavior
How about look at following URL? https://github.com/jonasfj/trilite AFAIK, FTS doesn't support substring search. I also tried to edit FTS to find substring by changing simple tokenizer. It was worked partially, but not a good solution to use generally. 2012/11/9 Dan Kennedy danielk1...@gmail.com On 11/09/2012 01:49 AM, Paul Vercellotti wrote: Hi there, I wanted to clarify if FTS could provide any optimization for substring matches like '*ion*' or similar? No. I think it will actually search for tokens that start with the 4 ASCII characters *ion if you try that. Dan. __**_ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**usershttp://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS substring behavior
That's a promising project; I hope it reaches maturity. I assume your modified tokenizer did a similar thing, like tokenizing full text search as [full, text, search, ull, ll, l, ext, xt, xt, earch, arch, rch, ch, h]? What worked and what did not work? Thanks, Paul From: Yongil Jang yongilj...@gmail.com To: General Discussion of SQLite Database sqlite-users@sqlite.org Sent: Thursday, November 8, 2012 2:26 PM Subject: Re: [sqlite] FTS substring behavior How about look at following URL? https://github.com/jonasfj/trilite AFAIK, FTS doesn't support substring search. I also tried to edit FTS to find substring by changing simple tokenizer. It was worked partially, but not a good solution to use generally. 2012/11/9 Dan Kennedy danielk1...@gmail.com On 11/09/2012 01:49 AM, Paul Vercellotti wrote: Hi there, I wanted to clarify if FTS could provide any optimization for substring matches like '*ion*' or similar? No. I think it will actually search for tokens that start with the 4 ASCII characters *ion if you try that. Dan. __**_ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**usershttp://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS substring behavior
A little bit different. That sentence of full text search is split as [ful, tex, sea, ull, ll, l, ext, xt, xt, ear, arc, rch, ch, h] with 3 maximum length of key. Of course, search pattern string length is limited to 3 and query is changed as follows. [Example: search 'earch'] SELECT ... FROM [FTS table] WHERE [search fields or table] MATCH 'ear' AND ([search field1] like '%earch%' OR [search field2] like '%earch%' OR ... OR [search fieldn] like '%earch%') The reason of ... 1) 3 characters: To minimize FTS index size and to improve search performance 2) Adding like clauses: To refine result set 2) many likes: To search all columns in FTS table when a table name is used in MATCH clause. BR, Yongil Jang. 2012/11/9 Paul Vercellotti pverce...@yahoo.com That's a promising project; I hope it reaches maturity. I assume your modified tokenizer did a similar thing, like tokenizing full text search as [full, text, search, ull, ll, l, ext, xt, xt, earch, arch, rch, ch, h]? What worked and what did not work? Thanks, Paul From: Yongil Jang yongilj...@gmail.com To: General Discussion of SQLite Database sqlite-users@sqlite.org Sent: Thursday, November 8, 2012 2:26 PM Subject: Re: [sqlite] FTS substring behavior How about look at following URL? https://github.com/jonasfj/trilite AFAIK, FTS doesn't support substring search. I also tried to edit FTS to find substring by changing simple tokenizer. It was worked partially, but not a good solution to use generally. 2012/11/9 Dan Kennedy danielk1...@gmail.com On 11/09/2012 01:49 AM, Paul Vercellotti wrote: Hi there, I wanted to clarify if FTS could provide any optimization for substring matches like '*ion*' or similar? No. I think it will actually search for tokens that start with the 4 ASCII characters *ion if you try that. Dan. __**_ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS returns out of memory when use NEAR and OR
Hello For some time already i noticed that when i use NEAR/1 and OR in one query like SELECT * FROM search WHERE search MATCH 'tom NEAR/1 hanks or tom hanks' i get out of memory error. Running this on 16Gb laptop cannot be memory issue and the database only has several thousands of records. Investigating the code i found one place where in fts3EvalNearTest where it happens: line 129689 i version 3.7.14.1 nTmp += p-pPhrase-doclist.nList; aTmp = sqlite3_malloc(nTmp*2); if( !aTmp ){ Adding the check before the malloc solved the problem if (nTmp = 0) return res; Thanks ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS returns out of memory when use NEAR and OR
On 10/24/2012 11:07 PM, Vlad Seryakov wrote: Hello For some time already i noticed that when i use NEAR/1 and OR in one query like SELECT * FROM search WHERE search MATCH 'tom NEAR/1 hanks or tom hanks' Are you able to share the database file that you use to reproduce this? Thanks. Dan. i get out of memory error. Running this on 16Gb laptop cannot be memory issue and the database only has several thousands of records. Investigating the code i found one place where in fts3EvalNearTest where it happens: line 129689 i version 3.7.14.1 nTmp += p-pPhrase-doclist.nList; aTmp = sqlite3_malloc(nTmp*2); if( !aTmp ){ Adding the check before the malloc solved the problem if (nTmp= 0) return res; Thanks ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS: Phrase queries
Ever since I started using FTS extensively, I frequently ran into this limitation: ** TODO: Strangely, it is not possible to associate a column specifier ** with a quoted phrase, only with a single token. Not sure if this was ** an implementation artifact or an intentional decision when fts3 was ** first implemented. Whichever it was, this module duplicates the ** limitation. Is it ever planned to be fixed, because it doesn't seem to break any backwards compatibilty? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS Example Fails (matchinfo arguments)
I am running sqlite3 version 3.7.3 on debian. I run the following commands from fts3.html documentation page: CREATE VIRTUAL TABLE t1 USING fts4(a, b); INSERT INTO t1 VALUES('transaction default models default', 'Non transaction reads'); INSERT INTO t1 VALUES('the default transaction', 'these semantics present'); INSERT INTO t1 VALUES('single request', 'default data'); SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'default transaction these semantics'; SELECT matchinfo(t1, 'ns') FROM t1 WHERE t1 MATCH 'default transaction'; The last line with match info(t1, 'ns') fails with the following error message: Error: wrong number of arguments to function matchinfo() I have originally worked on my own table, but found the same error. So, I have tried example above and it failed for me in the same way. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS Example Fails (matchinfo arguments)
On 06/14/2012 01:27 PM, Sergei G wrote: I am running sqlite3 version 3.7.3 on debian. I run the following commands from fts3.html documentation page: CREATE VIRTUAL TABLE t1 USING fts4(a, b); INSERT INTO t1 VALUES('transaction default models default', 'Non transaction reads'); INSERT INTO t1 VALUES('the default transaction', 'these semantics present'); INSERT INTO t1 VALUES('single request', 'default data'); SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'default transaction these semantics'; SELECT matchinfo(t1, 'ns') FROM t1 WHERE t1 MATCH 'default transaction'; The last line with match info(t1, 'ns') fails with the following error message: Error: wrong number of arguments to function matchinfo() I have originally worked on my own table, but found the same error. So, I have tried example above and it failed for me in the same way. I think the two argument version of matchinfo() is only in 3.7.4 and newer. Prior to that it only accepted one argument. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS Example Fails (matchinfo arguments)
Is there a way I can obtain documentation that matches my version? Online documentation is for the most current version. I have found that both my hosting provider and debian stable are a bit behind, so I have to work with what I've got. Thanks On Thu, Jun 14, 2012 at 1:16 AM, Dan Kennedy danielk1...@gmail.com wrote: On 06/14/2012 01:27 PM, Sergei G wrote: I am running sqlite3 version 3.7.3 on debian. I run the following commands from fts3.html documentation page: CREATE VIRTUAL TABLE t1 USING fts4(a, b); INSERT INTO t1 VALUES('transaction default models default', 'Non transaction reads'); INSERT INTO t1 VALUES('the default transaction', 'these semantics present'); INSERT INTO t1 VALUES('single request', 'default data'); SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'default transaction these semantics'; SELECT matchinfo(t1, 'ns') FROM t1 WHERE t1 MATCH 'default transaction'; The last line with match info(t1, 'ns') fails with the following error message: Error: wrong number of arguments to function matchinfo() I have originally worked on my own table, but found the same error. So, I have tried example above and it failed for me in the same way. I think the two argument version of matchinfo() is only in 3.7.4 and newer. Prior to that it only accepted one argument. __**_ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**usershttp://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS Example Fails (matchinfo arguments)
On Thu, 14 Jun 2012 13:13:58 -0700, Sergei G sergeig.pub...@gmail.com wrote: Is there a way I can obtain documentation that matches my version? Online documentation is for the most current version. I have found that both my hosting provider and debian stable are a bit behind, so I have to work with what I've got. Thanks Part of the documentation is partly generated from the sqlite source code maintained in the fossil [1] source repository. The remainder is kept in a separate source repository. You can check out any point on the timelines of both repositories. Links are at the bottom of : http://www.sqlite.org/download.html [1] http://www.fossil-scm.org/ -- Regards, Kees Nuyt ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS Example Fails (matchinfo arguments)
On Thu, Jun 14, 2012 at 4:27 PM, Kees Nuyt k.n...@zonnet.nl wrote: On Thu, 14 Jun 2012 13:13:58 -0700, Sergei G sergeig.pub...@gmail.com wrote: Is there a way I can obtain documentation that matches my version? Online documentation is for the most current version. I have found that both my hosting provider and debian stable are a bit behind, so I have to work with what I've got. Thanks Part of the documentation is partly generated from the sqlite source code maintained in the fossil [1] source repository. The remainder is kept in a separate source repository. You can check out any point on the timelines of both repositories. Links are at the bottom of : http://www.sqlite.org/download.html [1] http://www.fossil-scm.org/ -- Regards, Kees Nuyt And, of course, there's also the Internet Archive: http://wayback.archive.org/web/*/http://www.sqlite.org -- -- -- --Ô¿Ô-- K e V i N ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS Example Fails (matchinfo arguments)
On Thu, Jun 14, 2012 at 4:13 PM, Sergei G sergeig.pub...@gmail.com wrote: Is there a way I can obtain documentation that matches my version? Online documentation is for the most current version. I have found that both my hosting provider and debian stable are a bit behind, so I have to work with what I've got. http://www.sqlite.org/sqlite_docs_3_7_3.zip -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS simple tokenizer with custom delimeters
While looking around in the source of the simple tokenizer I found code that suggests custom delimeters can be specified (I want to exclude the underscore). http://www.sqlite.org/src/artifact/5c98225a53705e5ee34824087478cf477bdb7004? ln=76-87 An indeed: CREATE VIRTUAL TABLE ft USING fts3(title, body, tokenize=simple XX ['\ *()./\\=,:;%-?!]) seems to work fine. As far as I can tell this feature is undocumented which means I am not suppose to use it. Is this: - An oversight - For good reason as it is unstable - or: because the syntax might change in the near future? Also: I need to include the dummy XX as the delimeters are searched in argv[1] in stead of argv[0]. I cannot find what the argv[0] is supposed to do here. Any reason? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Sqlite FTS retrieve inverted index
Hello, unfortunately I have already posted this question on stackoverflowhttp://stackoverflow.com/questions/9657016/get-inverted-index-from-sqlite-fts-table, hope that this mailing list is right address. After I have implemented a full text search function in my application using Sqlite and FTS tables I would be interested in a (performant) way of retrieving the FULL inverted index (or large part) out of my FTS (sub-)table. In effect - I would need a result table including the terms, docid's and number of occurences. I am actually searching for some basic code/examples to read the segdir / segments table (where the actual index is stored ) and construct my desired result table (in effect - the inverted index). But any solution which could retrieve the full (or large part of) my inverted index using queries including MATCH, MATCHINFO , etc. (sorry, I'm no Sqlite export) would be highly appreciated! Best, mario ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Sqlite FTS retrieve inverted index
See http://www.sqlite.org/draft/fts3.html#fts4aux 2012/3/13 Mario Annau mario.an...@gmail.com: Hello, unfortunately I have already posted this question on stackoverflowhttp://stackoverflow.com/questions/9657016/get-inverted-index-from-sqlite-fts-table, hope that this mailing list is right address. After I have implemented a full text search function in my application using Sqlite and FTS tables I would be interested in a (performant) way of retrieving the FULL inverted index (or large part) out of my FTS (sub-)table. In effect - I would need a result table including the terms, docid's and number of occurences. I am actually searching for some basic code/examples to read the segdir / segments table (where the actual index is stored ) and construct my desired result table (in effect - the inverted index). But any solution which could retrieve the full (or large part of) my inverted index using queries including MATCH, MATCHINFO , etc. (sorry, I'm no Sqlite export) would be highly appreciated! Best, mario ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- Best regards, Alexey Pechnikov. http://pechnikov.tel/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Sqlite FTS retrieve inverted index
Hi Alexey, tha Am 12. März 2012 22:14 schrieb Alexey Pechnikov pechni...@mobigroup.ru: See http://www.sqlite.org/draft/fts3.html#fts4aux have already tried the fts4aux table. however, I would also need the number of occurrences of each term in each document. Therefore, like in the docs, not only *-- The following query returns this data:**--**-- apple | * | 1 | 1**-- apple | 0 | 1 | 1**-- banana | * | 2 | 2**-- banana | 0 | 2 | 2**-- cherry | * | 3 | 3**-- cherry | 0 | 1 | 1**-- cherry | 1 | 2 | 2**-- date| * | 1 | 2**-- date| 0 | 1 | 2**-- elderberry | * | 1 | 2**-- elderberry | 1 | 1 | 1**-- elderberry | 1 | 1 | 1* but a result table like this: Term|col |docid| occurences -- -- apple | 0 | 1 | 1 -- banana | 0 | 2 | 1 -- cherry | 0 | 3 | 1 -- cherry | 1 | 1 | 1 -- cherry | 1 | 2 | 1 -- date| 0 | 2 | 2 -- elderberry | 0 | 3 | 1 -- elderberry | 1 | 3 | 1 Best, mario 2012/3/13 Mario Annau mario.an...@gmail.com: Hello, unfortunately I have already posted this question on stackoverflow http://stackoverflow.com/questions/9657016/get-inverted-index-from-sqlite-fts-table , hope that this mailing list is right address. After I have implemented a full text search function in my application using Sqlite and FTS tables I would be interested in a (performant) way of retrieving the FULL inverted index (or large part) out of my FTS (sub-)table. In effect - I would need a result table including the terms, docid's and number of occurences. I am actually searching for some basic code/examples to read the segdir / segments table (where the actual index is stored ) and construct my desired result table (in effect - the inverted index). But any solution which could retrieve the full (or large part of) my inverted index using queries including MATCH, MATCHINFO , etc. (sorry, I'm no Sqlite export) would be highly appreciated! Best, mario ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- Best regards, Alexey Pechnikov. http://pechnikov.tel/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS simple tokenizer
On 02/28/2012 12:09 AM, Jos Groot Lipman wrote: It was reported before (and not solved) http://www.mail-archive.com/sqlite-users@sqlite.org/msg55959.html The document sources are updated now. So the fix will appear on the website next time it is regenerated. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS simple tokenizer
Using the _ character to separate words is an informal language standard, s in: method_do_this... On Tue, Feb 28, 2012 at 12:40 AM, Dan Kennedy danielk1...@gmail.com wrote: On 02/28/2012 12:09 AM, Jos Groot Lipman wrote: It was reported before (and not solved) http://www.mail-archive.com/**sqlite-users@sqlite.org/**msg55959.htmlhttp://www.mail-archive.com/sqlite-users@sqlite.org/msg55959.html The document sources are updated now. So the fix will appear on the website next time it is regenerated. __**_ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**usershttp://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS simple tokenizer
Thanks Dan. Have just checked how to report bug, and apparently we already have :) Please excuse the brevity -- sent from my phone On 27 Feb 2012, at 07:06, Dan Kennedy danielk1...@gmail.com wrote: On 02/27/2012 05:59 AM, Hamish Allan wrote: The docs for the simple tokenizer (http://www.sqlite.org/fts3.html#tokenizer) say: A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters, the _ character, and all characters with UTF codepoints greater than or equal to 128. If I do: CREATE VIRTUAL TABLE test USING fts3(); INSERT INTO test (content) VALUES ('hello_world'); SELECT * FROM test WHERE content MATCH 'orld'; SELECT * FROM test WHERE content MATCH 'world'; I get no match for the first query, because it doesn't match a term, but I get a match for the second, whereas according to my reading of the docs world shouldn't be a term because the underscore character shouldn't be considered a term break. Can anyone please help me understand this behaviour? Documentation bug. Eligible characters are just alphanumerics and UTF codepoints greater than 128. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS simple tokenizer
It was reported before (and not solved) http://www.mail-archive.com/sqlite-users@sqlite.org/msg55959.html -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Hamish Allan Sent: maandag 27 februari 2012 11:27 To: General Discussion of SQLite Database Cc: sqlite-users@sqlite.org Subject: Re: [sqlite] FTS simple tokenizer Thanks Dan. Have just checked how to report bug, and apparently we already have :) Please excuse the brevity -- sent from my phone On 27 Feb 2012, at 07:06, Dan Kennedy danielk1...@gmail.com wrote: On 02/27/2012 05:59 AM, Hamish Allan wrote: The docs for the simple tokenizer (http://www.sqlite.org/fts3.html#tokenizer) say: A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters, the _ character, and all characters with UTF codepoints greater than or equal to 128. If I do: CREATE VIRTUAL TABLE test USING fts3(); INSERT INTO test (content) VALUES ('hello_world'); SELECT * FROM test WHERE content MATCH 'orld'; SELECT * FROM test WHERE content MATCH 'world'; I get no match for the first query, because it doesn't match a term, but I get a match for the second, whereas according to my reading of the docs world shouldn't be a term because the underscore character shouldn't be considered a term break. Can anyone please help me understand this behaviour? Documentation bug. Eligible characters are just alphanumerics and UTF codepoints greater than 128. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS simple tokenizer
The docs for the simple tokenizer (http://www.sqlite.org/fts3.html#tokenizer) say: A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters, the _ character, and all characters with UTF codepoints greater than or equal to 128. If I do: CREATE VIRTUAL TABLE test USING fts3(); INSERT INTO test (content) VALUES ('hello_world'); SELECT * FROM test WHERE content MATCH 'orld'; SELECT * FROM test WHERE content MATCH 'world'; I get no match for the first query, because it doesn't match a term, but I get a match for the second, whereas according to my reading of the docs world shouldn't be a term because the underscore character shouldn't be considered a term break. Can anyone please help me understand this behaviour? Thanks, Hamish ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS simple tokenizer
On 02/27/2012 05:59 AM, Hamish Allan wrote: The docs for the simple tokenizer (http://www.sqlite.org/fts3.html#tokenizer) say: A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters, the _ character, and all characters with UTF codepoints greater than or equal to 128. If I do: CREATE VIRTUAL TABLE test USING fts3(); INSERT INTO test (content) VALUES ('hello_world'); SELECT * FROM test WHERE content MATCH 'orld'; SELECT * FROM test WHERE content MATCH 'world'; I get no match for the first query, because it doesn't match a term, but I get a match for the second, whereas according to my reading of the docs world shouldn't be a term because the underscore character shouldn't be considered a term break. Can anyone please help me understand this behaviour? Documentation bug. Eligible characters are just alphanumerics and UTF codepoints greater than 128. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS malformed MATCH expression: [( a* OR b*)]
The native SQLite code bundled with System.Data.SQLite is not compiled with the SQLITE_ENABLE_FTS3_PARENTHESIS option, which is required for the expression you are asking about to work properly. Of course, it can always be recompiled if you have access to Visual C++, by editing one of the following files and recompiling the SQLite.Interop project: root/SQLite.Interop/props/sqlite3.vsprops (for VC++ 2008) root/SQLite.Interop/props/sqlite3.props (for VC++ 2010) Where root is the root of the source tree for System.Data.SQLite. -- Joe Mistachkin ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS malformed MATCH expression: [( a* OR b*)]
Thank you, I'll try to recompile. I looked in the Navicat folder and it had 2 sqlite dlls(sqlite and sqlite3), could those be of any good? În data de 03.02.2012 16:52, Joe Mistachkin sql...@mistachkin.com a scris: The native SQLite code bundled with System.Data.SQLite is not compiled with the SQLITE_ENABLE_FTS3_PARENTHESIS option, which is required for the expression you are asking about to work properly. Of course, it can always be recompiled if you have access to Visual C++, by editing one of the following files and recompiling the SQLite.Interop project: root/SQLite.Interop/props/sqlite3.vsprops (for VC++ 2008) root/SQLite.Interop/props/sqlite3.props (for VC++ 2010) Where root is the root of the source tree for System.Data.SQLite. -- Joe Mistachkin ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS malformed MATCH expression: [( a* OR b*)]
Hi, I tried to build with all default and it gave me this: Error 5 error LNK1181: cannot open input file 'root\SQLite.Interop\..\bin\2010\ReleaseModule\bin\System.Data.SQLite.netmodule' root \SQLite.Interop\LINK SQLite.Interop.2010 On Fri, Feb 3, 2012 at 5:32 PM, Stefan Rogin stefan.rogin2...@gmail.comwrote: Thank you, I'll try to recompile. I looked in the Navicat folder and it had 2 sqlite dlls(sqlite and sqlite3), could those be of any good? În data de 03.02.2012 16:52, Joe Mistachkin sql...@mistachkin.com a scris: The native SQLite code bundled with System.Data.SQLite is not compiled with the SQLITE_ENABLE_FTS3_PARENTHESIS option, which is required for the expression you are asking about to work properly. Of course, it can always be recompiled if you have access to Visual C++, by editing one of the following files and recompiling the SQLite.Interop project: root/SQLite.Interop/props/sqlite3.vsprops (for VC++ 2008) root/SQLite.Interop/props/sqlite3.props (for VC++ 2010) Where root is the root of the source tree for System.Data.SQLite. -- Joe Mistachkin ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- *Stefan Rogin* Webdesigner Tel: +40.769.622.178 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS malformed MATCH expression: [( a* OR b*)]
Forget the last question. I've managed to get passed it :), I didn't open the whole solution file, just the SQLite.Interop.2010 project, I've seen that it's set on .Net 4 How can I set it to 3.5 and are there any drawbacks? Anyway, thanks for the quick reply. +1 for the support On Fri, Feb 3, 2012 at 9:41 PM, Stefan Rogin stefan.rogin2...@gmail.comwrote: Hi, I tried to build with all default and it gave me this: Error 5 error LNK1181: cannot open input file 'root\SQLite.Interop\..\bin\2010\ReleaseModule\bin\System.Data.SQLite.netmodule' root \SQLite.Interop\LINK SQLite.Interop.2010 On Fri, Feb 3, 2012 at 5:32 PM, Stefan Rogin stefan.rogin2...@gmail.comwrote: Thank you, I'll try to recompile. I looked in the Navicat folder and it had 2 sqlite dlls(sqlite and sqlite3), could those be of any good? În data de 03.02.2012 16:52, Joe Mistachkin sql...@mistachkin.com a scris: The native SQLite code bundled with System.Data.SQLite is not compiled with the SQLITE_ENABLE_FTS3_PARENTHESIS option, which is required for the expression you are asking about to work properly. Of course, it can always be recompiled if you have access to Visual C++, by editing one of the following files and recompiling the SQLite.Interop project: root/SQLite.Interop/props/sqlite3.vsprops (for VC++ 2008) root/SQLite.Interop/props/sqlite3.props (for VC++ 2010) Where root is the root of the source tree for System.Data.SQLite. -- Joe Mistachkin ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- *Stefan Rogin* Webdesigner Tel: +40.769.622.178 -- *Stefan Rogin* Webdesigner Tel: +40.769.622.178 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS malformed MATCH expression: [( a* OR b*)]
Stefan Rogin wrote: I've seen that it's set on .Net 4 The VS 2010 project uses .NET 4, the VS 2008 project uses .NET 3.5. How can I set it to 3.5 and are there any drawbacks? Depends on what environment(s) you are going to use your application in. There are trade-offs; however, that is why we provide support for both. -- Joe Mistachkin ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS malformed MATCH expression: [( a* OR b*)]
Hi, I've encountered some issues when using *FTS4 MATCH* on a database, like the one mentioned in the subject (*malformed MATCH expression: [( a* OR b*)]*). From what I discovered it is triggered by using a quote at the beginning of a parenthesis, if I write *(b* OR a*)* it works just fine. What I found interesting is that if I use Navicat with the SQLite database, the query above works just fine. Also there is support for NOT opposed to - supported by System.Data.SQLite. *this is the syntax I used to get the data:* DataTable dt = new DataTable(); using (SQLiteCommand cmd = mySQLiteConn.CreateCommand()) { cmd.CommandType = CommandType.Text; cmd.CommandText = sql; SQLiteDataAdapter sqlDa = new SQLiteDataAdapter(cmd); sqlDa.Fill(dt); } *exception detail:* System.Data.SQLite.SQLiteException was unhandled Message=SQLite error malformed MATCH expression: [(a* OR b*)] Source=System.Data.SQLite ErrorCode=-2147467259 StackTrace: at System.Data.SQLite.SQLite3.Reset(SQLiteStatement stmt) at System.Data.SQLite.SQLite3.Step(SQLiteStatement stmt) at System.Data.SQLite.SQLiteDataReader.NextResult() at System.Data.SQLite.SQLiteDataReader..ctor(SQLiteCommand cmd, CommandBehavior behave) at System.Data.SQLite.SQLiteCommand.ExecuteReader(CommandBehavior behavior) at System.Data.SQLite.SQLiteCommand.ExecuteDbDataReader(CommandBehavior behavior) at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior behavior) at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior) at System.Data.Common.DbDataAdapter.Fill(DataTable[] dataTables, Int32 startRecord, Int32 maxRecords, IDbCommand command, CommandBehavior behavior) at System.Data.Common.DbDataAdapter.Fill(DataTable dataTable) at SQLite_Take2.Form1.ftsSearch() in D:\c#\SQLite Take2\Main.cs:line 607 at SQLite_Take2.Form1.cmdSearch_Click(Object sender, EventArgs e) in D:\c#\SQLite Take2\Main.cs:line 340 at System.Windows.Forms.Control.OnClick(EventArgs e) at System.Windows.Forms.Button.OnClick(EventArgs e) at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent) at System.Windows.Forms.Control.WmMouseUp(Message m, MouseButtons button, Int32 clicks) at System.Windows.Forms.Control.WndProc(Message m) at System.Windows.Forms.ButtonBase.WndProc(Message m) at System.Windows.Forms.Button.WndProc(Message m) at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message m) at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message m) at System.Windows.Forms.NativeWindow.DebuggableCallback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam) at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG msg) at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32 dwComponentID, Int32 reason, Int32 pvLoopData) at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context) at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context) at System.Windows.Forms.Application.Run(Form mainForm) at SQLite_Take2.Program.Main() in D:\c#\SQLite Take2\Program.cs:line 18 at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args) at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args) at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly() at System.Threading.ThreadHelper.ThreadStart_Context(Object state) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state) at System.Threading.ThreadHelper.ThreadStart() InnerException: *I use:* Windows 7 , 64bit Visual Studio 2010 SQLite Server version : 3.7.7.1 Connection string: Data Source=.\temp.db;UTF8Encoding=True;Version=3 System.Data.SQLite.dll version : 1.0.74.0 -- *Stefan Rogin* Webdesigner Tel: +40.769.622.178 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] [FTS] Executing Sql statements inside a custom tokenizer
Hi, I would like to build up a table of all the unique words occurring in my corpus (for spelling suggestion feature). Presently I am using the Porter stemming tokenizer and I would not like to stop using the stemmer at any cost. Although if I was not using the Porter stemmer then I could easily obtain the list of unique words in the corpus using the FTS4Aux module. But using the stemmer means that all the words are stored in the index in their stem form which is not desirable for building a dictionary of proper English words. One solution is to use a custom tokenizer. I was thinking of using the default Porter tokenizer supplied with Sqlite and adding some bits of code to store the token in a separate table before stemming it down. But I am not sure if it is ok to access or modify the database using Sql statements inside a tokenizer. Now that I think of it, the tokenizer code is also executed when an SQL query is performed against the FTS table (when performing search), at which time I don't want my dictionary building code to execute. So perhaps this is not a good idea. What other options do I have ? Thanks Abhinav ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [FTS] Executing Sql statements inside a custom tokenizer
On Jan 3, 2012, at 8:30 PM, Abhinav Upadhyay wrote: What other options do I have ? Two FTS tables? One with the Porter stemmer, for search, one without, to build the auxiliary tables? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [FTS] Executing Sql statements inside a custom tokenizer
Two FTS tables? One with the Porter stemmer, for search, one without, to build the auxiliary tables? Yeah, that is the last option, if nothing else works. For a small set of documents the extra processing time might be ok but for a larger set of documents building the FTS tables twice might be a bit taxing. I think I will first try a custom tokenizer. So is it ok to execute SQL statements from within the tokenizer ? It would have been great if there was some way to determine whether the tokenizer code is being executed for indexing the documents or for searching the index. Thanks Abhinav ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Fts */or. Inconsistencies
SQLite Gurus, In SQLIte FTS3/4, does the '*' (wildcard expansion character) discriminate between alphanumeric characters vs non-alpha numeric characters when matching? I have two test cases below which causes me to believe that it does. Also, the OR operator appears to fail when matching against a combination of alphanumerics and non-alphanumerics. I'm using a custom tokenizer in each scenario (yes it works and the proof is enclosed). In the first dataset, the data was tokenized such that any alphanumeric character qualifies as part of a token. In the second dataset, the data was tokenized such that anything other than a semicolon qualifies as part of a token.\ The issues I'm raising is that 1) the '*' expansion doesn't seem to work for a alphanumeric/non-alphanumeric token matches nor does the OR operator. I haven't figured out what I'm missing here. I've been banging my head all morning against tihs. I appreciate any help. I've listed a working and non working example below since a comparison should clarify best: [THIS WORKS FINE] MY DATASET: STATE|NAMES --- maryland|fred,louis,jenny virginia|ruth,greg,denise maine|richard,norman,willis TOKENIZER CRITERIA USED: = any word characters (\w+) (comment: all these names should be split into regular tokens and they were as shown below) FULL TEXT INDEX: sqlite select * from ft_terms; term col documents occurrences -- -- -- --- fred * 1 1 fred 1 1 1 jenny * 1 1 jenny 1 1 1 louis * 1 1 louis 1 1 1 maryland * 1 1 maryland 0 1 1 ruth * 1 1 ruth 1 1 1 virginia * 1 1 virginia 0 1 1 QUERY: select * from word where word match 'mary* jen*' RETURNS: maryland|fred,louis,jenny QUERY: select * from word where word match 'mary* OR v*' RETURNS: maryland|fred,louis,jenny virginia|ruth,greg,denise All the above behaves as expected. Now lets introduce some non-alphanumerics -- [THIS DOESN'T WORK] DATASET: ROWSET|PAIR -- 1 A=15;B=16;C=38 2 D=15;E=25;F=16 TOKENIZER CRITERIA USED: = any character that is NOT a semicolon ([^;]+) (comment: all these PAIR values should be tokenized by semicolon and they were as shown below) FULL TEXT INDEX: term col documents occurrences -- -- -- --- 1 * 1 1 1 0 1 1 2 * 1 1 2 0 1 1 A=15 * 1 1 A=15 1 1 1 B=16 * 1 1 B=16 1 1 1 C=38 * 1 1 C=38 1 1 1 D=15 * 1 1 D=15 1 1 1 E=25 * 1 1 E=25 1 1 1 F=16 * 1 1 F=16 1 1 1 QUERY1: select * from NUMMY where NUMMY MATCH 'A=* OR D=*' RETURNS: (nothing) comment: Should have returned ROWSET 1 and 2 (refer to above dataset) QUERY2: select * from NUMMY where NUMMY MATCH 'A* C*' RETURNS: (nothing) comment: Should have returned ROWSET 1 (refer to above dataset) --HOWEVER SPECIFYING THE FULL TOKEN WORKS-- QUERY3: select * from NUMMY where NUMMY MATCH 'A=15' RETURNS: 1|A=15;B=16;C=38 QUERY4: select * from NUMMY where NUMMY MATCH 'A=15' RETURNS: 1|A=15;B=16;C=38 QUERY5: select * from NUMMY where NUMMY MATCH 'E=25' RETURNS: 2|D=15;E=25;F=16 --THIS SUCCESS IS SHORTLIVED-- QUERY6: select * from NUMMY where NUMMY MATCH 'E=25 OR B=16' RETURNS: (nothing) This query should have returned rowset 1 and two as this was an OR query and both creiteria are met. In summary, the wildcard expansion as well as the OR operator seems not to work in the second example. There only main difference between the two data sets in that the first set is composed of alpha characters only and the second is a combination of alphanumeric and non-aplhanumerica characters. The (*) expansion character is not matching these. A match only occurs when you specify the full token. Thanks for your time in looking at this issue. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Fts */or. Inconsistencies
On 12/13/2011 02:29 AM, Ephraim Stevens wrote: I'm using a custom tokenizer in each scenario (yes it works and the proof is enclosed). In the first dataset, the data was tokenized such that any alphanumeric character qualifies as part of a token. In the second dataset, the data was tokenized such that anything other than a semicolon qualifies as part of a token.\ The issues I'm raising is that 1) the '*' expansion doesn't seem to work for a alphanumeric/non-alphanumeric token matches nor does the OR operator. I haven't figured out what I'm missing here. I've been banging my head all morning against tihs. I appreciate any help. Do your custom tokenizers allow whitespace or * characters to be part of tokens? If so, try changing them so that they do not. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS Tokenizer (separator)
Greetings All, From section seven of the FTS3/FTS4 documentation: A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters, the _ character, and all characters with UTF codepoints greater than or equal to 128. All other characters are discarded when splitting a document into terms. Their only contribution is to separate adjacent terms. Is there a way to modify/control this behavior? I would like the equal sign ('=') to be treated with the same designation as an alpha numeric character. Currently, the equal sign acts as a separator. Thanks in advance for any suggestions/help you provide. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS Tokenizer (separator)
On Fri, Dec 9, 2011 at 6:48 AM, Ephraim Stevens ephraim.stev...@gmail.comwrote: Greetings All, From section seven of the FTS3/FTS4 documentation: A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters, the _ character, and all characters with UTF codepoints greater than or equal to 128. All other characters are discarded when splitting a document into terms. Their only contribution is to separate adjacent terms. Is there a way to modify/control this behavior? I would like the equal sign ('=') to be treated with the same designation as an alpha numeric character. Currently, the equal sign acts as a separator. Thanks in advance for any suggestions/help you provide. Create your own tokenizer. http://www.sqlite.org/fts3.html#section_7_1 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS: Phrase queries
2011/11/14 nobre rafael.ro...@novaprolink.com.br Comment from the source: ** TODO: Strangely, it is not possible to associate a column specifier ** with a quoted phrase, only with a single token. Not sure if this was ** an implementation artifact or an intentional decision when fts3 was ** first implemented. Whichever it was, this module duplicates the ** limitation. So, seems its really not possible Rafael Thanks, I hope this limitation will be lifted someday. Fixing it will not break any existing queries and is fully backwards compatible, so I dont understand why FTS4 duplicates that (faulty) behaviour. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS: Phrase queries
Comment from the source: ** TODO: Strangely, it is not possible to associate a column specifier ** with a quoted phrase, only with a single token. Not sure if this was ** an implementation artifact or an intentional decision when fts3 was ** first implemented. Whichever it was, this module duplicates the ** limitation. So, seems its really not possible Rafael Fabian-40 wrote: When I have a basic FTS query that needs to be restricted to a column, I can write it in two ways: 1.) WHERE column MATCH 'apple' 2.) WHERE table MATCH 'column:apple' But when I have a phrase query, I can only write it in one way: 1.) WHERE column MATCH 'apple juice' The problem is that when I want to combine the queries (search for 'apple' in column1 and for apple juice in column2) i cannot write the query like: WHERE column1 MATCH 'apple' AND column2 MATCH 'apple juice' Nor can I write it like: WHERE table MATCH 'column1:apple column2:apple juice' So this fairly simple query, seems impossible in FTS? Or does anyone know how to workaround this (without doing two seperate queries)? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- View this message in context: http://old.nabble.com/FTS%3A-Phrase-queries-tp32834649p32839669.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS: Phrase queries
When I have a basic FTS query that needs to be restricted to a column, I can write it in two ways: 1.) WHERE column MATCH 'apple' 2.) WHERE table MATCH 'column:apple' But when I have a phrase query, I can only write it in one way: 1.) WHERE column MATCH 'apple juice' The problem is that when I want to combine the queries (search for 'apple' in column1 and for apple juice in column2) i cannot write the query like: WHERE column1 MATCH 'apple' AND column2 MATCH 'apple juice' Nor can I write it like: WHERE table MATCH 'column1:apple column2:apple juice' So this fairly simple query, seems impossible in FTS? Or does anyone know how to workaround this (without doing two seperate queries)? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS vs INDEX
Did anyone do some benchmarks how the insert-speed of FTS compares to a TEXT INDEX column? I don't need many of the extra features of FTS, because I always need to look up rows by prefix or exact match, and both can be implemented efficiently via TEXT INDEX too. But if the overhead is comparable, I'd rather use FTS. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
I recently benchmarked this...FTS4 has a prefix option that can make it slightly faster than TEXT. Other than that it's about the same speed. http://www.mail-archive.com/sqlite-users@sqlite.org/msg64591.html The older part of the thread has the benchmark data Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Fabian [fabianpi...@gmail.com] Sent: Wednesday, October 19, 2011 9:20 AM To: General Discussion of SQLite Database Subject: EXT :[sqlite] FTS vs INDEX Did anyone do some benchmarks how the insert-speed of FTS compares to a TEXT INDEX column? I don't need many of the extra features of FTS, because I always need to look up rows by prefix or exact match, and both can be implemented efficiently via TEXT INDEX too. But if the overhead is comparable, I'd rather use FTS. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
Very interesting benchmarks! However it seems to focus mainly on the speed of SELECT queries, and the total size of the resulting database on disk. But my main concern is about the speed of INSERT queries vs normal tables. Any chance you compared that too? 2011/10/19 Black, Michael (IS) michael.bla...@ngc.com I recently benchmarked this...FTS4 has a prefix option that can make it slightly faster than TEXT. Other than that it's about the same speed. http://www.mail-archive.com/sqlite-users@sqlite.org/msg64591.html The older part of the thread has the benchmark data Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Fabian [fabianpi...@gmail.com] Sent: Wednesday, October 19, 2011 9:20 AM To: General Discussion of SQLite Database Subject: EXT :[sqlite] FTS vs INDEX Did anyone do some benchmarks how the insert-speed of FTS compares to a TEXT INDEX column? I don't need many of the extra features of FTS, because I always need to look up rows by prefix or exact match, and both can be implemented efficiently via TEXT INDEX too. But if the overhead is comparable, I'd rather use FTS. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
FTS use index multi-tree and de-facto has _no_ insert speed degradation. I did do test for 400+ millions of records. With b-tree index there is insert speed degradation: http://geomapx.blogspot.com/2010/04/sqlite-index-degradation-tests.html http://geomapx.blogspot.com/search?q=index+speed So FTS as hash-index is nice. 2011/10/19 Fabian fabianpi...@gmail.com: Did anyone do some benchmarks how the insert-speed of FTS compares to a TEXT INDEX column? I don't need many of the extra features of FTS, because I always need to look up rows by prefix or exact match, and both can be implemented efficiently via TEXT INDEX too. But if the overhead is comparable, I'd rather use FTS. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- Best regards, Alexey Pechnikov. http://pechnikov.tel/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
2011/10/19 Alexey Pechnikov pechni...@mobigroup.ru FTS use index multi-tree and de-facto has _no_ insert speed degradation. Thanks, that's good to hear! It makes me wonder why SQLite doesn't use that same multi-tree mechanism for regular indexes, but that's a whole different question. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
2011/10/19 Fabian fabianpi...@gmail.com: Thanks, that's good to hear! It makes me wonder why SQLite doesn't use that same multi-tree mechanism for regular indexes, but that's a whole different question. It's impossible with SQLite3 database format. May be SQLite4 will be support it :) -- Best regards, Alexey Pechnikov. http://pechnikov.tel/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
Nope -- didn't note the insert speed on that test. Why don't you take my benchmark data and test it yourself? Then post the results. The saying your mileage may vary comes to mind... Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Fabian [fabianpi...@gmail.com] Sent: Wednesday, October 19, 2011 9:44 AM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] FTS vs INDEX Very interesting benchmarks! However it seems to focus mainly on the speed of SELECT queries, and the total size of the resulting database on disk. But my main concern is about the speed of INSERT queries vs normal tables. Any chance you compared that too? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
On Wed, Oct 19, 2011 at 7:56 AM, Fabian fabianpi...@gmail.com wrote: 2011/10/19 Alexey Pechnikov pechni...@mobigroup.ru FTS use index multi-tree and de-facto has _no_ insert speed degradation. Thanks, that's good to hear! It makes me wonder why SQLite doesn't use that same multi-tree mechanism for regular indexes, but that's a whole different question. To be clear, how it works is that new insertions are batched into a new index tree, with index trees periodically aggregated to keep selection efficient and to keep the size contained. So while the speed per insert should remain pretty stable constant, periodically an insert will require index maintenance, so that insert will be slower. If you have a lot of documents (or a small page cache) these maintenance events can get pretty expensive relative to the cost of a non-maintenance insert. So it's not a clear-cut win, but it probably would be interesting as an alternative sort of index for some tables. -scott ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
2011/10/19 Scott Hess sh...@google.com To be clear, how it works is that new insertions are batched into a new index tree, with index trees periodically aggregated to keep selection efficient and to keep the size contained. So while the speed per insert should remain pretty stable constant, periodically an insert will require index maintenance, so that insert will be slower. If you have a lot of documents (or a small page cache) these maintenance events can get pretty expensive relative to the cost of a non-maintenance insert. So it's not a clear-cut win, but it probably would be interesting as an alternative sort of index for some tables. I always do inserts in batches of 100.000 rows, and after each batch I manually merge the b-trees using: INSERT INTO table(table) VALUES('optimize'); Is there a possibility that it will do automatic maintenance half-way during a batch? Or will it always wait untill the transaction is finished? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS: Reduce tokens
Using the default tokenizer, everything that is not an alphanumeric character or an underscore, will generate a new token. I have a lot of columns that contains e-mail addresses or URL's, and most of them have characters like '.', '@' and '/'. Is there a simple way to make FTS see them as one single token, instead of splitting those strings into many small ones? I know it's possible to develop a custom tokenizer, but that's way over my head I'm afraid :) ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
On Wed, Oct 19, 2011 at 12:50 PM, Fabian fabianpi...@gmail.com wrote: 2011/10/19 Scott Hess sh...@google.com To be clear, how it works is that new insertions are batched into a new index tree, with index trees periodically aggregated to keep selection efficient and to keep the size contained. So while the speed per insert should remain pretty stable constant, periodically an insert will require index maintenance, so that insert will be slower. If you have a lot of documents (or a small page cache) these maintenance events can get pretty expensive relative to the cost of a non-maintenance insert. So it's not a clear-cut win, but it probably would be interesting as an alternative sort of index for some tables. I always do inserts in batches of 100.000 rows, and after each batch I manually merge the b-trees using: INSERT INTO table(table) VALUES('optimize'); Is there a possibility that it will do automatic maintenance half-way during a batch? Or will it always wait untill the transaction is finished? It does it when it does it, in fact you're probably getting some small merges during this process already. If you're doing your batch inserts within a surrounding transaction, and are inserting documents by ascending docid (or letting the system choose docid), it can buffer up many updates in memory before flushing them to disk, which is pretty efficient. Inserting 100,000 documents this way will probably not hit any very large merges, unless your documents tend to have a very large number of unique terms. -scott ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
2011/10/19 Fabian fabianpi...@gmail.com: I always do inserts in batches of 100.000 rows, and after each batch I manually merge the b-trees using: INSERT INTO table(table) VALUES('optimize'); Is there a possibility that it will do automatic maintenance half-way during a batch? Or will it always wait untill the transaction is finished? I think you are victim of the premature optimization :) See documentation: several different b-trees that are incrementally merged as rows are inserted, updated and deleted. This technique improves performance when writing to an FTS table, but causes some overhead for full-text queries that use the index. So you can work with a big FTS tables without using the optimize method. I use some FTS tables with tens of millions records and effect of the optimize isn't measurable. -- Best regards, Alexey Pechnikov. http://pechnikov.tel/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS management
Hello. I've always got great help in this list so I thank in advance who posts here and who will answer to my question. I've started to use FTS in a web site for search thru a table of sites. Given a main table containing, among others, fields id, url, nome (title) and descrizione (description), I keep the fts table updated in this way: CREATE TRIGGER upd_sito AFTER UPDATE ON siti BEGIN UPDATE fts3_siti SET nome=new.nome, url=new.url, descrizione=new.descrizione WHERE docid=new.id; END; CREATE TRIGGER del_sito AFTER DELETE ON siti BEGIN DELETE FROM fts3_siti WHERE docid=old.id; END; CREATE TRIGGER ins_sito AFTER INSERT ON siti BEGIN INSERT INTO fts3_siti(docid,nome,url,descrizione) VALUES(new.id,new.nome,new.url,new.descrizione); END; The FTS virtual table is defined as: CREATE VIRTUAL TABLE fts3_siti USING fts3 ( nome TEXT, url TEXT, descrizione TEXT); I use fts3 since my host doesn't yet offer PHP 5.3.8 which contains a SQLite version that includes FTS4. Before extending this feature to other sections of the site I need to know if this is an efficient way to keep the FTS table updated. Also, is there any pragma or so to tell FTS to automatically search for parts of words, instead of requiring users to add *? As said, thanks for any help and sorry for my bad english. -- Saluti da Gabriele Favrin http://www.favrin.net ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS: prefix wildcards?
Hey guys, as far as I understand the documentation FTS3/4 does not support prefix wildcards when searching (e.g. *board = skateboard, longboard, snowboard). Is there any way to get this working by now? I read that the right tokenizer may help. Are there any open source ones out there? Regards, Sebastian ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS sqlite3_last_insert_rowid
Please try the latest code checkin ( http://www.sqlite.org/src/info/e569f18b98) and let me know if it works any better for you. Thanks. I've already adjusted the code to manually assign keys, but I'll try to get back to checking it. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS sqlite3_last_insert_rowid
I gather sqlite3_last_insert_rowid doesn't play well with FTS? I don't see an exception noted in the docs but neither are there non-manually managed examples. I'd prefer not to manually mange them but ... ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS sqlite3_last_insert_rowid
Hi, FTS and sqlite3_last_insert_rowid do not work together. This is a known shortcoming. Basically this also means that you can't use any triggers involving FTS. Regards, Hartwig Am 13.05.2011 um 17:38 schrieb Steven Parkes: I gather sqlite3_last_insert_rowid doesn't play well with FTS? I don't see an exception noted in the docs but neither are there non-manually managed examples. I'd prefer not to manually mange them but ... ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS sqlite3_last_insert_rowid
On Fri, May 13, 2011 at 11:38 AM, Steven Parkes smpar...@smparkes.netwrote: I gather sqlite3_last_insert_rowid doesn't play well with FTS? I don't see an exception noted in the docs but neither are there non-manually managed examples. Please try the latest code checkin ( http://www.sqlite.org/src/info/e569f18b98) and let me know if it works any better for you. I'd prefer not to manually mange them but ... ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] fts virtual table questions
I didn't get an answer to my earlier question on fts. I guess it's difficult area. Should an fts virtual table always be re-created from scratch when the database is opened by the application before you start using fts commands? I see that the fts tables are not deleted when the database is closed. I have not found enough info on sqlite virtual tables generally. Do you find it is possible to incrementally add and delete rows in the fts virtual table (insert rows) as the application runs? As new data is added I would like to update the fts. If I delete rows in a table that was originally the source of data for fts virtual table creation, the fts virtual table retains all the expired data. In this case do you drop the fts table and recreate it, or try to delete rows in the fts table? I would gladly get all this info from docs and not bother you, but the docs on sqlite fts don't have much practical everyday usage information. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] fts virtual table questions
On Thu, May 12, 2011 at 4:07 PM, Paul Shaffer sqli...@cyberplasm.comwrote: I didn't get an answer to my earlier question on fts. I guess it's difficult area. Should an fts virtual table always be re-created from scratch when the database is opened by the application before you start using fts commands? No. FTS tables persist just like any other table. I see that the fts tables are not deleted when the database is closed. I have not found enough info on sqlite virtual tables generally. Do you find it is possible to incrementally add and delete rows in the fts virtual table (insert rows) as the application runs? Yes. That works fine for most users. As new data is added I would like to update the fts. If I delete rows in a table that was originally the source of data for fts virtual table creation, the fts virtual table retains all the expired data. In this case do you drop the fts table and recreate it, or try to delete rows in the fts table? Just delete the rows in the FTS table. I would gladly get all this info from docs and not bother you, but the docs on sqlite fts don't have much practical everyday usage information. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS snippet()
Drake, if I do this, I get: SQL logic error or missing database. Thanks Gert ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS snippet()
Hi all, I'm sure I'm doing something stupid here... CREATE VIRTUAL TABLE example USING fts4(TOKEN, CONTEXT); INSERT INTO example(TOKEN, CONTEXT) VALUES('one', 'This is just one sentence.'); INSERT INTO example(TOKEN, CONTEXT) VALUES('two', 'This is just one sentence. Sorry, it are two sentences.'); INSERT INTO example(TOKEN, CONTEXT) VALUES('three', 'More then three words in one sentence.'); SELECT snippet(example, '[', ']') FROM example WHERE CONTEXT MATCH (SELECT TOKEN FROM example); this returns This is just [one] sentence. This is just [one] sentence. Sorry, it are two sentences. More then three words in [one] sentence. while I was hoping for This is just [one] sentence. This is just one sentence. Sorry, it are [two] sentences. More then [three] words in one sentence. Can anyone tell me what I'm doing wrong? thanks gert ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS snippet()
Quoth Gert Van Assche ger...@gmail.com, on 2011-04-13 22:35:49 +0200: SELECT snippet(example, '[', ']') FROM example WHERE CONTEXT MATCH (SELECT TOKEN FROM example); You're asking to match a single independently arbitrarily chosen token from anywhere in the table (which is not even the same as matching at least one token from the table), not whether it matches the one from the same row. Can you do WHERE CONTEXT MATCH TOKEN instead? I think you still need a full table scan for that, but it should return the right results unless FTS4 has some relevant restriction on the RHS of a MATCH. --- Drake Wilson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users