Re: [sqlite] FTS - IF NOT EXISTS missing from sqlite_master

2018-08-21 Thread Richard Hipp
On 8/19/18, Daniel Santiago  wrote:
> Hi,
>
> It seems that when creating a virtual table the 'IF NOT EXISTS' part is
> missing from the sql column in sqlite_master for the recently created table
> entry.
>
> For example:
> sqlite> CREATE VIRTUAL TABLE IF NOT EXISTS `Mail` USING FTS4(`subject`,
> `body`);
> sqlite> SELECT `sql` FROM `sqlite_master` WHERE `name` = 'Mail';

Using a grave accent to quote a column or table name is a mysql-ism.
It only works in MySQL (and also in SQLite because SQLite tries to be
compatible with everybody). The SQL standard notation is double-quote:

SELECT "sql" FROM "sqlite_master" WHERE "name" = 'Mail';


> CREATE VIRTUAL TABLE `Mail` USING FTS4(`subject`, `body`)
>
> Is this intended behaviour?

Yes, it works as intended.  The IF NOT EXISTS has already been
processed and found to be true before the text is inserted into the
sqlite_master table, so why keep it around?
-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS - IF NOT EXISTS missing from sqlite_master

2018-08-21 Thread Daniel Santiago
Hi,

It seems that when creating a virtual table the 'IF NOT EXISTS' part is
missing from the sql column in sqlite_master for the recently created table
entry.

For example:
sqlite> CREATE VIRTUAL TABLE IF NOT EXISTS `Mail` USING FTS4(`subject`,
`body`);
sqlite> SELECT `sql` FROM `sqlite_master` WHERE `name` = 'Mail';
CREATE VIRTUAL TABLE `Mail` USING FTS4(`subject`, `body`)

Is this intended behaviour?

Thanks,
-Dany
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS offsets() reporting SQLITE_CORRUPT with empty string, redux

2016-05-03 Thread Michael Gratton

Hi all,

I'm running into a bug where the FTS offsets() function is returning a 
null string, causing SQLITE_CORRUPT to be reported for SELECT query 
results in both SQLite 3.11 and 3.12.2.

This is similar to a previously reported issue with SQLite from a while 
back -  
,
 
however the minimal example from that report does not trigger any error 
in 3.11/3.12.2.

Unfortunately, I haven't been able to produce a minimal example for 
this new instance of the bug. Taking problematic rows from the existing 
DB, inserting them into a new FTS4 table, then running the same query 
is not triggering it. I can however reproduce it with a freshly 
constructed copy of the application's database, so it may not be an 
actual corrupt database. I cannot post the database file to a public 
forum however, since it contains confidential information.

A minimal query on the database in question that produces the error is 
simply of the form:

 > SELECT offsets(FTS4Table) FROM FTS4Table WHERE FTS4Table MATCH 
'somestr';

I can say that it is the offsets() function that is causing it - 
removing/replacing it in the SELECT statement makes the error go away. 
The form of the MATCH string doesn't seem to matter, but only some 
strings will cause the error to appear. Also, a custom tokeniser is in 
use - unicodesn from here: 


For context, the application's bug report is here: 


What can I do to help resolve this bug?

//Mike

-- 
? Michael Gratton, Percept Wrangler.
? 




[sqlite] FTS tokenize=unicode61: "full" or "simple" case folding?

2016-03-21 Thread Tomash Brechko
Hello,

https://www.sqlite.org/fts3.html#tokenizer page says that unicode61
tokenizer implements _full_ case folding (it doesn't emphasize the word,
but it's there).  ftp://unicode.org/Public/6.1.0/ucd/CaseFolding.txt has
the following rules:

-- cut --
...
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
...
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
...
-- cut --

I.e. in _full_ case folding both "?" (U+1E9E) and "?" (U+00DF) are mapped
to "ss", whereas in _simple_ case folding first one is mapped to the
second.  SQLite 3.11.0 works according to simple rules:

-- cut --
CREATE VIRTUAL TABLE t USING fts3tokenize(unicode61);
SELECT token FROM t WHERE input = "? ?";
-- cut --
gives
-- cut--
?
?
-- cut--

So which one is correct, documentation or implementation?  I also wonder
what a native German speaker would expect in full-text search case?
(Google gives different result counts for "Schlo?" and "Schloss", which
actually surprises me a bit).

-- 
  Tomash Brechko


[sqlite] FTS tokenize=unicode61: "full" or "simple" case folding?

2016-03-21 Thread Matthias-Christian Ott
On 2016-03-21 20:43, Tomash Brechko wrote:
> Hello,
> 
> https://www.sqlite.org/fts3.html#tokenizer page says that unicode61
> tokenizer implements _full_ case folding (it doesn't emphasize the word,
> but it's there).  ftp://unicode.org/Public/6.1.0/ucd/CaseFolding.txt has
> the following rules:
> 
> -- cut --
> ...
> 00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
> ...
> 1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
> 1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
> ...
> -- cut --
> 
> I.e. in _full_ case folding both "?" (U+1E9E) and "?" (U+00DF) are mapped
> to "ss", whereas in _simple_ case folding first one is mapped to the
> second.  SQLite 3.11.0 works according to simple rules:
> 
> -- cut --
> CREATE VIRTUAL TABLE t USING fts3tokenize(unicode61);
> SELECT token FROM t WHERE input = "? ?";
> -- cut --
> gives
> -- cut--
> ?
> ?
> -- cut--
> 
> So which one is correct, documentation or implementation?  I also wonder
> what a native German speaker would expect in full-text search case?
> (Google gives different result counts for "Schlo?" and "Schloss", which
> actually surprises me a bit).

The character "?" was often not present in fonts, is not included in
ISO/IEC 8859-1:1998 and is not historically and commonly used in German
(the German Wikipedia and the articles' references can explain this
better than I can). It was just "recently" added Unicode 5.1 in 2008. It
is common to either capitalize ? as SS or SZ (to avoid ambiguities) in
all-caps titles. I think it's uncertain whether ? will be widely used.

If I understand Unicode case folding correctly, it exists to be able to
compare Unicode strings case-insensitively by converting them into a
canonical form. So simple case seems correct, as ? would be folded to ?.
However, if you keep in mind the old orthography (before 1996) and want
to know what makes sense for a search engine, full case folding makes
more sense. As you noted "Schlo?" and "Schloss" should return the same
results for non-verbatim searches as such distinction would seemingly
only be relevant to linguists or historians but not for every day use
and business information systems.

The Unicode standard is unfortunately vague about what it wants to
achieve by case folding and what thoughts went into the case folding
table. Perhaps you should ask on the Unicode mailing list.

You didn't describe your use-case but I would also generally advice to
use a phonetic algorithm for German to canonicalize words for
non-verbatim searches instead of case folding. It gives better results
and most German speakers I know appreciate the phonetic corrections of
popular Internet search engines for non-verbatim searches.

I hope this helps. Maybe it also helps to consult a linguist for
building a non-simplistic search engine for German. For example, you
have to perform compound splitting, stemming and some form of
grammatical analysis at some point.

- Matthias-Christian


[sqlite] FTS tokenize=unicode61: "full" or "simple" case folding?

2016-03-21 Thread Richard Hipp
On 3/21/16, Tomash Brechko  wrote:
> Hello,
>
> https://www.sqlite.org/fts3.html#tokenizer page says that unicode61
> tokenizer implements _full_ case folding (it doesn't emphasize the word,
> but it's there).

That is a documentation error.  It has now been fixed.  Thanks.

Probably the error originates with our thinking of "full" in the sense
of "more than just ASCII", rather than the official unicode definition
of "full" which is "it can possibly change the number of code points".

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] FTS design document

2016-02-22 Thread Dan Kennedy
On 02/21/2016 03:54 PM, Charles Leifer wrote:
> Is anyone aware of a design doc for any of the FTS implementations? Looking
> for something a bit more technical than the docs. If not, where in the
> source would you recommend starting? Thanks!

There is no such document unfortunately.

The FTS5 code is cleaner than FTS3/4. I guess to decode it, start by 
understanding SQLite's virtual table interface, then look at fts5Int.h 
to figure out how the code fits together, then the on-disk format 
documented near the top of fts5_index.c.

Dan.



> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] FTS design document

2016-02-22 Thread Charles Leifer
Perfect, thank you!
On Feb 22, 2016 1:55 AM, "Dan Kennedy"  wrote:

> On 02/21/2016 03:54 PM, Charles Leifer wrote:
>
>> Is anyone aware of a design doc for any of the FTS implementations?
>> Looking
>> for something a bit more technical than the docs. If not, where in the
>> source would you recommend starting? Thanks!
>>
>
> There is no such document unfortunately.
>
> The FTS5 code is cleaner than FTS3/4. I guess to decode it, start by
> understanding SQLite's virtual table interface, then look at fts5Int.h to
> figure out how the code fits together, then the on-disk format documented
> near the top of fts5_index.c.
>
> Dan.
>
>
>
> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] FTS design document

2016-02-21 Thread Charles Leifer
Is anyone aware of a design doc for any of the FTS implementations? Looking
for something a bit more technical than the docs. If not, where in the
source would you recommend starting? Thanks!


[sqlite] FTS: Escaping MATCH expressions

2015-09-09 Thread Dan Kennedy
On 09/09/2015 07:56 PM, Lohmann, Niels, Dr. (CQTN) wrote:
> Hi there,
>   
> I have a question regarding the expressions that are valid after MATCH: Is 
> there a way to escape a string str such that I can safely bind it to variable 
> @var in a statement like "SELECT * FROM myFtsTable WHERE myFtsTable MATCH 
> @var;"?
>   
> In particular, I encountered error messages with strings like "TEST.*" or 
> "TEST'*".

I don't think there is a foolproof way to do that with FTS4. Enclosing 
the text in double-quotes might help, but then there is no way to escape 
embedded double quotes.

In FTS5 you can enclose tokens in double quotes and escape embeded quote 
characters in the usual SQL way (by doubling them). i.e.

   ... MATCH '"TEST.*"'

Or for {TEST"*}:

   ... MATCH '"TEST""*"'

Dan.




>   
> All the best
> Niels
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] FTS: Escaping MATCH expressions

2015-09-09 Thread Martin Kucej
FTS3/4 replaces non-alphanumeric characters with spaces. I do the same
for strings that I match in my applications. Something like:

preg_replace('/[^\*\da-z\x{0080}-\x{}]/ui', ' ', $string);

On Wed, Sep 9, 2015 at 7:56 AM, Lohmann, Niels, Dr. (CQTN)
 wrote:
> Hi there,
>
> I have a question regarding the expressions that are valid after MATCH: Is 
> there a way to escape a string str such that I can safely bind it to variable 
> @var in a statement like "SELECT * FROM myFtsTable WHERE myFtsTable MATCH 
> @var;"?
>
> In particular, I encountered error messages with strings like "TEST.*" or 
> "TEST'*".
>
> All the best
> Niels
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS: Escaping MATCH expressions

2015-09-09 Thread Lohmann, Niels, Dr. (CQTN)
Hi there,
?
I have a question regarding the expressions that are valid after MATCH: Is 
there a way to escape a string str such that I can safely bind it to variable 
@var in a statement like "SELECT * FROM myFtsTable WHERE myFtsTable MATCH 
@var;"?
?
In particular, I encountered error messages with strings like "TEST.*" or 
"TEST'*".
?
All the best
Niels


[sqlite] FTS and min token length

2015-04-30 Thread Pol-Online
Hi,

It seems that the built-in tokenizers (or at least the unicode61 one) has no 
lower-limit regarding the number of characters in a token. For instance looking 
for records containing `t` will return the ones with sentences containing 
?don?t?.

Does this mean FTS is indexing all the ?I? and ?a? in English sentences as well 
as all single digit occurrences, or is there some higher level exclusion 
heuristic?

Is there any way to configure the tokenize to ignore token less than 2 
characters?  


-Pol


Pol-Online
info at pol-online.net (mailto:info at pol-online.net)



Re: [sqlite] small sqlite fts snippet function or Fts Bug!

2015-01-27 Thread boscowitch
Yeah, -ID 4 was just a desperate experiment for a hack with longer data
in the search to see if it would lead the snippet function to start
grabbing the data from the start (or at least one "word"/char more).

The offsets beeing wrong and therefore the ... was kinda expected of
me, but in case it worked I would have manually substracted the offset
and put the markers in myself... so it wasn't part of the bug report
just a test how it behaves in that case.

Good to know for the future that its already fixed, thx for taking care
of it so fast!

boscowitch

Am Mittwoch, den 28.01.2015, 02:04 +0700 schrieb Dan Kennedy:
> On 01/27/2015 06:48 PM, boscowitch wrote:
> >
> >
> > and the in an sqlite shell (SQLite version 3.8.8.1 2015-01-20 16:51:25)
> > I get following for a select with snippet:
> >
> > EXAMPLE OUTPUT:
> > sqlite> select docid,*,snippet(test) from test where german match "a";
> > 1|[1] a b c|1] a b c
> > 2|[{[_.,:;[1] a b c|1] a b c
> > 3|1[1] a b c|1[1] a b c
> > 4|[1] a b c|1] a b c
> > 5|​[1] a b c|​[1] a b c
> >
> >
> >
> > -As you can see for id 1 and 2  is at the right position
> > but all beginning non-alphanumerical [,{, etc. are just left out in the
> > snippet.
> >
> >
> > -ID 4 does not help and breaks the offsets so even worse
> >
> 
> Thanks for reporting this. The issue with (1) and (2) is now fixed here:
> 
>http://www.sqlite.org/src/info/adc9283dd9b
> 
> I think it is a bug in the input data causing the problem in (4). The 
> values inserted into "test" and "testdata" are just slightly different.
> 
> Dan.
> 
> 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] small sqlite fts snippet function or Fts Bug!

2015-01-27 Thread Dan Kennedy

On 01/27/2015 06:48 PM, boscowitch wrote:



and the in an sqlite shell (SQLite version 3.8.8.1 2015-01-20 16:51:25)
I get following for a select with snippet:

EXAMPLE OUTPUT:
sqlite> select docid,*,snippet(test) from test where german match "a";
1|[1] a b c|1] a b c
2|[{[_.,:;[1] a b c|1] a b c
3|1[1] a b c|1[1] a b c
4|[1] a b c|1] a b c
5|​[1] a b c|​[1] a b c



-As you can see for id 1 and 2  is at the right position
but all beginning non-alphanumerical [,{, etc. are just left out in the
snippet.


-ID 4 does not help and breaks the offsets so even worse



Thanks for reporting this. The issue with (1) and (2) is now fixed here:

  http://www.sqlite.org/src/info/adc9283dd9b

I think it is a bug in the input data causing the problem in (4). The 
values inserted into "test" and "testdata" are just slightly different.


Dan.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] small sqlite fts snippet function or Fts Bug!

2015-01-27 Thread boscowitch
Hello since it this bug report (+ a dirty-fix) it might be useful for
both users and devs.
that's why I send a copy to both mailing lists! 
I hope I don't bother the diligent devs who read all of both list, sry
to them, and thx for sqlite btw. ;)!

recently I wanted to use the snippet function in sqlite for my small
sqlite dictionary (running on android but the bug occurs also on my
linux desktop).

but it behaved strangely when my entry started with "non-words"
character(s) (not alphanumeric and all Unicode (or chars>128) in short
simple tokenizer delimiters)

the snippet never prints them if they are in the beginning  of the first
word
here an examples to demonstrate:

EXAMPLE SETUP SQL:
create table testdata (german);
create virtual table test using fts4(content="testdata",german);

insert into testdata(german) VALUES ("[1] a b c");
insert into test(docid,german) VALUES(1,"[1] a b c ");

insert into testdata(german) VALUES ("[{[_.,:;[1] a b c");
insert into test(docid,german) VALUES(2,"[{[_.,:;[1] a b c "); 

insert into testdata(german) VALUES ("1[1] a b c");
insert into test(docid,german) VALUES(3,"1[1] a b c "); 

insert into testdata(german) VALUES ("[1] a b c");
insert into test(docid,german) VALUES(4,"1[1] a b c "); 

insert into testdata(german) 
VALUES(char(8203,91,49,93,32,97,32,98,32,99));
insert into test(docid,german)
VALUES(5,char(8203,91,49,93,32,97,32,98,32,99));


and the in an sqlite shell (SQLite version 3.8.8.1 2015-01-20 16:51:25)
I get following for a select with snippet:

EXAMPLE OUTPUT:
sqlite> select docid,*,snippet(test) from test where german match "a";
1|[1] a b c|1] a b c
2|[{[_.,:;[1] a b c|1] a b c
3|1[1] a b c|1[1] a b c
4|[1] a b c|1] a b c
5|​[1] a b c|​[1] a b c



-As you can see for id 1 and 2  is at the right position
but all beginning non-alphanumerical [,{, etc. are just left out in the
snippet.

-ID 3 works but has an additional 1 that should not be there so no
solution...

-ID 4 does not help and breaks the offsets so even worse

-ID 5 works BUT this is a dirty fix i found.
it adds an Unicode character ('ZERO WIDTH SPACE' (U+200B)) in front
which obviously cant be seen and doesnt "break" the offsets (just shifts
them all +1)
I didn't test it yet on android but I hope so, since it supports
Unicode ... 
obviously this is not a nice solution or one for more simpler/embedded
systems.


(btw. the same bug occurs also with fts3 and also with no special
content option)
here a small example for normal fts4 with a more custom snippet call:

create virtual table test using fts4(german);
insert into test VALUES("[1] a b c");

sqlite> select *,snippet(test,"#","#","...",0,64) from test where german
match 'a';
[1] a b c|1] #a# b c



regards boscowitch

PS: please excuse the "german" ;) and all English spelling errors 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS pagination

2014-10-27 Thread supermariobros
Just to be clear. It basically means that after MATCH  records are returned
it iterates through ALL the rowids of the returned set and removes them from
the set and orders them accordingly.



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/FTS-pagination-tp78754p78849.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS pagination

2014-10-26 Thread Clemens Ladisch
supermariobros wrote:
> Well, they all give exactly the same output.
>
> sqlite> EXPLAIN QUERY PLAN SELECT rowid  FROM activity_text_content WHERE 
> activity_text_content MATCH 'x' ORDER BY rowid ASC LIMIT 100;
> 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4:ASC (~0 rows)
>
> sqlite> EXPLAIN QUERY PLAN SELECT rowid  FROM activity_text_content WHERE 
> activity_text_content MATCH 'x' AND rowid>1000 ORDER BY rowid ASC LIMIT 10;
> 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4:ASC (~0 rows)
>
> sqlite> EXPLAIN QUERY PLAN SELECT rowid  FROM activity_text_content WHERE 
> activity_text_content MATCH 'x' LIMIT 100;
> 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4: (~0 rows)
>
> It almost looks like EXPLAIN ignores the second part where rowid is compared
> or sorted.

It doesn't change the way in which the database accesses the table, which
implies that the rowid comparison is not using any index, i.e., the FTS
module first computes the results, and then the rows with small rowid
values are thrown away.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS pagination

2014-10-25 Thread supermariobros
Well, they all give exactly the same output.

sqlite> EXPLAIN QUERY PLAN SELECT rowid  FROM activity_text_content WHERE
activity_text_content MATCH 'x' ORDER BY rowid ASC LIMIT 100; 
0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4:ASC (~0 rows)

sqlite> EXPLAIN QUERY PLAN SELECT rowid  FROM activity_text_content WHERE
activity_text_content MATCH 'x' AND rowid>1000 ORDER BY rowid ASC LIMIT 10; 
0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4:ASC (~0 rows)

sqlite> EXPLAIN QUERY PLAN SELECT rowid  FROM activity_text_content WHERE
activity_text_content MATCH 'x' LIMIT 100; 0|0|0|SCAN TABLE
activity_text_content VIRTUAL TABLE INDEX 4: (~0 rows)


It almost looks like EXPLAIN ignores the second part where rowid is compared
or sorted.
how should I understand that?




--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/FTS-pagination-tp78754p78831.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS pagination

2014-10-24 Thread Clemens Ladisch
supermariobros wrote:
> EXPLAIN QUERY PLAN SELECT rowid  FROM activity_text_content WHERE
> activity_text_content MATCH 'x' AND rowid>1000 ORDER BY rowid ASC LIMIT 10;
> 0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4: (~0 rows)
>
> If I understand it correctly it uses indexes properly on FTS but I do not
> know how the row id scanning and sorting is done.

Compare the EXPLAIN QUERY PLAN output of this query and of the same
query without the rowid comparison.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS pagination

2014-10-24 Thread supermariobros
Hi 
Thanks For quick response

Of course you are right that I can not use row id in the way I used it
above. I guess I wrote it quicker than I thought about it.
However If I use original rowid and LIMIT it should be fine, knowing that
the submited rowid is the rowid of the last element of the previous set.
Like this:
SELECT rowid  FROM text_content WHERE  text_content MATCH 'x' AND rowid
>1000 ORDER BY rowid LIMIT 10;
of course it only works when last rowid of the previous set is available so
I can not just skip to the N-th page, wchich is fine in my case.  I just
have to know first and last element to go back and forward;

This is what explain query shows:

EXPLAIN QUERY PLAN SELECT rowid  FROM activity_text_content WHERE
activity_text_content MATCH 'x' AND rowid>1000 ORDER BY rowid ASC LIMIT 10;
0|0|0|SCAN TABLE activity_text_content VIRTUAL TABLE INDEX 4: (~0 rows)  

If I understand it correctly it uses indexes properly on FTS but I do not
know how the row id scanning and sorting is done.
If you have some time could you shed some light on this.

Also when I compare time of the querry of the one with rowid > to the one
with OFFSET query on 5000 records I do not see any difference in execution
time. I read just like you wrote that OFFSET starts to be time consuming
when number is very high but what is the high number case. I doubt I will
ever deal with more than 1 records in this entire virtual table so
should I even be concerned?

Thanks






--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/FTS-pagination-tp78754p78772.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS pagination

2014-10-23 Thread Clemens Ladisch
supermariobros wrote:
> If I am using FTQ that looks like this
> SELECT * FROM mail WHERE body MATCH 'sqlite'
> can I add to it "WHERE rowid > 5 AND rwoid <10"
> or it will significantly slow it down.

How much did it slow down when you tested it?

Anyway,
without index:
  sqlite> EXPLAIN QUERY PLAN SELECT * FROM t;
  0|0|0|SCAN TABLE t VIRTUAL TABLE INDEX 0:
with word search:
  sqlite> EXPLAIN QUERY PLAN SELECT * FROM t WHERE t MATCH 'x';
  0|0|0|SCAN TABLE t VIRTUAL TABLE INDEX 3:
your query:
  sqlite> EXPLAIN QUERY PLAN SELECT * FROM t WHERE t MATCH 'x' AND rowid 
BETWEEN 5 AND 10;
  0|0|0|SCAN TABLE t VIRTUAL TABLE INDEX 393219:

So it uses some index (and the lowest two bits are still set, so
it still uses the FTS index, but this is an implementation detail.)

This might be different with a different SQLite version.


> If so what would be the best approach for pagination

The rowid cannot be used for pagination because you get the
numbers of the original rows.

You would have to use OFFSET/LIMIT, which is inefficient for
large offsets.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS pagination

2014-10-23 Thread supermariobros
Quick question. If I am using FTQ that looks like this "SELECT * FROM mail
WHERE body MATCH 'sqlite' " can I add to it "WHERE rowid > 5 AND rwoid <10"
or it will significantly slow it down.  If so what would be the best
approach for pagination, For example if I get 500 rows with the matching
term and obviously I do not need them all right away. Should I just use the
first option and maybe select only rowids and then go through them? 

Any suggestions?


Thanks



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/FTS-pagination-tp78754.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS pagination

2014-10-23 Thread supermariobros
Or maybe, if I am using android, it should be done at the cursor level?



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/FTS-pagination-tp78754p78755.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS full-text query vs. query by rowid

2014-08-28 Thread Milan Kříž


Dne 28.8.2014 16:48, Clemens Ladisch napsal(a):

Then try with 3.8.6.


Ouuu . .sorry again.
I have tested it with 3.8.6 and the query plan looks ok now.
SCAN TABLE nameftsTable VIRTUAL TABLE INDEX 18:
EXECUTE LIST SUBQUERY 1

But I also tested it with my version again and I'm getting the same (correct) 
query plan now.
I cannot understand how it is possible :-).
The only thing I changed is that in the meantime I've recreated my ftsTable.
So it was either in some strange state I cannot reproduce now or I am simply 
overworked :-).
Or maybe I accidentally used '=' or 'is' instead of 'match'.

However, many thanks for your support, now it all seems much more clear.

Milan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS full-text query vs. query by rowid

2014-08-28 Thread Clemens Ladisch
Milan Kříž wrote:
> Dne 28.8.2014 14:54, Clemens Ladisch napsal(a):
>> Milan Kříž wrote:
>>> So does it mean that the full-text search is not performed for the 
>>> following query at all?
>>
>> No, it means that you are using a different version.
>
> But I still cannot understand that query plan.

Then try with 3.8.6.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS full-text query vs. query by rowid

2014-08-28 Thread Milan Kříž

Dne 28.8.2014 14:54, Clemens Ladisch napsal(a):

Milan Kříž wrote:

So does it mean that the full-text search is not performed for the following 
query at all?

SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 22, 
23)

 SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1:
 EXECUTE LIST SUBQUERY 1

No, it means that you are using a different version.

(In any case, it is not possible to execute MATCH without the FTS index.)


Thanks and sorry for bothering you with such details. But I still cannot 
understand that query plan.
Since for a simple rowid query it says:
explain query plan
select * from ftsTable where docId = 100
> SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1:
It is clear to me that index 1 is definitely a rowId index.

Then for complex query above, it says:

SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1:
EXECUTE LIST SUBQUERY 1

So where is a mention about using an FTS index (full-text query)?
Or is it just an imperfection of 'explain query plan' that the usage of the full-text index is missing from the 
query plan?


I use SQLite version 3.8.5 and for full table full-text search it gives a correct index according to comment in 
the fts3Int.h:


FTS3_FULLTEXT_SEARCH+${NUMBER_OF_COLUMNS} => VIRTUAL TABLE INDEX 18 in my 
ftsTable

Regards,
Milan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS full-text query vs. query by rowid

2014-08-28 Thread Clemens Ladisch
Milan Kříž wrote:
> So does it mean that the full-text search is not performed for the following 
> query at all?
>
> SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 22, 
> 23)
>
> SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1:
> EXECUTE LIST SUBQUERY 1

No, it means that you are using a different version.

(In any case, it is not possible to execute MATCH without the FTS index.)

> what does the 'EXECUTE LIST SUBQUERY 1' mean?

It's how the "rowid IN (...)" is implemented.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS full-text query vs. query by rowid

2014-08-28 Thread Milan Kříž

Clemens Ladisch wrote:

Milan Kříž wrote:

Clemens Ladisch wrote:

"INDEX 1" is the full-text search.

Sorry, that's wrong.


So does it mean that the full-text search is not performed for the following 
query at all?
And that only the docId index is used to get entries in the IN sub-clause
and then a linear scan with a comparison to 'a*' is done?

query:
SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 22, 
23)
query plan:
SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1:
EXECUTE LIST SUBQUERY 1

Is it possible to force SQLite to use the full-text search instead of the rowid 
search?
I can think only about something like this:
select docId from (
select docId from ftsTable where ftsTable match 'a*'
) where docId in (21, 22, 23, 24)
query plan:
SCAN TABLE nameFtsTable VIRTUAL TABLE INDEX 18:
EXECUTE LIST SUBQUERY 1
It looks much better.  But what does the 'EXECUTE LIST SUBQUERY 1' mean? You wrote that returned values are 
compared by SQLite (outside of FTS).
But does it use some index (rowid index) or is it impossible for SQLite to use an index on the same table (even 
if the first one - full-text index - was used in a subquery)?


Of course, in my real use-case I have much more complex docId condition and I have a lot of entries in an FTS 
table (about million entries),
so I would like the full-text search to prune the results first and then filter results using docId. FTS 
condition is also much more complex than just 'a*' so I expect that

many results will be filtered out by the full-text query.

Thanks,
Milan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS full-text query vs. query by rowid

2014-08-28 Thread Clemens Ladisch
Milan Kříž wrote:
> Clemens Ladisch wrote:
>> "INDEX 1" is the full-text search.

Sorry, that's wrong.

The idxNum value is determined as follows: (see fts3Int.h)

/*
** The Fts3Cursor.eSearch member is always set to one of the following.
** Actualy, Fts3Cursor.eSearch can be greater than or equal to
** FTS3_FULLTEXT_SEARCH.  If so, then Fts3Cursor.eSearch - 2 is the index
** of the column to be searched.  For example, in
**
** CREATE VIRTUAL TABLE ex1 USING fts3(a,b,c,d);
** SELECT docid FROM ex1 WHERE b MATCH 'one two three';
**
** Because the LHS of the MATCH operator is 2nd column "b",
** Fts3Cursor.eSearch will be set to FTS3_FULLTEXT_SEARCH+1.  (+0 for a,
** +1 for b, +2 for c, +3 for d.)  If the LHS of MATCH were "ex1"
** indicating that all columns should be searched,
** then eSearch would be set to FTS3_FULLTEXT_SEARCH+4.
*/
#define FTS3_FULLSCAN_SEARCH 0/* Linear scan of %_content table */
#define FTS3_DOCID_SEARCH1/* Lookup by rowid on %_content table */
#define FTS3_FULLTEXT_SEARCH 2/* Full-text index search */

/*
** The lower 16-bits of the sqlite3_index_info.idxNum value set by
** the xBestIndex() method contains the Fts3Cursor.eSearch value described
** above. The upper 16-bits contain a combination of the following
** bits, used to describe extra constraints on full-text searches.
*/
#define FTS3_HAVE_LANGID0x0001  /* languageid=? */
#define FTS3_HAVE_DOCID_GE  0x0002  /* docid>=? */
#define FTS3_HAVE_DOCID_LE  0x0004  /* docid<=? */


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS full-text query vs. query by rowid

2014-08-28 Thread Milan Kříž

Clemens Ladisch wrote:

4) The I have a query with both 'match ?' sub-clause and 'rowid=?'
sub-clause. It is not clear to me which variant will be used.
But according to definition of Full-text query it should use full-text
query at first. And then? Will it use index to rowid after full-text
query is performed?
 SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 
22, 23)
- anyway from the query plan it seems that no full-text query is
   performed at all - or how to interpret it?:
 SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1:
 EXECUTE LIST SUBQUERY 1

"INDEX 1" is the full-text search.  The rowid values of the returned
rows are then compared (by SQLite, outside of FTS) against the list.


Ok, it would be what I expect. But according to my first two queries 1) and 2), it looks like a full-text index 
is the *index 18*.



1) A*full-text query*
SELECT docId FROM ftsTableWHERE ftsTable MATCH 'a*'
- gets a following query plan:
  SCAN TABLE ftsTable VIRTUAL TABLE*INDEX**18*:

2) A*query by rowid*
  SELECT docId FROM ftsTable WHERE docid = 10
- gets a following query plan:
  SCAN TABLE ftsTable VIRTUAL TABLE*INDEX 1*:


So maybe the query plan shows a wrong number? Or is the index number unrelated 
to a full-text index?
I thought that INDEX 1 is an index to rowid, but maybe index numbers are 
somehow fuzzy? :-).

Thanks,
Milan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS full-text query vs. query by rowid

2014-08-28 Thread Clemens Ladisch
Milan Kříž wrote:
> 3) A query which should use a linear scan according to the SQLite
>documentation (http://www.sqlite.org/fts3.html#section_1_4)
> SELECT docId FROM ftsTable WHERE docId BETWEEN 20 AND 23
> - gets a following query plan:
> SCAN TABLE ftsTable VIRTUAL TABLE INDEX 393216:
> - the documentation does not say a word about another indexes on an
>   FTS table, so where is the index 393216 come from?

This is an undocumented optimization.  In recent versions, FTS also
optimizes docid searches with less/greater than operators.

(The index number is an implementation detail.)

> 4) The I have a query with both 'match ?' sub-clause and 'rowid=?'
>sub-clause. It is not clear to me which variant will be used.
> But according to definition of Full-text query it should use full-text
> query at first. And then? Will it use index to rowid after full-text
> query is performed?
> SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 
> 22, 23)
> - anyway from the query plan it seems that no full-text query is
>   performed at all - or how to interpret it?:
> SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1:
> EXECUTE LIST SUBQUERY 1

"INDEX 1" is the full-text search.  The rowid values of the returned
rows are then compared (by SQLite, outside of FTS) against the list.

> Could you please give me a clue how to guess whether a complex FTS
> query will use a full-text index and which one it will use?

There is only one full-text index per table.

The FTS module implements a search/lookup iff the EXPLAIN QUERY PLAN
output shows "VIRTUAL TABLE INDEX".


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS full-text query vs. query by rowid

2014-08-27 Thread Milan Kříž

Hello,
I would like to ask several questions regarding to SQLite FTS module and how it 
uses indexes.

I have following queries:

1) A full-text query
SELECT docId FROM ftsTableWHERE ftsTable MATCH 'a*'
- gets a following query plan:
SCAN TABLE ftsTable VIRTUAL TABLE INDEX 18:

2) A query by rowid
SELECT docId FROM ftsTable WHERE docid = 10
- gets a following query plan:
SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1:

So far this is as expected.

3) A query which should use a linear scan according to the SQLite documentation 
(http://www.sqlite.org/fts3.html#section_1_4)

SELECT docId FROM ftsTable WHERE docId BETWEEN 20 AND 23
- gets a following query plan:
SCAN TABLE ftsTable VIRTUAL TABLE INDEX 393216:
- the documentation does not say a word about another indexes on an FTS table, so where is the index 393216 come 
from?


4) The I have a query with both 'match ?' sub-clause and 'rowid=?' sub-clause. It is not clear to me which 
variant will be used.
But according to definition of Full-text query it should use full-text query at first. And then? Will it use 
index to rowid after full-text query is performed?

SELECT docId FROM ftsTable WHERE ftsTable MATCH 'a*' AND rowId IN (20,21, 
22, 23)
- anyway from the query plan it seems that no full-text query is performed at 
all - or how to interpret it?:
SCAN TABLE ftsTable VIRTUAL TABLE INDEX 1:
EXECUTE LIST SUBQUERY 1

5) When between is used instead of in, SQLite should use a linear scan, but 
query plan says something else:
select docId from ftsTable where ftsTable match 'a*' and docId between 20 
and 23
- query plan
SCAN TABLE ftsTable VIRTUAL TABLE INDEX 393234:
- even another index (393234) seems to be used but again, where does it come 
from?

Is there any undocumented optimizations in the FTS module or did I miss some 
note in the SQLite documentation?
Could you please give me a clue how to guess whether a complex FTS query will use a full-text index and which 
one it will use?

And could you explain what the different numbers of indexes means? Are all 
indexes really created in a DB?

Thanks for you explanation,
Milan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS module and DB close

2014-06-08 Thread gwenn
Ok,
Maybe the solution is:
1) try to close the connection: sqlite3_close
2) if error code is SQLITE_BUSY,
 a) use sqlite3_next_stmt to finalize dangling statements
 b) retry to close the connection
Step (1) ensures that FTS related statements are finalized.
Thanks.

On Sat, Jun 7, 2014 at 7:49 PM, gwenn  wrote:
> Hello,
> How do you prevent double free/finalize of statements created by the
> FTS module ?
> I am using sqlite3_next_stmt to finalize all dangling statements
> before closing the connection but the program crashes because the FTS
> module finalizes them too when sqlite3_close is called...
> May be I should use sqlite3_close_v2 (not available on MacOS X: SQLite
> version 3.7.13) ?
>
> Thanks
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS module and DB close

2014-06-07 Thread gwenn
Hello,
How do you prevent double free/finalize of statements created by the
FTS module ?
I am using sqlite3_next_stmt to finalize all dangling statements
before closing the connection but the program crashes because the FTS
module finalizes them too when sqlite3_close is called...
May be I should use sqlite3_close_v2 (not available on MacOS X: SQLite
version 3.7.13) ?

Thanks
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS performance (mobile devices)

2013-06-20 Thread Jan Slodicka
ry-friendly, but too slow. They can be used for small
changes only.
In case of bulk actions, the triggers should be dropped and the FTS index
rebuilt after the action completes.

2. Rebuilding FTS index takes a lot of memory, whereby the amount does not
depend on the table size.
   Question: Would it be possible to reduce this number? Apparently FTS
index can be built with less memory.

3. Don't use FTS index optimization. Small gains in terms of speed, risk of
high memory consumption.

4. There is some problem in SQLite FTS code as the described crash proves.
(Might relate to empty index only)

5. DB size: This is not described above, but it is important to realize,
that SQLite DB does not shrink.
   FTS index takes compareble space to the data being indexed and it will be
stored first in the WAL log, then in the DB itself.
   In the worst case the DB size may blow up 2-3x.
   Solution for the WAL log: PRAGMA journal_size_limit=10485760 (fast and
easy)
   Solution for the DB: VACUUM (slow, that's way difficult)

Any questions/comments are welcome.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS Find Tokens at Record Start

2013-04-05 Thread Fabian Klebert
You might want to check the following:

SELECT word FROM fts WHERE fts MATCH '^token'

Beginning with 3.7.9 this should only return records that have 'token' at the 
beginning of the record.
See changelog of 3.7.9:

If a search token (on the right-hand side of the MATCH operator) in FTS4 begins 
with "^" then that token must be the first in its field of the document. ** 
Potentially Incompatible Change **



-Ursprüngliche Nachricht-
Von: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
Im Auftrag von Clemens Ladisch
Gesendet: Freitag, 5. April 2013 15:41
An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] FTS Find Tokens at Record Start

Paul Vercellotti wrote:
> using FTS, how do you match records that contain certain tokens beginning at 
> the start of the record

Apparently, this information is not stored in the FTS index.

Search for the tokens, then manually check with LIKE or something like that.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS Find Tokens at Record Start

2013-04-05 Thread Clemens Ladisch
Paul Vercellotti wrote:
> using FTS, how do you match records that contain certain tokens beginning at 
> the start of the record

Apparently, this information is not stored in the FTS index.

Search for the tokens, then manually check with LIKE or something like that.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS Find Tokens at Record Start

2013-04-04 Thread Paul Vercellotti


Hi there,

I couldn't find this from the documentation: using FTS, how do you match 
records that contain certain tokens beginning at the start of the record (or 
any token position for that matter).

For example, I want to match records that start with "Four score and seven 
years ago" but not match records that contain that phrase in the middle.

This matches any document that contains the phrase:
SELECT rowid FROM documents WHERE content MATCH '"Four score and seven years 
ago"';
But I want only the results that start with that phrase, which would be a 
subset of those results. 

It looks like I could programmatically parse the output of the offsetsfunction 
to find this info and manually filter my results, but is there a way to set up 
the query so it does the filtering for me, and only returns results that start 
at byte offset 0 in the column (or token 0)?


Thanks!
-Paul
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS problem with 'NOT term' queries

2013-02-03 Thread Dan Kennedy

On 02/04/2013 12:18 AM, Alexey Pechnikov wrote:

Hello!




And as result it's impossible to search docs in some situations:
SELECT * FROM docs WHERE docs MATCH 'NOT sqlite';
Error: malformed MATCH expression: [NOT sqlite]



As far as I can tell, in MATCH syntax NOT is a binary operator, denoting
set difference. You are trying to use it as a unary operator.



Well, and how to rewrite query MATCH 'NOT sqlite'? In simple FTS qery
syntax is possible to use "-sqlite" but is the equal construction by
extended syntax?


Not possible. And note that the query "-sqlite" doesn't actually work
either. Always returns an empty set.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS problem with 'NOT term' queries

2013-02-03 Thread Alexey Pechnikov
Hello!

>
>> And as result it's impossible to search docs in some situations:
>> SELECT * FROM docs WHERE docs MATCH 'NOT sqlite';
>> Error: malformed MATCH expression: [NOT sqlite]
>>
>
> As far as I can tell, in MATCH syntax NOT is a binary operator, denoting
> set difference. You are trying to use it as a unary operator.


Well, and how to rewrite query MATCH 'NOT sqlite'? In simple FTS qery
syntax is possible to use "-sqlite" but is the equal construction by
extended syntax?


Best  regards,
Alexey.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS SQLite "optimize" command creates very large BLOBs in *_segment table.

2013-01-31 Thread Dominique Pellé
Hi

I have a database using SQLite-3.7.14 with a FTS4 virtual table (Free
Text Search). The FTS table contains several millions of small documents.

The FTS DB is created on a server (where creating time does not matter)
and then used on an embedded device as a read-only database for FTS
queries (where speed of queries need to be as fast as possible and use
as little memory as possible).

Since the DB is used read-only on the device, I ran the the "optimize"
command (see http://www.sqlite.org/fts3.html#optimize) in the FTS index
after the last INSERT was done on the server.

Unfortunately, I found that the "optimize" command is creating very large
BLOBs in the database since it merges together all of inverted index b-trees
into very large complete b-tree. For frequent terms, merged BLOBs can be
up to ~50MB in my case in the ftsTable_segment table, as a result of
running "optimize":

  sqlite> SELECT *, length(block) AS len_blob
FROM ftsTable_segments
ORDER BY len_blob DESC LIMIT 3;

  seblockid|block|len_blob
  336808||51867353 <-- 51 MB BLOB!
  311724||19375541 <-- 19 MB BLOB
  334719||19223423 <-- 19 MB BLOB

Such large BLOBs (~50MB) are a problem in my case as they consume
a  large amount of memory on a embedded device, when doing FTS
queries with several frequent terms. SQLite memory high watermark
reported by sqlite3_memory_highwater() reaches ~200MB when query
contains several frequent terms, which is too much for an embedded
device, even though I set of soft memory limit of only 3MB using
sqlite3_soft_heap_limit64(...).

As a result, I have disabled running "optimize" on the FTS index
after creating the FTS DB on the server.

However, it got me thinking: it would be nice to be able to pass an
optional parameter to the FTS "optimize" command in order to
avoid merging b-trees when BLOBs reach a certain size?  In other
words, instead of doing...

INSERT INTO ftsTable(ftsTable) VALUES('optimize');

... I would like to be able to do something like this...

INSERT INTO ftsTable(ftsTable) VALUES('optimize=1048576');

... where optimize=1048576 indicates to *partially* optimize
reverse index b-trees in such a way that BLOBs do not exceed
1MB (1048577 bytes) in this example. It's OK if it's a fuzzy soft limit.


1/ Wouldn't such partial optimization of FTS index be useful?

2/ I also suggest that the documentation at
http://www.sqlite.org/fts3.html#optimize
indicates that optimizing an FTS index can create very large BLOBs.
This may be OK on a desktop or server, but it can be a problem
on embedded devices with limited amount of memory.

Regards
Dominique
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS problem with 'NOT term' queries

2013-01-29 Thread Kevin Benson
The explanation right above that table of examples contains these important
(I believe) phrases:

...BINARY SET operators...
...TWO operands to an operator...

(emphasis mine)
On Tue, Jan 29, 2013 at 11:30 PM, Alexey Pechnikov
wrote:

> SELECT * FROM docs WHERE docs MATCH 'NOT




--
   --
  --
 --Ô¿Ô--
K e V i N
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS problem with 'NOT term' queries

2013-01-29 Thread Igor Tandetnik

On 1/29/2013 11:30 PM, Alexey Pechnikov wrote:

From
http://www.sqlite.org/fts3.html#section_3_1
we can see the query

SELECT * FROM docs WHERE docs MATCH 'database NOT sqlite';

But the equal query doesn't works:

SELECT * FROM docs WHERE docs MATCH 'NOT sqlite AND database';
Error: malformed MATCH expression: [NOT sqlite AND database]

And as result it's impossible to search docs in some situations:
SELECT * FROM docs WHERE docs MATCH 'NOT sqlite';
Error: malformed MATCH expression: [NOT sqlite]


As far as I can tell, in MATCH syntax NOT is a binary operator, denoting 
set difference. You are trying to use it as a unary operator.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS problem with 'NOT term' queries

2013-01-29 Thread Alexey Pechnikov
Hello!

From
http://www.sqlite.org/fts3.html#section_3_1
we can see the query

SELECT * FROM docs WHERE docs MATCH 'database NOT sqlite';

But the equal query doesn't works:

SELECT * FROM docs WHERE docs MATCH 'NOT sqlite AND database';
Error: malformed MATCH expression: [NOT sqlite AND database]

And as result it's impossible to search docs in some situations:
SELECT * FROM docs WHERE docs MATCH 'NOT sqlite';
Error: malformed MATCH expression: [NOT sqlite]



-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS questions

2013-01-12 Thread Michael Black
Test it yourself:

create virtual test using fts4(context text);
insert into test values ('c:\folders\video\עברית');
select * from test where context match 'עברית';
If you want a partial match add a wildcard
select * from test where context match 'עברית*';

I don't have the codepage running so I can't test it but it sure works for 
English.  Don't see why it wouldn't work for other languages.



-Original Message-
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
On Behalf Of moti lahiani
Sent: Saturday, January 12, 2013 10:56 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] FTS questions

Thanks for your reply
Why I care the language: according to the documentation:
"A term is a contiguous sequence of eligible characters, where eligible
characters are all alphanumeric characters and all characters with Unicode
codepoint values greater than or equal to 128. All other characters are
discarded when splitting a document into terms. Their only contribution is
to separate adjacent terms."
if the path include folder or file name in France or Hebrew or Arabic  and
the user what to search that word according to above the FTS will not found
it. Am I wrong?

about the backslash/slash if I have a path like this and the user search
for עברית
did the FTS will find it?
c:\folders\video\עברית

about the creation I was confuse it not temporary its virtual so ignore my
question
Thanks




On Sat, Jan 12, 2013 at 5:58 PM, Michael Black  wrote:

> I'm not sure I understand your problem.
> Why do you care what language it is?  Aren't you just wanting to tokenize
> on
> backslash?
>
> Simple way is to replace all spaces in the path with another char (e.g.
> '_')
> then replace all backslashes with a space.
> Then you can just use the default tokenizer and make the same changes on
> any
> user queries.  So you map the user query to your storage format.
>
> Not sure why you would want to use some stem tokenizer on paths.
>
> And, of course, you're reinventing the wheel unless you have some special
> purpose in mind.
>
> http://locate32.cogit.net/
>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of moti lahiani
> Sent: Saturday, January 12, 2013 4:37 AM
> To: sqlite-users@sqlite.org
> Cc: Moti LAHIANI
> Subject: [sqlite] FTS questions
>
> Hello all
>
> I'm new with sqlite3 and sql.
>
> I have data base that include path columns (file system path like c:\bla
> bla\myFiles\1.txt)
>
> On that columns I need to do
>
> 1)  search for patterns in case the user want to find a file or
> directory
>
> 2)  search for prefix path in case the user rename a file or directory
>
> the Database include about 1.5-2.5 million records and to use the "LIKE" is
> not possible because the result time.
>
> As an alternative I want to use FTS3 or FTS4 but I think I have a problems
> with what I read here: http://www.sqlite.org/fts3.html#section_1_4
>
> And here: http://www.sqlite.org/fts3.html#section_6_3
>
> I need to specify the language to FTS to use it as tokenize but the path
> can include multi languages how can I configure the FTS table to use all
> languages
>
> How can I tell to FTS to token the path only according to the character "\"
> ?
>
>
>
> More than that when creating FTS table it creates with the TEMPORARY  key
> word. My question is: do I need to create this table each time I run the
> data base   (because the temporary word) or for each connections (in case
> of multiconnections) or this is a table like all tables I declared and it
> stay in the data base even if I restart my PC
>
>
>
> Thanks a lot
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS questions

2013-01-12 Thread moti lahiani
Thanks for your reply
Why I care the language: according to the documentation:
"A term is a contiguous sequence of eligible characters, where eligible
characters are all alphanumeric characters and all characters with Unicode
codepoint values greater than or equal to 128. All other characters are
discarded when splitting a document into terms. Their only contribution is
to separate adjacent terms."
if the path include folder or file name in France or Hebrew or Arabic  and
the user what to search that word according to above the FTS will not found
it. Am I wrong?

about the backslash/slash if I have a path like this and the user search
for עברית
did the FTS will find it?
c:\folders\video\עברית

about the creation I was confuse it not temporary its virtual so ignore my
question
Thanks




On Sat, Jan 12, 2013 at 5:58 PM, Michael Black  wrote:

> I'm not sure I understand your problem.
> Why do you care what language it is?  Aren't you just wanting to tokenize
> on
> backslash?
>
> Simple way is to replace all spaces in the path with another char (e.g.
> '_')
> then replace all backslashes with a space.
> Then you can just use the default tokenizer and make the same changes on
> any
> user queries.  So you map the user query to your storage format.
>
> Not sure why you would want to use some stem tokenizer on paths.
>
> And, of course, you're reinventing the wheel unless you have some special
> purpose in mind.
>
> http://locate32.cogit.net/
>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of moti lahiani
> Sent: Saturday, January 12, 2013 4:37 AM
> To: sqlite-users@sqlite.org
> Cc: Moti LAHIANI
> Subject: [sqlite] FTS questions
>
> Hello all
>
> I'm new with sqlite3 and sql.
>
> I have data base that include path columns (file system path like c:\bla
> bla\myFiles\1.txt)
>
> On that columns I need to do
>
> 1)  search for patterns in case the user want to find a file or
> directory
>
> 2)  search for prefix path in case the user rename a file or directory
>
> the Database include about 1.5-2.5 million records and to use the "LIKE" is
> not possible because the result time.
>
> As an alternative I want to use FTS3 or FTS4 but I think I have a problems
> with what I read here: http://www.sqlite.org/fts3.html#section_1_4
>
> And here: http://www.sqlite.org/fts3.html#section_6_3
>
> I need to specify the language to FTS to use it as tokenize but the path
> can include multi languages how can I configure the FTS table to use all
> languages
>
> How can I tell to FTS to token the path only according to the character "\"
> ?
>
>
>
> More than that when creating FTS table it creates with the TEMPORARY  key
> word. My question is: do I need to create this table each time I run the
> data base   (because the temporary word) or for each connections (in case
> of multiconnections) or this is a table like all tables I declared and it
> stay in the data base even if I restart my PC
>
>
>
> Thanks a lot
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS questions

2013-01-12 Thread Michael Black
I'm not sure I understand your problem.
Why do you care what language it is?  Aren't you just wanting to tokenize on
backslash?

Simple way is to replace all spaces in the path with another char (e.g. '_')
then replace all backslashes with a space.
Then you can just use the default tokenizer and make the same changes on any
user queries.  So you map the user query to your storage format.

Not sure why you would want to use some stem tokenizer on paths.

And, of course, you're reinventing the wheel unless you have some special
purpose in mind.

http://locate32.cogit.net/

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of moti lahiani
Sent: Saturday, January 12, 2013 4:37 AM
To: sqlite-users@sqlite.org
Cc: Moti LAHIANI
Subject: [sqlite] FTS questions

Hello all

I'm new with sqlite3 and sql.

I have data base that include path columns (file system path like c:\bla
bla\myFiles\1.txt)

On that columns I need to do

1)  search for patterns in case the user want to find a file or
directory

2)  search for prefix path in case the user rename a file or directory

the Database include about 1.5-2.5 million records and to use the "LIKE" is
not possible because the result time.

As an alternative I want to use FTS3 or FTS4 but I think I have a problems
with what I read here: http://www.sqlite.org/fts3.html#section_1_4

And here: http://www.sqlite.org/fts3.html#section_6_3

I need to specify the language to FTS to use it as tokenize but the path
can include multi languages how can I configure the FTS table to use all
languages

How can I tell to FTS to token the path only according to the character "\"
?



More than that when creating FTS table it creates with the TEMPORARY  key
word. My question is: do I need to create this table each time I run the
data base   (because the temporary word) or for each connections (in case
of multiconnections) or this is a table like all tables I declared and it
stay in the data base even if I restart my PC



Thanks a lot
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS questions

2013-01-12 Thread moti lahiani
Hello all

I'm new with sqlite3 and sql.

I have data base that include path columns (file system path like c:\bla
bla\myFiles\1.txt)

On that columns I need to do

1)  search for patterns in case the user want to find a file or
directory

2)  search for prefix path in case the user rename a file or directory

the Database include about 1.5-2.5 million records and to use the "LIKE" is
not possible because the result time.

As an alternative I want to use FTS3 or FTS4 but I think I have a problems
with what I read here: http://www.sqlite.org/fts3.html#section_1_4

And here: http://www.sqlite.org/fts3.html#section_6_3

I need to specify the language to FTS to use it as tokenize but the path
can include multi languages how can I configure the FTS table to use all
languages

How can I tell to FTS to token the path only according to the character "\"
?



More than that when creating FTS table it creates with the TEMPORARY  key
word. My question is: do I need to create this table each time I run the
data base   (because the temporary word) or for each connections (in case
of multiconnections) or this is a table like all tables I declared and it
stay in the data base even if I restart my PC



Thanks a lot
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS questions

2013-01-12 Thread moti lahiani
Hello all

I'm new with sqlite3 and sql.

I have data base that include path columns (file system path like c:\bla
bla\myFiles\1.txt)

On that columns I need to do

1)  search for patterns in case the user want to find a file or
directory

2)  search for prefix path in case the user rename a file or directory

the Database include about 1.5-2.5 million records and to use the "LIKE" is
not possible because the result time.

As an alternative I want to use FTS3 or FTS4 but I think I have a problems
with what I read here: http://www.sqlite.org/fts3.html#section_1_4

And here: http://www.sqlite.org/fts3.html#section_6_3

I need to specify the language to FTS to use it as tokenize but the path
can include multi languages how can I configure the FTS table to use all
languages

How can I tell to FTS to token the path only according to the character "\"
?



More than that when creating FTS table it creates with the TEMPORARY  key
word. My question is: do I need to create this table each time I run the
data base   (because the temporary word) or for each connections (in case
of multiconnections) or this is a table like all tables I declared and it
stay in the data base even if I restart my PC



Thanks a lot
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS substring behavior

2012-11-08 Thread Yongil Jang
A little bit different.
That sentence of "full text search" is split as [ful, tex, sea, ull, ll, l,
ext, xt, xt, ear, arc, rch, ch, h] with 3 maximum length of key.
Of course, search pattern string length is limited to 3 and query is
changed as follows.

[Example: search 'earch']
SELECT ... FROM [FTS table] WHERE [search fields or table] MATCH 'ear' AND
([search field1] like '%earch%' OR [search field2] like '%earch%' OR ... OR
[search fieldn] like '%earch%')

The reason of ...
 1) 3 characters: To minimize FTS index size and to improve search
performance
 2) Adding like clauses: To refine result set
 2) many likes: To search all columns in FTS table when a table name is
used in MATCH clause.

BR,
Yongil Jang.



2012/11/9 Paul Vercellotti 

>
>
> That's a promising project; I hope it reaches maturity.
>
> I assume your modified tokenizer did a similar thing, like tokenizing
> "full text search" as [full, text, search, ull, ll, l, ext, xt, xt, earch,
> arch, rch, ch, h]?
>
> What worked and what did not work?
>
> Thanks,
> Paul
>
>
> 
>  From: Yongil Jang 
> To: General Discussion of SQLite Database 
> Sent: Thursday, November 8, 2012 2:26 PM
> Subject: Re: [sqlite] FTS substring behavior
>
> How about look at following URL?
>
> https://github.com/jonasfj/trilite
>
> AFAIK, FTS doesn't support substring search.
> I also tried to edit FTS to find substring by changing simple tokenizer.
> It was worked partially, but not a good solution to use generally.
>
>
> 2012/11/9 Dan Kennedy 
>
> > On 11/09/2012 01:49 AM, Paul Vercellotti wrote:
> >
> >>
> >>
> >> Hi there,
> >>
> >> I wanted to clarify if FTS could provide any optimization for substring
> >> matches like '*ion*' or similar?
> >>
> >
> > No. I think it will actually search for tokens that start with the 4
> > ASCII characters "*ion" if you try that.
> >
> > Dan.
> >
> > __**_
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users<
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users>
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS substring behavior

2012-11-08 Thread Paul Vercellotti


That's a promising project; I hope it reaches maturity.

I assume your modified tokenizer did a similar thing, like tokenizing "full 
text search" as [full, text, search, ull, ll, l, ext, xt, xt, earch, arch, rch, 
ch, h]?

What worked and what did not work?

Thanks,
Paul



 From: Yongil Jang 
To: General Discussion of SQLite Database  
Sent: Thursday, November 8, 2012 2:26 PM
Subject: Re: [sqlite] FTS substring behavior
 
How about look at following URL?

https://github.com/jonasfj/trilite

AFAIK, FTS doesn't support substring search.
I also tried to edit FTS to find substring by changing simple tokenizer.
It was worked partially, but not a good solution to use generally.


2012/11/9 Dan Kennedy 

> On 11/09/2012 01:49 AM, Paul Vercellotti wrote:
>
>>
>>
>> Hi there,
>>
>> I wanted to clarify if FTS could provide any optimization for substring
>> matches like '*ion*' or similar?
>>
>
> No. I think it will actually search for tokens that start with the 4
> ASCII characters "*ion" if you try that.
>
> Dan.
>
> __**_
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users<http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS substring behavior

2012-11-08 Thread Yongil Jang
How about look at following URL?

https://github.com/jonasfj/trilite

AFAIK, FTS doesn't support substring search.
I also tried to edit FTS to find substring by changing simple tokenizer.
It was worked partially, but not a good solution to use generally.


2012/11/9 Dan Kennedy 

> On 11/09/2012 01:49 AM, Paul Vercellotti wrote:
>
>>
>>
>> Hi there,
>>
>> I wanted to clarify if FTS could provide any optimization for substring
>> matches like '*ion*' or similar?
>>
>
> No. I think it will actually search for tokens that start with the 4
> ASCII characters "*ion" if you try that.
>
> Dan.
>
> __**_
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS substring behavior

2012-11-08 Thread Dan Kennedy

On 11/09/2012 01:49 AM, Paul Vercellotti wrote:



Hi there,

I wanted to clarify if FTS could provide any optimization for substring matches 
like '*ion*' or similar?


No. I think it will actually search for tokens that start with the 4
ASCII characters "*ion" if you try that.

Dan.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS substring behavior

2012-11-08 Thread Paul Vercellotti


Hi there,

I wanted to clarify if FTS could provide any optimization for substring matches 
like '*ion*' or similar?

That is, does it only scan the token index for matching tokens to locate the 
main table records that contain those tokens, or does it do a full table scan 
of the main table?

The number of unique tokens we have is small compared to the total number of 
records, so if it only scanned the token index it would in theory help.

Thanks
Paul
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS returns out of memory when use NEAR and OR

2012-10-24 Thread Dan Kennedy

On 10/24/2012 11:07 PM, Vlad Seryakov wrote:

Hello

For some time already i noticed that when i use NEAR/1 and OR in one
query like SELECT * FROM search WHERE search MATCH 'tom NEAR/1 hanks
or tom hanks'


Are you able to share the database file that you use to reproduce
this? Thanks.

Dan.





i get out of memory error. Running this on 16Gb laptop cannot be
memory issue and the database only has several thousands of records.
Investigating the code i found one place where in fts3EvalNearTest
where it happens:

line 129689 i version 3.7.14.1

nTmp += p->pPhrase->doclist.nList; aTmp = sqlite3_malloc(nTmp*2); if(
!aTmp ){

Adding the check before the malloc solved the problem if (nTmp<= 0)
return res;

Thanks

___ sqlite-users mailing
list sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS returns out of memory when use NEAR and OR

2012-10-24 Thread Vlad Seryakov
Hello

For some time already i noticed that when i use NEAR/1 and OR in one query like
SELECT * FROM search WHERE search MATCH 'tom NEAR/1 hanks or tom hanks'

i get out of memory error. Running this on 16Gb laptop cannot be memory issue 
and the database only has several thousands of records. Investigating the code 
i found one place where in fts3EvalNearTest where it happens:

line 129689 i version 3.7.14.1

nTmp += p->pPhrase->doclist.nList; 
aTmp = sqlite3_malloc(nTmp*2);
if( !aTmp ){ 

Adding the check before the malloc solved the problem
if (nTmp <= 0) return res;

Thanks

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS: Phrase queries

2012-07-13 Thread Fabian
Ever since I started using FTS extensively, I frequently ran into this
limitation:

  ** TODO: Strangely, it is not possible to associate a column specifier
  ** with a quoted phrase, only with a single token. Not sure if this was
  ** an implementation artifact or an intentional decision when fts3 was
  ** first implemented. Whichever it was, this module duplicates the
  ** limitation.

Is it ever planned to be fixed, because it doesn't seem to break any
backwards compatibilty?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS Example Fails (matchinfo arguments)

2012-06-14 Thread Richard Hipp
On Thu, Jun 14, 2012 at 4:13 PM, Sergei G  wrote:

> Is there a way I can obtain documentation that matches my version?
> Online documentation is for the most current version.
> I have found that both my hosting provider and debian stable are a bit
> behind, so I have to work with what I've got.
>


http://www.sqlite.org/sqlite_docs_3_7_3.zip

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS Example Fails (matchinfo arguments)

2012-06-14 Thread Kevin Benson
On Thu, Jun 14, 2012 at 4:27 PM, Kees Nuyt  wrote:

> On Thu, 14 Jun 2012 13:13:58 -0700, Sergei G 
> wrote:
>
> >Is there a way I can obtain documentation that matches my version?
> >Online documentation is for the most current version.
> >I have found that both my hosting provider and debian stable are a bit
> >behind, so I have to work with what I've got.
> >
> >
> >Thanks
>
> Part of the documentation is partly generated from the sqlite source
> code maintained in the fossil [1] source repository.
> The remainder is kept in a separate source repository.
>
>
> You can check out any point on the timelines of both repositories.
>
> Links are at the bottom of :
> http://www.sqlite.org/download.html
>
>
> [1] http://www.fossil-scm.org/
>
> --
> Regards,
>
> Kees Nuyt
>


 And, of course, there's also the Internet Archive:

http://wayback.archive.org/web/*/http://www.sqlite.org

--
   --
  --
 --Ô¿Ô--
K e V i N
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS Example Fails (matchinfo arguments)

2012-06-14 Thread Kees Nuyt
On Thu, 14 Jun 2012 13:13:58 -0700, Sergei G 
wrote:

>Is there a way I can obtain documentation that matches my version?
>Online documentation is for the most current version.
>I have found that both my hosting provider and debian stable are a bit
>behind, so I have to work with what I've got.
>
>
>Thanks

Part of the documentation is partly generated from the sqlite source
code maintained in the fossil [1] source repository.
The remainder is kept in a separate source repository.


You can check out any point on the timelines of both repositories.

Links are at the bottom of :
http://www.sqlite.org/download.html 


[1] http://www.fossil-scm.org/

-- 
Regards,

Kees Nuyt

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS Example Fails (matchinfo arguments)

2012-06-14 Thread Sergei G
Is there a way I can obtain documentation that matches my version?
Online documentation is for the most current version.
I have found that both my hosting provider and debian stable are a bit
behind, so I have to work with what I've got.


Thanks

On Thu, Jun 14, 2012 at 1:16 AM, Dan Kennedy  wrote:

> On 06/14/2012 01:27 PM, Sergei G wrote:
>
>> I am running sqlite3 version 3.7.3 on debian.
>>
>> I run the following commands from fts3.html documentation page:
>>
>> CREATE VIRTUAL TABLE t1 USING fts4(a, b);
>> INSERT INTO t1 VALUES('transaction default models default', 'Non
>> transaction reads');
>> INSERT INTO t1 VALUES('the default transaction', 'these semantics
>> present');
>> INSERT INTO t1 VALUES('single request', 'default data');
>>
>> SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'default transaction
>> "these semantics"';
>>
>> SELECT matchinfo(t1, 'ns') FROM t1 WHERE t1 MATCH 'default transaction';
>>
>> The last line with match info(t1, 'ns') fails with the following error
>> message:
>>
>> Error: wrong number of arguments to function matchinfo()
>>
>> I have originally worked on my own table, but found the same error. So, I
>> have tried example above and it failed for me in the same way.
>>
>
> I think the two argument version of matchinfo() is only in 3.7.4 and
> newer. Prior to that it only accepted one argument.
> __**_
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS Example Fails (matchinfo arguments)

2012-06-14 Thread Dan Kennedy

On 06/14/2012 01:27 PM, Sergei G wrote:

I am running sqlite3 version 3.7.3 on debian.

I run the following commands from fts3.html documentation page:

CREATE VIRTUAL TABLE t1 USING fts4(a, b);
INSERT INTO t1 VALUES('transaction default models default', 'Non
transaction reads');
INSERT INTO t1 VALUES('the default transaction', 'these semantics present');
INSERT INTO t1 VALUES('single request', 'default data');

SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'default transaction
"these semantics"';

SELECT matchinfo(t1, 'ns') FROM t1 WHERE t1 MATCH 'default transaction';

The last line with match info(t1, 'ns') fails with the following error
message:

Error: wrong number of arguments to function matchinfo()

I have originally worked on my own table, but found the same error. So, I
have tried example above and it failed for me in the same way.


I think the two argument version of matchinfo() is only in 3.7.4 and
newer. Prior to that it only accepted one argument.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS Example Fails (matchinfo arguments)

2012-06-13 Thread Sergei G
I am running sqlite3 version 3.7.3 on debian.

I run the following commands from fts3.html documentation page:

CREATE VIRTUAL TABLE t1 USING fts4(a, b);
INSERT INTO t1 VALUES('transaction default models default', 'Non
transaction reads');
INSERT INTO t1 VALUES('the default transaction', 'these semantics present');
INSERT INTO t1 VALUES('single request', 'default data');

SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'default transaction
"these semantics"';

SELECT matchinfo(t1, 'ns') FROM t1 WHERE t1 MATCH 'default transaction';

The last line with match info(t1, 'ns') fails with the following error
message:

Error: wrong number of arguments to function matchinfo()

I have originally worked on my own table, but found the same error. So, I
have tried example above and it failed for me in the same way.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS simple tokenizer with custom delimeters

2012-05-06 Thread Richard Hipp
On Sun, May 6, 2012 at 5:50 AM, Jos Groot Lipman  wrote:

> While looking around in the source of the simple tokenizer I found code
> that
> suggests custom delimeters can be specified (I want to exclude the
> underscore).
>
>
>
> http://www.sqlite.org/src/artifact/5c98225a53705e5ee34824087478cf477bdb7004?
> ln=76-87
>
> An indeed:
>  CREATE VIRTUAL TABLE ft USING fts3(title, body, tokenize=simple XX
> [&'\" *()./\\=,:;%<>-?!])
> seems to work fine.
>
> As far as I can tell this feature is undocumented which means I am not
> suppose to use it.
> Is this:
> - An oversight
> - For good reason as it is unstable
> - or: because the syntax might change in the near future?
>

Likely the reason is that we forgot that this feature even exists.  It
seems to have existed in the simple tokenizer, unchanged, since the
original introduction of FTS1 back in 2006.  The fact that it uses argv[1]
instead of argv[0]


>
> Also: I need to include the dummy XX as the delimeters are searched in
> argv[1] in stead of argv[0]. I cannot find what the argv[0] is supposed to
> do here. Any reason?
>

Probably this is a bug.  As far as I can tell, the alternative delimiter
feature of FTS1/2/3 has never been tested.  (We do not (yet) do
full-coverage testing of FTS.)  You may be the first person to ever use
this feature.  But it has been in the code for so long now that we dare not
change it for fear of breaking long-established programs.



> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS simple tokenizer with custom delimeters

2012-05-06 Thread Jos Groot Lipman
While looking around in the source of the simple tokenizer I found code that
suggests custom delimeters can be specified (I want to exclude the
underscore).
 
 
http://www.sqlite.org/src/artifact/5c98225a53705e5ee34824087478cf477bdb7004?
ln=76-87
 
An indeed:
  CREATE VIRTUAL TABLE ft USING fts3(title, body, tokenize=simple XX
[&'\" *()./\\=,:;%<>-?!])
seems to work fine.
 
As far as I can tell this feature is undocumented which means I am not
suppose to use it.
Is this:
- An oversight
- For good reason as it is unstable
- or: because the syntax might change in the near future?
 
Also: I need to include the dummy XX as the delimeters are searched in
argv[1] in stead of argv[0]. I cannot find what the argv[0] is supposed to
do here. Any reason?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Sqlite FTS retrieve inverted index

2012-03-12 Thread Mario Annau
Hi Alexey,
tha

Am 12. März 2012 22:14 schrieb Alexey Pechnikov :

> See
> http://www.sqlite.org/draft/fts3.html#fts4aux
>
>
have already tried the fts4aux table. however, I would also need
the number of occurrences of each term in each document.

Therefore, like in the docs, not only

*-- The following query returns this data:**--**-- apple   |
*  |  1  |  1**-- apple   |  0  |  1  |  1**-- banana
|  *  |  2  |  2**-- banana  |  0  |  2  |  2**-- cherry
   |  *  |  3  |  3**-- cherry  |  0  |  1  |  1**--
cherry  |  1  |  2  |  2**-- date|  *  |  1  |  2**--
   date|  0  |  1  |  2**-- elderberry  |  *  |  1  |
2**-- elderberry  |  1  |  1  |  1**-- elderberry  |  1  |  1
|  1*

but a result table like this:

   Term|col  |docid| occurences
--
-- apple   |  0  |  1  |  1
-- banana  |  0  |  2  |  1
-- cherry  |  0  |  3  |  1
-- cherry  |  1  |  1  |  1
-- cherry  |  1  |  2  |  1
-- date|  0  |  2  |  2
-- elderberry  |  0  |  3  |  1
-- elderberry  |  1  |  3  |  1

Best,
mario


> 2012/3/13 Mario Annau :
> > Hello,
> >
> > unfortunately I have already posted this question on
> > stackoverflow<
> http://stackoverflow.com/questions/9657016/get-inverted-index-from-sqlite-fts-table
> >,
> > hope that this mailing list is right address.
> >
> > After I have implemented a full text search function in my application
> > using Sqlite and FTS tables I would be interested
> > in a (performant) way of retrieving the FULL inverted index (or large
> part)
> > out of my FTS (sub-)table.
> >
> > In effect - I would need a result table including the terms, docid's and
> > number of occurences.
> >
> > I am actually searching for some basic code/examples to read the segdir /
> > segments table (where the actual index is stored ) and construct my
> > desired result table (in effect - the inverted index). But any solution
> > which could retrieve the full (or large part of) my inverted index using
> > queries including MATCH, MATCHINFO , etc. (sorry, I'm no Sqlite export)
> > would be highly appreciated!
> >
> > Best,
> > mario
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> --
> Best regards, Alexey Pechnikov.
> http://pechnikov.tel/
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Sqlite FTS retrieve inverted index

2012-03-12 Thread Alexey Pechnikov
See
http://www.sqlite.org/draft/fts3.html#fts4aux

2012/3/13 Mario Annau :
> Hello,
>
> unfortunately I have already posted this question on
> stackoverflow<http://stackoverflow.com/questions/9657016/get-inverted-index-from-sqlite-fts-table>,
> hope that this mailing list is right address.
>
> After I have implemented a full text search function in my application
> using Sqlite and FTS tables I would be interested
> in a (performant) way of retrieving the FULL inverted index (or large part)
> out of my FTS (sub-)table.
>
> In effect - I would need a result table including the terms, docid's and
> number of occurences.
>
> I am actually searching for some basic code/examples to read the segdir /
> segments table (where the actual index is stored ) and construct my
> desired result table (in effect - the inverted index). But any solution
> which could retrieve the full (or large part of) my inverted index using
> queries including MATCH, MATCHINFO , etc. (sorry, I'm no Sqlite export)
> would be highly appreciated!
>
> Best,
> mario
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Sqlite FTS retrieve inverted index

2012-03-12 Thread Mario Annau
Hello,

unfortunately I have already posted this question on
stackoverflow<http://stackoverflow.com/questions/9657016/get-inverted-index-from-sqlite-fts-table>,
hope that this mailing list is right address.

After I have implemented a full text search function in my application
using Sqlite and FTS tables I would be interested
in a (performant) way of retrieving the FULL inverted index (or large part)
out of my FTS (sub-)table.

In effect - I would need a result table including the terms, docid's and
number of occurences.

I am actually searching for some basic code/examples to read the segdir /
segments table (where the actual index is stored ) and construct my
desired result table (in effect - the inverted index). But any solution
which could retrieve the full (or large part of) my inverted index using
queries including MATCH, MATCHINFO , etc. (sorry, I'm no Sqlite export)
would be highly appreciated!

Best,
mario
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS simple tokenizer

2012-02-28 Thread Matt Young
Using the _ character to separate words is an informal  language standard,
s in: method_do_this...

On Tue, Feb 28, 2012 at 12:40 AM, Dan Kennedy  wrote:

> On 02/28/2012 12:09 AM, Jos Groot Lipman wrote:
>
>> It was reported before (and not solved)
>> http://www.mail-archive.com/**sqlite-users@sqlite.org/**msg55959.html
>>
>
> The document sources are updated now. So the fix will appear on
> the website next time it is regenerated.
> __**_
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS simple tokenizer

2012-02-28 Thread Dan Kennedy

On 02/28/2012 12:09 AM, Jos Groot Lipman wrote:

It was reported before (and not solved)
http://www.mail-archive.com/sqlite-users@sqlite.org/msg55959.html


The document sources are updated now. So the fix will appear on
the website next time it is regenerated.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS simple tokenizer

2012-02-27 Thread Jos Groot Lipman
It was reported before (and not solved)
http://www.mail-archive.com/sqlite-users@sqlite.org/msg55959.html 

> -Original Message-
> From: sqlite-users-boun...@sqlite.org 
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Hamish Allan
> Sent: maandag 27 februari 2012 11:27
> To: General Discussion of SQLite Database
> Cc: sqlite-users@sqlite.org
> Subject: Re: [sqlite] FTS simple tokenizer
> 
> Thanks Dan. Have just checked how to report bug, and 
> apparently we already have :)
> 
> Please excuse the brevity -- sent from my phone
> 
> On 27 Feb 2012, at 07:06, Dan Kennedy  wrote:
> 
> > On 02/27/2012 05:59 AM, Hamish Allan wrote:
> >> The docs for the simple tokenizer
> >> (http://www.sqlite.org/fts3.html#tokenizer) say:
> >> 
> >> "A term is a contiguous sequence of eligible characters, where 
> >> eligible characters are all alphanumeric characters, the "_"
> >> character, and all characters with UTF codepoints greater than or 
> >> equal to 128."
> >> 
> >> If I do:
> >> 
> >> CREATE VIRTUAL TABLE test USING fts3(); INSERT INTO test (content) 
> >> VALUES ('hello_world');
> >> 
> >> SELECT * FROM test WHERE content MATCH 'orld'; SELECT * FROM test 
> >> WHERE content MATCH 'world';
> >> 
> >> I get no match for the first query, because it doesn't 
> match a term, 
> >> but I get a match for the second, whereas according to my 
> reading of 
> >> the docs "world" shouldn't be a term because the 
> underscore character 
> >> shouldn't be considered a term break.
> >> 
> >> Can anyone please help me understand this behaviour?
> > 
> > Documentation bug. Eligible characters are just 
> alphanumerics and UTF 
> > codepoints greater than 128.
> > 
> > Dan.
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS simple tokenizer

2012-02-27 Thread Hamish Allan
Thanks Dan. Have just checked how to report bug, and apparently we already have 
:)

Please excuse the brevity -- sent from my phone

On 27 Feb 2012, at 07:06, Dan Kennedy  wrote:

> On 02/27/2012 05:59 AM, Hamish Allan wrote:
>> The docs for the simple tokenizer
>> (http://www.sqlite.org/fts3.html#tokenizer) say:
>> 
>> "A term is a contiguous sequence of eligible characters, where
>> eligible characters are all alphanumeric characters, the "_"
>> character, and all characters with UTF codepoints greater than or
>> equal to 128."
>> 
>> If I do:
>> 
>> CREATE VIRTUAL TABLE test USING fts3();
>> INSERT INTO test (content) VALUES ('hello_world');
>> 
>> SELECT * FROM test WHERE content MATCH 'orld';
>> SELECT * FROM test WHERE content MATCH 'world';
>> 
>> I get no match for the first query, because it doesn't match a term,
>> but I get a match for the second, whereas according to my reading of
>> the docs "world" shouldn't be a term because the underscore character
>> shouldn't be considered a term break.
>> 
>> Can anyone please help me understand this behaviour?
> 
> Documentation bug. Eligible characters are just alphanumerics and
> UTF codepoints greater than 128.
> 
> Dan.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS simple tokenizer

2012-02-26 Thread Dan Kennedy

On 02/27/2012 05:59 AM, Hamish Allan wrote:

The docs for the simple tokenizer
(http://www.sqlite.org/fts3.html#tokenizer) say:

"A term is a contiguous sequence of eligible characters, where
eligible characters are all alphanumeric characters, the "_"
character, and all characters with UTF codepoints greater than or
equal to 128."

If I do:

CREATE VIRTUAL TABLE test USING fts3();
INSERT INTO test (content) VALUES ('hello_world');

SELECT * FROM test WHERE content MATCH 'orld';
SELECT * FROM test WHERE content MATCH 'world';

I get no match for the first query, because it doesn't match a term,
but I get a match for the second, whereas according to my reading of
the docs "world" shouldn't be a term because the underscore character
shouldn't be considered a term break.

Can anyone please help me understand this behaviour?


Documentation bug. Eligible characters are just alphanumerics and
UTF codepoints greater than 128.

Dan.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS simple tokenizer

2012-02-26 Thread Hamish Allan
The docs for the simple tokenizer
(http://www.sqlite.org/fts3.html#tokenizer) say:

"A term is a contiguous sequence of eligible characters, where
eligible characters are all alphanumeric characters, the "_"
character, and all characters with UTF codepoints greater than or
equal to 128."

If I do:

CREATE VIRTUAL TABLE test USING fts3();
INSERT INTO test (content) VALUES ('hello_world');

SELECT * FROM test WHERE content MATCH 'orld';
SELECT * FROM test WHERE content MATCH 'world';

I get no match for the first query, because it doesn't match a term,
but I get a match for the second, whereas according to my reading of
the docs "world" shouldn't be a term because the underscore character
shouldn't be considered a term break.

Can anyone please help me understand this behaviour?

Thanks,
Hamish
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS malformed MATCH expression: [( "a*" OR b*)]

2012-02-03 Thread Joe Mistachkin

Stefan Rogin wrote:
>
> I've seen that it's set on .Net 4
>

The VS 2010 project uses .NET 4, the VS 2008 project uses .NET 3.5.

>
> How can I set it to 3.5 and are there any drawbacks?
>

Depends on what environment(s) you are going to use your application in.
There are trade-offs; however, that is why we provide support for both.

--
Joe Mistachkin

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS malformed MATCH expression: [( "a*" OR b*)]

2012-02-03 Thread Stefan Rogin
Forget the last question.
I've managed to get passed it :), I didn't open the whole solution file,
just the  SQLite.Interop.2010 project,

I've seen that it's set on .Net 4
How can I set it to 3.5 and are there any drawbacks?

Anyway, thanks for the quick reply. +1 for the support

On Fri, Feb 3, 2012 at 9:41 PM, Stefan Rogin wrote:

> Hi,
>
> I tried to build with all default and it gave me this:
>
> Error 5 error LNK1181: cannot open input file
> '\SQLite.Interop\..\bin\2010\ReleaseModule\bin\System.Data.SQLite.netmodule'
>  \SQLite.Interop\LINK SQLite.Interop.2010
>
> On Fri, Feb 3, 2012 at 5:32 PM, Stefan Rogin 
> wrote:
>
>> Thank you,
>>
>> I'll try to recompile.
>> I looked in the Navicat folder and it had 2 sqlite dlls(sqlite and
>> sqlite3), could those be of any good?
>> În data de 03.02.2012 16:52, "Joe Mistachkin"  a
>> scris:
>>
>>
>>> The native SQLite code bundled with System.Data.SQLite is not compiled
>>> with the SQLITE_ENABLE_FTS3_PARENTHESIS option, which is required for
>>> the expression you are asking about to work properly.
>>>
>>> Of course, it can always be recompiled if you have access to Visual C++,
>>> by editing one of the following files and recompiling the SQLite.Interop
>>> project:
>>>
>>>/SQLite.Interop/props/sqlite3.vsprops (for VC++ 2008)
>>>/SQLite.Interop/props/sqlite3.props (for VC++ 2010)
>>>
>>> Where  is the root of the source tree for System.Data.SQLite.
>>>
>>> --
>>> Joe Mistachkin
>>>
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>
>
>
> --
>
> *Stefan Rogin*
> Webdesigner
> Tel: +40.769.622.178
>
>


-- 

*Stefan Rogin*
Webdesigner
Tel: +40.769.622.178
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS malformed MATCH expression: [( "a*" OR b*)]

2012-02-03 Thread Stefan Rogin
Hi,

I tried to build with all default and it gave me this:

Error 5 error LNK1181: cannot open input file
'\SQLite.Interop\..\bin\2010\ReleaseModule\bin\System.Data.SQLite.netmodule'
 \SQLite.Interop\LINK SQLite.Interop.2010

On Fri, Feb 3, 2012 at 5:32 PM, Stefan Rogin wrote:

> Thank you,
>
> I'll try to recompile.
> I looked in the Navicat folder and it had 2 sqlite dlls(sqlite and
> sqlite3), could those be of any good?
> În data de 03.02.2012 16:52, "Joe Mistachkin"  a
> scris:
>
>
>> The native SQLite code bundled with System.Data.SQLite is not compiled
>> with the SQLITE_ENABLE_FTS3_PARENTHESIS option, which is required for
>> the expression you are asking about to work properly.
>>
>> Of course, it can always be recompiled if you have access to Visual C++,
>> by editing one of the following files and recompiling the SQLite.Interop
>> project:
>>
>>/SQLite.Interop/props/sqlite3.vsprops (for VC++ 2008)
>>/SQLite.Interop/props/sqlite3.props (for VC++ 2010)
>>
>> Where  is the root of the source tree for System.Data.SQLite.
>>
>> --
>> Joe Mistachkin
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>


-- 

*Stefan Rogin*
Webdesigner
Tel: +40.769.622.178
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS malformed MATCH expression: [( "a*" OR b*)]

2012-02-03 Thread Stefan Rogin
Thank you,

I'll try to recompile.
I looked in the Navicat folder and it had 2 sqlite dlls(sqlite and
sqlite3), could those be of any good?
În data de 03.02.2012 16:52, "Joe Mistachkin"  a
scris:

>
> The native SQLite code bundled with System.Data.SQLite is not compiled
> with the SQLITE_ENABLE_FTS3_PARENTHESIS option, which is required for
> the expression you are asking about to work properly.
>
> Of course, it can always be recompiled if you have access to Visual C++,
> by editing one of the following files and recompiling the SQLite.Interop
> project:
>
>/SQLite.Interop/props/sqlite3.vsprops (for VC++ 2008)
>/SQLite.Interop/props/sqlite3.props (for VC++ 2010)
>
> Where  is the root of the source tree for System.Data.SQLite.
>
> --
> Joe Mistachkin
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS malformed MATCH expression: [( "a*" OR b*)]

2012-02-03 Thread Joe Mistachkin

The native SQLite code bundled with System.Data.SQLite is not compiled
with the SQLITE_ENABLE_FTS3_PARENTHESIS option, which is required for
the expression you are asking about to work properly.

Of course, it can always be recompiled if you have access to Visual C++,
by editing one of the following files and recompiling the SQLite.Interop
project:

/SQLite.Interop/props/sqlite3.vsprops (for VC++ 2008)
/SQLite.Interop/props/sqlite3.props (for VC++ 2010)

Where  is the root of the source tree for System.Data.SQLite.

--
Joe Mistachkin

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS malformed MATCH expression: [( "a*" OR b*)]

2012-02-02 Thread Stefan Rogin
Hi,

I've encountered some issues when using *FTS4 MATCH* on a database, like
the one mentioned in the subject (*malformed MATCH expression: [( "a*" OR
b*)]*).
>From what I discovered it is triggered by using a quote at the beginning of
a parenthesis, if I write *(b* OR "a*")* it works just fine.
What I found interesting is that if I use Navicat with the SQLite database,
the query above works just fine.
Also there is support for " NOT " opposed to " - " supported
by System.Data.SQLite.

*this is the syntax I used to get the data:*

DataTable dt = new DataTable();

using (SQLiteCommand cmd = mySQLiteConn.CreateCommand())
{
cmd.CommandType = CommandType.Text;
cmd.CommandText = sql;
SQLiteDataAdapter sqlDa = new SQLiteDataAdapter(cmd);
sqlDa.Fill(dt);
}

*exception detail:*

System.Data.SQLite.SQLiteException was unhandled
  Message=SQLite error
malformed MATCH expression: [("a*" OR b*)]
  Source=System.Data.SQLite
  ErrorCode=-2147467259
  StackTrace:
   at System.Data.SQLite.SQLite3.Reset(SQLiteStatement stmt)
   at System.Data.SQLite.SQLite3.Step(SQLiteStatement stmt)
   at System.Data.SQLite.SQLiteDataReader.NextResult()
   at System.Data.SQLite.SQLiteDataReader..ctor(SQLiteCommand cmd,
CommandBehavior behave)
   at System.Data.SQLite.SQLiteCommand.ExecuteReader(CommandBehavior
behavior)
   at
System.Data.SQLite.SQLiteCommand.ExecuteDbDataReader(CommandBehavior
behavior)
   at
System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior
behavior)
   at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset,
DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String
srcTable, IDbCommand command, CommandBehavior behavior)
   at System.Data.Common.DbDataAdapter.Fill(DataTable[] dataTables,
Int32 startRecord, Int32 maxRecords, IDbCommand command, CommandBehavior
behavior)
   at System.Data.Common.DbDataAdapter.Fill(DataTable dataTable)
   at SQLite_Take2.Form1.ftsSearch() in D:\c#\SQLite Take2\Main.cs:line
607
   at SQLite_Take2.Form1.cmdSearch_Click(Object sender, EventArgs e) in
D:\c#\SQLite Take2\Main.cs:line 340
   at System.Windows.Forms.Control.OnClick(EventArgs e)
   at System.Windows.Forms.Button.OnClick(EventArgs e)
   at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
   at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons
button, Int32 clicks)
   at System.Windows.Forms.Control.WndProc(Message& m)
   at System.Windows.Forms.ButtonBase.WndProc(Message& m)
   at System.Windows.Forms.Button.WndProc(Message& m)
   at
System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message&
m)
   at System.Windows.Forms.NativeWindow.DebuggableCallback(IntPtr hWnd,
Int32 msg, IntPtr wparam, IntPtr lparam)
   at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG&
msg)
   at
System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32
dwComponentID, Int32 reason, Int32 pvLoopData)
   at
System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32
reason, ApplicationContext context)
   at
System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason,
ApplicationContext context)
   at System.Windows.Forms.Application.Run(Form mainForm)
   at SQLite_Take2.Program.Main() in D:\c#\SQLite Take2\Program.cs:line
18
   at System.AppDomain._nExecuteAssembly(Assembly assembly, String[]
args)
   at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence
assemblySecurity, String[] args)
   at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.Run(ExecutionContext
executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()
  InnerException:


*I use:*
Windows 7 , 64bit
Visual Studio 2010
SQLite Server version : 3.7.7.1
Connection string: "Data Source=.\temp.db;UTF8Encoding=True;Version=3"
System.Data.SQLite.dll version : 1.0.74.0
-- 

*Stefan Rogin*
Webdesigner
Tel: +40.769.622.178
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [FTS] Executing Sql statements inside a custom tokenizer

2012-01-03 Thread Abhinav Upadhyay
> Two FTS tables? One with the Porter stemmer, for search, one without, to 
> build the auxiliary tables?

Yeah, that is the last option, if nothing else works. For a small set
of documents the extra processing time might be ok but for a larger
set of documents building the FTS tables twice might be a bit taxing.

I think I will first try a custom tokenizer. So is it ok to execute
SQL statements from within the tokenizer ? It would have been great if
there was some way to determine whether the tokenizer code is being
executed for indexing the documents or for searching the index.

Thanks
Abhinav
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [FTS] Executing Sql statements inside a custom tokenizer

2012-01-03 Thread Petite Abeille

On Jan 3, 2012, at 8:30 PM, Abhinav Upadhyay wrote:

> What other options do I have ?

Two FTS tables? One with the Porter stemmer, for search, one without, to build 
the auxiliary tables? 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] [FTS] Executing Sql statements inside a custom tokenizer

2012-01-03 Thread Abhinav Upadhyay
Hi,

I would like to build up a table of all the unique words occurring in
my corpus (for spelling suggestion feature). Presently I am using the
Porter stemming tokenizer and I would not like to stop using the
stemmer at any cost. Although if I was not using the Porter stemmer
then I could easily obtain the list of unique words in the corpus
using the FTS4Aux module. But using the stemmer means that all the
words are stored in the index in their stem form which is not
desirable for building a dictionary of proper English words.

One solution is to use a custom tokenizer. I was thinking of using the
default Porter tokenizer supplied with Sqlite and adding some bits of
code to store the token in a separate table before stemming it down.
But I am not sure if it is ok to access or modify the database using
Sql statements inside a tokenizer. Now that I think of it, the
tokenizer code is also executed when an SQL query is performed against
the FTS table (when performing search), at which time I don't want my
dictionary building code to execute. So perhaps this is not a good
idea.

What other options do I have ?

Thanks
Abhinav
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fts */or. Inconsistencies

2011-12-12 Thread Dan Kennedy

On 12/13/2011 02:29 AM, Ephraim Stevens wrote:

I'm using a custom tokenizer in each scenario (yes it works and the proof
is enclosed). In the first dataset, the data was tokenized such that any
alphanumeric character qualifies as part of a token.

In the second dataset, the data was tokenized such that anything other than
a semicolon qualifies as part of a token.\

The issues I'm raising is that 1) the '*' expansion doesn't seem to work
for a alphanumeric/non-alphanumeric token matches nor does the OR operator.
I haven't figured out what I'm missing here. I've been banging my head all
morning against tihs. I appreciate any help.


Do your custom tokenizers allow whitespace or "*" characters
to be part of tokens? If so, try changing them so that they
do not.



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Fts */or. Inconsistencies

2011-12-12 Thread Ephraim Stevens
SQLite Gurus,

In SQLIte FTS3/4, does the '*' (wildcard expansion character) discriminate
between alphanumeric characters vs non-alpha numeric characters when
matching? I have two test cases below which causes me to believe that it
does. Also, the OR operator appears to fail when matching against a
combination of alphanumerics and non-alphanumerics.

I'm using a custom tokenizer in each scenario (yes it works and the proof
is enclosed). In the first dataset, the data was tokenized such that any
alphanumeric character qualifies as part of a token.

In the second dataset, the data was tokenized such that anything other than
a semicolon qualifies as part of a token.\

The issues I'm raising is that 1) the '*' expansion doesn't seem to work
for a alphanumeric/non-alphanumeric token matches nor does the OR operator.
I haven't figured out what I'm missing here. I've been banging my head all
morning against tihs. I appreciate any help.

I've listed a working and non working example below since a comparison
should clarify best:

[THIS WORKS FINE]

MY DATASET:

STATE|NAMES

---

maryland|fred,louis,jenny

virginia|ruth,greg,denise

maine|richard,norman,willis

TOKENIZER CRITERIA USED: = any word characters (\w+)

(comment: all these names should be split into regular tokens and they were
as shown below)

FULL TEXT INDEX:

sqlite> select * from ft_terms;

term col documents occurrences

-- -- -- ---

fred * 1 1

fred 1 1 1

jenny * 1 1

jenny 1 1 1

louis * 1 1

louis 1 1 1

maryland * 1 1

maryland 0 1 1

ruth * 1 1

ruth 1 1 1

virginia * 1 1

virginia 0 1 1

QUERY:

select * from word where word match 'mary* jen*'

RETURNS:

maryland|fred,louis,jenny

QUERY:

select * from word where word match 'mary* OR v*'

RETURNS:

maryland|fred,louis,jenny

virginia|ruth,greg,denise

All the above behaves as expected. Now lets introduce some non-alphanumerics

--

[THIS DOESN'T WORK]

DATASET:

ROWSET|PAIR

--

1 A=15;B=16;C=38

2 D=15;E=25;F=16

TOKENIZER CRITERIA USED: = any character that is NOT a semicolon ([^;]+)

(comment: all these PAIR values should be tokenized by semicolon and they
were as shown below)

FULL TEXT INDEX:

term col documents occurrences

-- -- -- ---

1 * 1 1

1 0 1 1

2 * 1 1

2 0 1 1

A=15 * 1 1

A=15 1 1 1

B=16 * 1 1

B=16 1 1 1

C=38 * 1 1

C=38 1 1 1

D=15 * 1 1

D=15 1 1 1

E=25 * 1 1

E=25 1 1 1

F=16 * 1 1

F=16 1 1 1

QUERY1:

select * from NUMMY where NUMMY MATCH 'A=* OR D=*'

RETURNS:

(nothing)

comment: Should have returned ROWSET 1 and 2 (refer to above dataset)

QUERY2:

select * from NUMMY where NUMMY MATCH 'A* C*'

RETURNS:

(nothing)

comment: Should have returned ROWSET 1 (refer to above dataset)

--HOWEVER SPECIFYING THE FULL TOKEN WORKS--

QUERY3:

select * from NUMMY where NUMMY MATCH 'A=15'

RETURNS:

1|A=15;B=16;C=38

QUERY4:

select * from NUMMY where NUMMY MATCH 'A=15'

RETURNS:

1|A=15;B=16;C=38

QUERY5:

select * from NUMMY where NUMMY MATCH 'E=25'

RETURNS:

2|D=15;E=25;F=16

--THIS SUCCESS IS SHORTLIVED--

QUERY6:

select * from NUMMY where NUMMY MATCH 'E=25 OR B=16'

RETURNS:

(nothing)

This query should have returned rowset 1 and two as this was an OR query
and both creiteria are met.

In summary, the wildcard expansion as well as the OR operator seems not to
work in the second example. There only main difference between the two data
sets in that the first set is composed of alpha characters only and the
second is a combination of alphanumeric and non-aplhanumerica characters.
The (*) expansion character is not matching these. A match only occurs when
you specify the full token.

Thanks for your time in looking at this issue.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS Tokenizer (separator)

2011-12-09 Thread Richard Hipp
On Fri, Dec 9, 2011 at 6:48 AM, Ephraim Stevens
wrote:

> Greetings All,
>
> From section seven of the FTS3/FTS4 documentation:
>
> A term is a contiguous sequence of eligible characters, where eligible
> characters are all alphanumeric characters, the "_" character, and all
> characters with UTF codepoints greater than or equal to 128. All other
> characters are discarded when splitting a document into terms. Their only
> contribution is to separate adjacent terms.
>
>
> Is there a way to modify/control this behavior?  I would like the equal
> sign ('=') to be treated with the same designation as an alpha numeric
> character. Currently, the equal sign acts as a separator.
>
> Thanks in advance for any suggestions/help you provide.
>

Create your own tokenizer.  http://www.sqlite.org/fts3.html#section_7_1


> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS Tokenizer (separator)

2011-12-09 Thread Ephraim Stevens
Greetings All,

>From section seven of the FTS3/FTS4 documentation:

A term is a contiguous sequence of eligible characters, where eligible
characters are all alphanumeric characters, the "_" character, and all
characters with UTF codepoints greater than or equal to 128. All other
characters are discarded when splitting a document into terms. Their only
contribution is to separate adjacent terms.


Is there a way to modify/control this behavior?  I would like the equal
sign ('=') to be treated with the same designation as an alpha numeric
character. Currently, the equal sign acts as a separator.

Thanks in advance for any suggestions/help you provide.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS: Phrase queries

2011-11-27 Thread Fabian
2011/11/14 nobre 

>
> Comment from the source:
>
>  ** TODO: Strangely, it is not possible to associate a column specifier
>  ** with a quoted phrase, only with a single token. Not sure if this was
>  ** an implementation artifact or an intentional decision when fts3 was
>  ** first implemented. Whichever it was, this module duplicates the
>  ** limitation.
>
> So, seems its really not possible
> Rafael


Thanks, I hope this limitation will be lifted someday. Fixing it will not
break any existing queries and is fully backwards compatible, so I dont
understand why FTS4 duplicates that (faulty) behaviour.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS: Phrase queries

2011-11-14 Thread nobre

Comment from the source:

  ** TODO: Strangely, it is not possible to associate a column specifier
  ** with a quoted phrase, only with a single token. Not sure if this was
  ** an implementation artifact or an intentional decision when fts3 was
  ** first implemented. Whichever it was, this module duplicates the 
  ** limitation.

So, seems its really not possible
Rafael


Fabian-40 wrote:
> 
> When I have a basic FTS query that needs to be restricted to a column, I
> can write it in two ways:
> 
> 1.) WHERE column MATCH 'apple'
> 2.) WHERE table MATCH 'column:apple'
> 
> But when I have a phrase query, I can only write it in one way:
> 
> 1.) WHERE column MATCH '"apple juice"'
> 
> The problem is that when I want to combine the queries (search for 'apple'
> in column1 and for "apple juice" in column2) i cannot write the query
> like:
> 
> WHERE column1 MATCH 'apple' AND column2 MATCH '"apple juice"'
> 
> Nor can I write it like:
> 
> WHERE table MATCH 'column1:apple column2:"apple juice"'
> 
> So this fairly simple query, seems impossible in FTS? Or does anyone know
> how to workaround this (without doing two seperate queries)?
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> 
> 

-- 
View this message in context: 
http://old.nabble.com/FTS%3A-Phrase-queries-tp32834649p32839669.html
Sent from the SQLite mailing list archive at Nabble.com.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS: Phrase queries

2011-11-13 Thread Fabian
When I have a basic FTS query that needs to be restricted to a column, I
can write it in two ways:

1.) WHERE column MATCH 'apple'
2.) WHERE table MATCH 'column:apple'

But when I have a phrase query, I can only write it in one way:

1.) WHERE column MATCH '"apple juice"'

The problem is that when I want to combine the queries (search for 'apple'
in column1 and for "apple juice" in column2) i cannot write the query like:

WHERE column1 MATCH 'apple' AND column2 MATCH '"apple juice"'

Nor can I write it like:

WHERE table MATCH 'column1:apple column2:"apple juice"'

So this fairly simple query, seems impossible in FTS? Or does anyone know
how to workaround this (without doing two seperate queries)?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS vs INDEX

2011-10-19 Thread Alexey Pechnikov
2011/10/19 Fabian :
> I always do inserts in batches of 100.000 rows, and after each batch I
> manually merge the b-trees using:
>
>  INSERT INTO table(table) VALUES('optimize');
>
> Is there a possibility that it will do automatic maintenance half-way during
> a batch? Or will it always wait untill the transaction is finished?

I think you are victim of the premature optimization :)

See documentation:
"several different b-trees that are incrementally merged as rows are inserted,
updated and deleted. This technique improves performance when writing to an
FTS table, but causes some overhead for full-text queries that use the index."

So you can work with a big FTS tables without using the "optimize" method.
I use some FTS tables with tens of millions records and effect of the
"optimize"
isn't measurable.

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS vs INDEX

2011-10-19 Thread Scott Hess
On Wed, Oct 19, 2011 at 12:50 PM, Fabian  wrote:
> 2011/10/19 Scott Hess 
>> To be clear, how it works is that new insertions are batched into a
>> new index tree, with index trees periodically aggregated to keep
>> selection efficient and to keep the size contained.  So while the
>> speed per insert should remain pretty stable constant, periodically an
>> insert will require index maintenance, so that insert will be slower.
>> If you have a lot of documents (or a small page cache) these
>> maintenance events can get pretty expensive relative to the cost of a
>> non-maintenance insert.  So it's not a clear-cut win, but it probably
>> would be interesting as an alternative sort of index for some tables.
>
> I always do inserts in batches of 100.000 rows, and after each batch I
> manually merge the b-trees using:
>
>  INSERT INTO table(table) VALUES('optimize');
>
> Is there a possibility that it will do automatic maintenance half-way during
> a batch? Or will it always wait untill the transaction is finished?

It does it when it does it, in fact you're probably getting some small
merges during this process already.  If you're doing your batch
inserts within a surrounding transaction, and are inserting documents
by ascending docid (or letting the system choose docid), it can buffer
up many updates in memory before flushing them to disk, which is
pretty efficient.  Inserting 100,000 documents this way will probably
not hit any very large merges, unless your documents tend to have a
very large number of unique terms.

-scott
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS: Reduce tokens

2011-10-19 Thread Fabian
Using the default tokenizer, everything that is not an alphanumeric
character or an underscore, will generate a new token.

I have a lot of columns that contains e-mail addresses or URL's, and most of
them have characters like '.', '@' and '/'. Is there a simple way to make
FTS see them as one single token, instead of splitting those strings into
many small ones? I know it's possible to develop a custom tokenizer, but
that's way over my head I'm afraid :)
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS vs INDEX

2011-10-19 Thread Fabian
2011/10/19 Scott Hess 

>
> To be clear, how it works is that new insertions are batched into a
> new index tree, with index trees periodically aggregated to keep
> selection efficient and to keep the size contained.  So while the
> speed per insert should remain pretty stable constant, periodically an
> insert will require index maintenance, so that insert will be slower.
> If you have a lot of documents (or a small page cache) these
> maintenance events can get pretty expensive relative to the cost of a
> non-maintenance insert.  So it's not a clear-cut win, but it probably
> would be interesting as an alternative sort of index for some tables.
>
>
I always do inserts in batches of 100.000 rows, and after each batch I
manually merge the b-trees using:

 INSERT INTO table(table) VALUES('optimize');

Is there a possibility that it will do automatic maintenance half-way during
a batch? Or will it always wait untill the transaction is finished?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS vs INDEX

2011-10-19 Thread Scott Hess
On Wed, Oct 19, 2011 at 7:56 AM, Fabian  wrote:
> 2011/10/19 Alexey Pechnikov 
>> FTS use index multi-tree and de-facto has _no_ insert speed degradation.
>
> Thanks, that's good to hear! It makes me wonder why SQLite doesn't use that
> same multi-tree mechanism for regular indexes, but that's a whole different
> question.

To be clear, how it works is that new insertions are batched into a
new index tree, with index trees periodically aggregated to keep
selection efficient and to keep the size contained.  So while the
speed per insert should remain pretty stable constant, periodically an
insert will require index maintenance, so that insert will be slower.
If you have a lot of documents (or a small page cache) these
maintenance events can get pretty expensive relative to the cost of a
non-maintenance insert.  So it's not a clear-cut win, but it probably
would be interesting as an alternative sort of index for some tables.

-scott
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS vs INDEX

2011-10-19 Thread Black, Michael (IS)
Nope -- didn't note the insert speed on that test.



Why don't you take my benchmark data and test it yourself?  Then post the 
results.



The saying "your mileage may vary" comes to mind...





Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Fabian [fabianpi...@gmail.com]
Sent: Wednesday, October 19, 2011 9:44 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] FTS vs INDEX

Very interesting benchmarks! However it seems to focus mainly on the speed
of SELECT queries, and the total size of the resulting database on disk. But
my main concern is about the speed of INSERT queries vs normal tables. Any
chance you compared that too?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS vs INDEX

2011-10-19 Thread Alexey Pechnikov
2011/10/19 Fabian :
> Thanks, that's good to hear! It makes me wonder why SQLite doesn't use that
> same multi-tree mechanism for regular indexes, but that's a whole different
> question.

It's impossible with SQLite3 database format. May be SQLite4 will be
support it :)

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS vs INDEX

2011-10-19 Thread Fabian
2011/10/19 Alexey Pechnikov 

> FTS use index multi-tree and de-facto has _no_ insert speed degradation.
>

Thanks, that's good to hear! It makes me wonder why SQLite doesn't use that
same multi-tree mechanism for regular indexes, but that's a whole different
question.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS vs INDEX

2011-10-19 Thread Alexey Pechnikov
FTS use index multi-tree and de-facto has _no_ insert speed degradation.
I did do test for 400+ millions of records.
With b-tree index there is insert speed degradation:
http://geomapx.blogspot.com/2010/04/sqlite-index-degradation-tests.html
http://geomapx.blogspot.com/search?q=index+speed

So FTS as hash-index is nice.

2011/10/19 Fabian :
> Did anyone do some benchmarks how the insert-speed of FTS compares to a TEXT
> INDEX column? I don't need many of the extra features of FTS, because I
> always need to look up rows by prefix or exact match, and both can be
> implemented efficiently via TEXT INDEX too. But if the overhead is
> comparable, I'd rather use FTS.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


  1   2   3   >