[sqlite] Full text search implementation

2004-01-16 Thread George Ionescu
Hello dear SQLite users.
Hello Dr. Hipp,

I've been using sqlite for quite some time now and I am a fan (thank you Dr.
Hipp).

I will try to rewrite my (unfinished) Document Management application using
SQLite and I was wondering if Full Text Search will be implemented.

I'vee seen a ticket posted by Dr. Hipp on 2003-09-03 where he got a response
from a guy who's oppinion was that Full Text Search is not useful in sqlite;
I have a different oppinion: since all major database engines implement this
feature, why should sqlite be left aside?

Regarding Full Text Search syntax, I think CONTAINS statement would be ok
(as implemented in other database engines).
So:
SELECT * FROM Categories
WHERE CONTAINS(CategoryDescription, 'Beverages')
would return all the rows where column CategoryDescription contains the
word 'Beverages' :-)

similar
SELECT * FROM Categories
WHERE CONTAINS(CategoryDescription, ' Beverages AND NOT Whiskey
')
would return all the rows where column CategoryDescription contains the
word 'Beverages' and does not contain the word 'Whiskey'

Ranking return would be a cool feature. However, another feature I drewl
about would be 'highlighting hits'. For this, the position / positions in
the text where the word was found would be enough (I think).

Eagerly waiting for your answers and comments,
Sincerely,
George Ionescu


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [sqlite] SQLite Browser (Mac OS 10.3)

2004-01-16 Thread Mauricio Piacentini

Yes... in OS X, everything that happens is written on in an app called Console. I write the author a while back too and he asked that I attempt to launch it again with Console running and then send him whatever errors/messages appear, but... strangely... I am still getting nothing in the Console... as I did then.
I answered to the original request at Sourceforge forums, so we can take 
it from there. Just wanted to point out that the MacOSX Console is not 
the same as the Terminal window, this is a common confusion for users 
coming from Unix/Linux. The Console is a separate application that logs 
system level errors that would prevent an app from launching, like the 
lack of shared libraries, etc. Make sure you are really checking 
Console.app and not the terminal window.
As someone wrote already, shared libraries are definately a weak spot 
under OSX. Some consider their implementation simply broke. I would not 
go as far, but there are certainly something not right about them 
specially if you are launching from the Finder. I believe the standard 
SQLite distribution does not compile as a shared library under OSX 
exactly because of this problems, can anyone confirm this?

Regards,
Mauricio
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [sqlite] SQLite Browser (Mac OS 10.3)

2004-01-16 Thread Wade Preston Shearer
Yes... in OS X, everything that happens is written on in an app 
called Console. I write the author a while back too and he asked that 
I attempt to launch it again with Console running and then send him 
whatever errors/messages appear, but... strangely... I am still 
getting nothing in the Console... as I did then.
I answered to the original request at Sourceforge forums, so we can 
take it from there. Just wanted to point out that the MacOSX Console 
is not the same as the Terminal window, this is a common confusion for 
users coming from Unix/Linux. The Console is a separate application 
that logs system level errors that would prevent an app from 
launching, like the lack of shared libraries, etc. Make sure you are 
really checking Console.app and not the terminal window.
I understand the difference between the two and use them both often, 
for their intended purposes. Yes, it was Console that I had open each 
time I was launching SQLite Browser and attempting to monitor for 
errors... and each time Console showed nothing. I have tried this on 
multiple computers running 10.3.

As someone wrote already, shared libraries are definately a weak spot 
under OSX. Some consider their implementation simply broke. I would 
not go as far, but there are certainly something not right about them 
specially if you are launching from the Finder. I believe the standard 
SQLite distribution does not compile as a shared library under OSX 
exactly because of this problems, can anyone confirm this?


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [sqlite] SQLite Browser (Mac OS 10.3)

2004-01-16 Thread Will Leshner
On Jan 16, 2004, at 7:15 AM, Wade Preston Shearer wrote:

I understand the difference between the two and use them both often, 
for their intended purposes. Yes, it was Console that I had open each 
time I was launching SQLite Browser and attempting to monitor for 
errors... and each time Console showed nothing. I have tried this on 
multiple computers running 10.3.

In Console, you may need to change which log you are looking at.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [sqlite] SQLite Browser (Mac OS 10.3)

2004-01-16 Thread Wade Preston Shearer
I understand the difference between the two and use them both often, 
for their intended purposes. Yes, it was Console that I had open each 
time I was launching SQLite Browser and attempting to monitor for 
errors... and each time Console showed nothing. I have tried this on 
multiple computers running 10.3.

In Console, you may need to change which log you are looking at.
Ah... very effective post. Any idea which log I should view?



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [sqlite] SQLite Browser (Mac OS 10.3)

2004-01-16 Thread Will Leshner
On Jan 16, 2004, at 7:31 AM, Wade Preston Shearer wrote:

Ah... very effective post. Any idea which log I should view?



Yes, I wanted to add that info, but I'm not sure myself. I want to say 
system.log but I'm not sure. When I think when I add debugging code 
to sqlite myself, that's where it ends up.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [sqlite] Full text search implementation

2004-01-16 Thread Brad Campbell
George Ionescu wrote:
Hello dear SQLite users.
Hello Dr. Hipp,
I've been using sqlite for quite some time now and I am a fan (thank you Dr.
Hipp).
I will try to rewrite my (unfinished) Document Management application using
SQLite and I was wondering if Full Text Search will be implemented.
I'vee seen a ticket posted by Dr. Hipp on 2003-09-03 where he got a response
from a guy who's oppinion was that Full Text Search is not useful in sqlite;
I have a different oppinion: since all major database engines implement this
feature, why should sqlite be left aside?
Regarding Full Text Search syntax, I think CONTAINS statement would be ok
(as implemented in other database engines).
So:
SELECT * FROM Categories
WHERE CONTAINS(CategoryDescription, 'Beverages')
would return all the rows where column CategoryDescription contains the
word 'Beverages' :-)
similar
SELECT * FROM Categories
WHERE CONTAINS(CategoryDescription, ' Beverages AND NOT Whiskey
')
would return all the rows where column CategoryDescription contains the
word 'Beverages' and does not contain the word 'Whiskey'
My regex patch should do that

SELECT * FROM Categories WHERE CategoryDescription RLIKE 'Beverages and CategoryDescription NOT 
RLIKE 'Whiskey';

You can find a working test version here
http://www.wasp.net.au/~brad/sqlite-110104-snapshot-bkc1.tgz
Brad

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [sqlite] Full text search implementation

2004-01-16 Thread David Morel
Le ven 16/01/2004 à 15:30, George Ionescu a écrit :
 I've been using sqlite for quite some time now and I am a fan (thank you Dr.
 Hipp).

so am I

 Regarding Full Text Search syntax, I think CONTAINS statement would be ok
 (as implemented in other database engines).
 So:
 SELECT * FROM Categories
 WHERE CONTAINS(CategoryDescription, 'Beverages')
 would return all the rows where column CategoryDescription contains the
 word 'Beverages' :-)

doesn't 
SELECT * FROM Categories  WHERE CategoryDescription LIKE '%Beverages%';
do just that ?

I guess full-text indexing applies to columns containing larger chunks
of text...

D.Morel
-- 
***
[EMAIL PROTECTED]
OpenPGP public key: http://www.amakuru.net/dmorel.asc
28192ef126bc871757cb7d97f4a44536




signature.asc
Description: Ceci est une partie de message	=?ISO-8859-1?Q?num=E9riquement?= =?ISO-8859-1?Q?_sign=E9e=2E?=


Re: [sqlite] Full text search implementation

2004-01-16 Thread David Morel
 My regex patch should do that
 
 SELECT * FROM Categories WHERE CategoryDescription RLIKE 'Beverages and 
 CategoryDescription NOT 
 RLIKE 'Whiskey';
 

In such a simple string matching I suspect a regex search is totally
overkill... that's ok for a db containing 1000 rows, but try it on
700,000 rows (390Mb) like the one i have here ;-)

 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
-- 
***
[EMAIL PROTECTED]
OpenPGP public key: http://www.amakuru.net/dmorel.asc
28192ef126bc871757cb7d97f4a44536




signature.asc
Description: Ceci est une partie de message	=?ISO-8859-1?Q?num=E9riquement?= =?ISO-8859-1?Q?_sign=E9e=2E?=


Re: [sqlite] Full text search implementation

2004-01-16 Thread Brad Campbell
David Morel wrote:
My regex patch should do that

SELECT * FROM Categories WHERE CategoryDescription RLIKE 'Beverages and CategoryDescription NOT 
RLIKE 'Whiskey';



In such a simple string matching I suspect a regex search is totally
overkill... that's ok for a db containing 1000 rows, but try it on
700,000 rows (390Mb) like the one i have here ;-)
Yup, as soon as I sent it I thought What a perfect application for LIKE 
'%Whiskey%'..
And I do like Whiskey!
:p)

Brad

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: [sqlite] Full text search implementation

2004-01-16 Thread Steve O'Hara


 -Original Message-
 From: David Morel [mailto:[EMAIL PROTECTED]
 Sent: 16 January 2004 17:32
 To: Brad Campbell
 Cc: George Ionescu; [EMAIL PROTECTED]
 Subject: Re: [sqlite] Full text search implementation


  My regex patch should do that
 
  SELECT * FROM Categories WHERE CategoryDescription RLIKE
 'Beverages and CategoryDescription NOT
  RLIKE 'Whiskey';
 

 In such a simple string matching I suspect a regex search is totally
 overkill... that's ok for a db containing 1000 rows, but try it on
 700,000 rows (390Mb) like the one i have here ;-)

I don't think that your LIKE version will perform much better - SQLite
doesn't use indexes when doing LIKE comparisons.

  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
 --
 ***
 [EMAIL PROTECTED]
 OpenPGP public key: http://www.amakuru.net/dmorel.asc
 28192ef126bc871757cb7d97f4a44536




Using LIKE as a means of doing a full text search is virtually useless in
the real world of text retrieval.  The query take no account of context,
which is essential when dealing with intelligent text queries.
A full-on full text engine (BASIS, BRS etc) has to maintain a set of
meta-data for each text column that can be searched i.e.

When data is added to a text column, the text must be parsed to split it up
into searchable words using a break character list.
These words must then be reduced to their searchable stem (pluralisation,
inflexions, Porter stemming etc) and insignificant words ('a', 'and', 'the'
etc (stop words)) removed.
The words are then added to the column index - the posting in the index
contains the row ID, the start character position of the word and the
original length of the word.  It may also contain grammatical context info
such as the sentence/paragraph number.
At this point, some systems may also add into the index other variants of
the words (common mispellings, morphs etc) to improve recall.

Now, when you do a search on that column, the system has to parse your query
terms, stem them and weed out stop words in the same way as when data was
added.  It then looks up the words in the column index and collates the
proximity of the words.
There's not normally much point in searching for 'SQLite' and 'document' if
you can't tell the system to find them with the same sentence, paragraph, or
adjacent.

As you can see, a proper full text search engine is considerably more work
than it first looks.  Add onto this all the complexities of applying this to
different languages and you have a pretty major coding effort on your hands.

I have a working prototype of such a beast using SQLite that I'd be
interested in sharing the devlopment of, if anyone is interested?

Steve




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [sqlite] Full text search implementation

2004-01-16 Thread Bertrand Mansion
[EMAIL PROTECTED] wrote :

 
 
 -Original Message-
 From: David Morel [mailto:[EMAIL PROTECTED]
 Sent: 16 January 2004 17:32
 To: Brad Campbell
 Cc: George Ionescu; [EMAIL PROTECTED]
 Subject: Re: [sqlite] Full text search implementation
 
 
 My regex patch should do that
 
 SELECT * FROM Categories WHERE CategoryDescription RLIKE
 'Beverages and CategoryDescription NOT
 RLIKE 'Whiskey';
 
 
 In such a simple string matching I suspect a regex search is totally
 overkill... that's ok for a db containing 1000 rows, but try it on
 700,000 rows (390Mb) like the one i have here ;-)
 
 I don't think that your LIKE version will perform much better - SQLite
 doesn't use indexes when doing LIKE comparisons.
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 --
 ***
 [EMAIL PROTECTED]
 OpenPGP public key: http://www.amakuru.net/dmorel.asc
 28192ef126bc871757cb7d97f4a44536
 
 
 
 
 Using LIKE as a means of doing a full text search is virtually useless in
 the real world of text retrieval.  The query take no account of context,
 which is essential when dealing with intelligent text queries.
 A full-on full text engine (BASIS, BRS etc) has to maintain a set of
 meta-data for each text column that can be searched i.e.
 
 When data is added to a text column, the text must be parsed to split it up
 into searchable words using a break character list.
 These words must then be reduced to their searchable stem (pluralisation,
 inflexions, Porter stemming etc) and insignificant words ('a', 'and', 'the'
 etc (stop words)) removed.
 The words are then added to the column index - the posting in the index
 contains the row ID, the start character position of the word and the
 original length of the word.  It may also contain grammatical context info
 such as the sentence/paragraph number.
 At this point, some systems may also add into the index other variants of
 the words (common mispellings, morphs etc) to improve recall.
 
 Now, when you do a search on that column, the system has to parse your query
 terms, stem them and weed out stop words in the same way as when data was
 added.  It then looks up the words in the column index and collates the
 proximity of the words.
 There's not normally much point in searching for 'SQLite' and 'document' if
 you can't tell the system to find them with the same sentence, paragraph, or
 adjacent.
 
 As you can see, a proper full text search engine is considerably more work
 than it first looks.  Add onto this all the complexities of applying this to
 different languages and you have a pretty major coding effort on your hands.
 
 I have a working prototype of such a beast using SQLite that I'd be
 interested in sharing the devlopment of, if anyone is interested?

I agree. You just forgot about the scoring algorithm, a full text query
should also be able to return a score.

IMO, search engines (with tokenizer, indexer, stemmer, stopwords,
substrings, fuzzy, binary converter...) offer a good choice in terms of
features when it comes to full text search. It seems that Mnogosearch
(http://www.mnogosearch.org) has included sqlite as their default db for
their search engine software. So this might be a good companion to sqlite.

The best solution would be IMO to have all this optionally integrated into
the database engine. That's what Oracle does with Context and last time I
have used it, it was working very well (it was with Oracle 9i). But that
might also make the engine too heavy.

Mysql offers fulltext but it is only available AFAIK on the MyISAM table
format. So you can't have foreign key constraints (available on InnoDB
format) and full text at the same time.

Bertrand Mansion
Mamasam


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[sqlite] datetime modifier for localtime

2004-01-16 Thread Derrell . Lipman
end_time contains an integer value returned by the C function time(NULL), thus
a value in the unixepoch format.  How do I retrieve the pretty-printed value
of the localtime?  As can be seen from the statements below, I can retrieve
the 'unixepoch' datetime value but not a value converted to local time...???
(This is with 2.8.8)

sqlite .nullvalue null
sqlite SELECT end_time FROM the_table LIMIT 1;
1073928428
sqlite SELECT datetime(end_time, 'unixepoch') FROM the_table LIMIT 1;
2004-01-12 17:27:08
sqlite SELECT datetime(end_time, 'localtime') FROM the_table LIMIT 1;
null
sqlite SELECT datetime(end_time, 'unixepoch', 'localtime') FROM the_table LIMIT 1;
null


Thanks,

Derrell

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [sqlite] Full text search implementation

2004-01-16 Thread Steve O'Hara


 -Original Message-
 From: Bertrand Mansion [mailto:[EMAIL PROTECTED]
 Sent: 16 January 2004 19:18
 To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: [sqlite] Full text search implementation


 [EMAIL PROTECTED] wrote :

 
 
  -Original Message-
  From: David Morel [mailto:[EMAIL PROTECTED]
  Sent: 16 January 2004 17:32
  To: Brad Campbell
  Cc: George Ionescu; [EMAIL PROTECTED]
  Subject: Re: [sqlite] Full text search implementation
 
 
  My regex patch should do that
 
  SELECT * FROM Categories WHERE CategoryDescription RLIKE
  'Beverages and CategoryDescription NOT
  RLIKE 'Whiskey';
 
 
  In such a simple string matching I suspect a regex search is totally
  overkill... that's ok for a db containing 1000 rows, but try it on
  700,000 rows (390Mb) like the one i have here ;-)
 
  I don't think that your LIKE version will perform much better - SQLite
  doesn't use indexes when doing LIKE comparisons.
 
  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
  --
  ***
  [EMAIL PROTECTED]
  OpenPGP public key: http://www.amakuru.net/dmorel.asc
  28192ef126bc871757cb7d97f4a44536
 
 
 
 
  Using LIKE as a means of doing a full text search is virtually
 useless in
  the real world of text retrieval.  The query take no account of context,
  which is essential when dealing with intelligent text queries.
  A full-on full text engine (BASIS, BRS etc) has to maintain a set of
  meta-data for each text column that can be searched i.e.
 
  When data is added to a text column, the text must be parsed to
 split it up
  into searchable words using a break character list.
  These words must then be reduced to their searchable stem
 (pluralisation,
  inflexions, Porter stemming etc) and insignificant words ('a',
 'and', 'the'
  etc (stop words)) removed.
  The words are then added to the column index - the posting in the index
  contains the row ID, the start character position of the word and the
  original length of the word.  It may also contain grammatical
 context info
  such as the sentence/paragraph number.
  At this point, some systems may also add into the index other
 variants of
  the words (common mispellings, morphs etc) to improve recall.
 
  Now, when you do a search on that column, the system has to
 parse your query
  terms, stem them and weed out stop words in the same way as
 when data was
  added.  It then looks up the words in the column index and collates the
  proximity of the words.
  There's not normally much point in searching for 'SQLite' and
 'document' if
  you can't tell the system to find them with the same sentence,
 paragraph, or
  adjacent.
 
  As you can see, a proper full text search engine is
 considerably more work
  than it first looks.  Add onto this all the complexities of
 applying this to
  different languages and you have a pretty major coding effort
 on your hands.
 
  I have a working prototype of such a beast using SQLite that I'd be
  interested in sharing the devlopment of, if anyone is interested?

 I agree. You just forgot about the scoring algorithm, a full text query
 should also be able to return a score.

 IMO, search engines (with tokenizer, indexer, stemmer, stopwords,
 substrings, fuzzy, binary converter...) offer a good choice in terms of
 features when it comes to full text search. It seems that Mnogosearch
 (http://www.mnogosearch.org) has included sqlite as their default db for
 their search engine software. So this might be a good companion to sqlite.

 The best solution would be IMO to have all this optionally integrated into
 the database engine. That's what Oracle does with Context and last time I
 have used it, it was working very well (it was with Oracle 9i). But that
 might also make the engine too heavy.

 Mysql offers fulltext but it is only available AFAIK on the MyISAM table
 format. So you can't have foreign key constraints (available on InnoDB
 format) and full text at the same time.

 Bertrand Mansion
 Mamasam



Yes you're right, all full text engines have a ranking mechanism.  Some also
have term weighting which is very useful especially if it is dynamic.

Interestingly, when I first looked at Context some years ago, it was
apparent that the full text search functions are not actually part of the
core database engine.
The Context portion is actually a layer that surrounds the Oracle kernel and
uses standard SQL services to achieve the full text capability i.e. for
every column that is free text searchable, a surrogate table is created that
contains the terms, positions etc of the terms from that column.
The Context layer then runs standard SQL queries on this table to affect
full text search.
I'm only aware of one single true, full text RDBMS and that's BASIS from
Open Text.

This is the approach I've taken with SQLite and it works pretty well.  Going
down this 

Re: [sqlite] datetime modifier for localtime

2004-01-16 Thread Derrell . Lipman
Kurt Welgehausen [EMAIL PROTECTED] writes:

 If I remember right, localtime and gmtime were not implemented
 in 2.8.8.  You need to upgrade.

Ah!  Well that would certainly explain it.  Thanks!

Derrell

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]