Re: [sqlite] Full text search FTS3 of files

2010-10-18 Thread Sam Roberts
Take a look at the custom tokenizer API. I think tokens returned don't
necessarily have to be substrings of the text. So, maybe the text you
"tokenize" could be the file path, but the tokens could be things you
pull from the contents of the file.

Just a thought,
Cheers,
Sam
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Full text search FTS3 of files

2010-10-18 Thread Scott Hess
On Sun, Oct 17, 2010 at 11:13 PM, Dami Laurent (PJ) wrote:
>>Is it possible to use FTS3 for search without storing the actual file
>>contents/search terms/keywords in a row? In other words, can I create an
>>FTS3 table whose rows contain only an ID, and populate the B-Tree with
>>keywords for search?
>
> Each FTS3 table t is stored internally within three regular tables:
> t_content, t_segments and t_segdir. The last two tables contain the
> fulltext index. The first table t_content stores the complete documents
> being indexed, and is only used when you call the offsets() or
> snippet() functions. So if you don't need those functions, you can
> cheat: a) call FTS3 to index your document as usual; b) do an update on
> the t_content table to remove the document text.

If you do this, it is probably safest to replace the columns in the
content table with empty strings, rather than deleting them entirely.
It won't remove all the untested edge cases, of course!

Doing this will prevent various things from working, and nobody is
likely to have ready answers for how it breaks.  For instance,
updating or deleting from the fts3 table will have unexpected results
(it needs the original document to update the index), phrase or near
searches won't work (but might claim to work, with empty results), and
the snippet/offset code won't work (again, probably will just show
empty results).
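
To make the hack concrete, here is a minimal sketch using Python's
standard sqlite3 module. It assumes the underlying SQLite build has FTS3
compiled in and allows writes to shadow tables (a connection in
"defensive" mode refuses them). The table name docs, the column names
path and body, and the helper names build_index and search_paths are
made up for illustration. The body column is blanked with an empty
string rather than deleted, as suggested above, and all the caveats
about updates, deletes and snippet/offsets still apply.

```python
import sqlite3


def build_index(conn, files):
    """Index (path, text) pairs with FTS3, then blank the stored text.

    Hypothetical helper: the table and column names are illustrative.
    """
    conn.execute("CREATE VIRTUAL TABLE docs USING fts3(path, body)")
    conn.executemany("INSERT INTO docs (path, body) VALUES (?, ?)", files)
    # The hack: FTS3 keeps the documents in the shadow table docs_content,
    # in columns named c<N><column name>. Overwrite the body column with
    # empty strings; the index in docs_segments/docs_segdir is untouched.
    conn.execute("UPDATE docs_content SET c1body = ''")
    conn.commit()


def search_paths(conn, term):
    """Return the paths of files whose (now blanked) body matched term."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE body MATCH ? ORDER BY path", (term,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, [
        ("/docs/a.txt", "full text search with sqlite"),
        ("/docs/b.txt", "plain files on disk"),
    ])
    print(search_paths(conn, "sqlite"))
```

Only simple term lookups are exercised here. Nothing in this sketch
shows that phrase or NEAR queries, or later updates and deletes, behave
sensibly once the text is gone.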

-scott


Re: [sqlite] Full text search FTS3 of files

2010-10-17 Thread Dami Laurent (PJ)
>Is it possible to use FTS3 for search without storing the actual file
>contents/search terms/keywords in a row? In other words, can I create an
>FTS3 table whose rows contain only an ID, and populate the B-Tree with
>keywords for search?
>

Each FTS3 table t is stored internally within three regular tables:
t_content, t_segments and t_segdir. The last two tables contain the
fulltext index. The first table t_content stores the complete documents
being indexed, and is only used when you call the offsets() or
snippet() functions. So if you don't need those functions, you can
cheat: a) call FTS3 to index your document as usual; b) do an update on
the t_content table to remove the document text.

I did play with that scenario, and gained quite a lot of disk space;
however it's really a hack and maybe wouldn't work in future versions of
SQLite. 
More on
http://search.cpan.org/dist/DBD-SQLite/lib/DBD/SQLite/Cookbook.pod#Sparing_database_disk_space





Re: [sqlite] Full text search FTS3 of files

2010-10-17 Thread Max Vlasov
On Sun, Oct 17, 2010 at 11:54 PM, pipilu wrote:

>
> My question is:
> Is it possible to use FTS3 for search without storing the actual file
> contents/search terms/keywords in a row? In other words, can I create an
> FTS3 table whose rows contain only an ID, and populate the B-Tree with
> keywords for search?
>
>

John, technically, if you ask "without storing", the answer is no. But how
you could implement this depends on what you want from your search. If you
only need keyword search (without phrases or complex queries), then it's a
simple task: create two tables (keywords and index) and write a simple
parser (you don't really need the power of fts3 here).
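
A minimal sketch of that two-table approach, using Python's standard
sqlite3 module. Everything named here is an assumption made for
illustration: the second table is called file_index because INDEX is a
reserved word in SQL, and the "parser" is just a regular expression that
picks out runs of letters and digits. Only the keywords and the file
paths are stored, never the file contents.

```python
import re
import sqlite3


def tokenize(text):
    """Very simple parser: the set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def create_schema(conn):
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS keywords (
            id   INTEGER PRIMARY KEY,
            word TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS file_index (
            keyword_id INTEGER NOT NULL REFERENCES keywords(id),
            path       TEXT NOT NULL,
            PRIMARY KEY (keyword_id, path)
        );
    """)


def index_text(conn, path, text):
    """Record which keywords occur in the file at path."""
    for word in tokenize(text):
        conn.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
        (keyword_id,) = conn.execute(
            "SELECT id FROM keywords WHERE word = ?", (word,)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO file_index (keyword_id, path) VALUES (?, ?)",
            (keyword_id, path),
        )


def search(conn, word):
    """Return the sorted paths of files that contain the keyword."""
    rows = conn.execute(
        "SELECT fi.path FROM file_index AS fi "
        "JOIN keywords AS k ON k.id = fi.keyword_id "
        "WHERE k.word = ? ORDER BY fi.path",
        (word.lower(),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    index_text(conn, "/docs/a.txt", "SQLite full text search")
    index_text(conn, "/docs/b.txt", "Plain text files")
    print(search(conn, "text"))
```

Because no word positions are kept, this cannot answer phrase queries.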

But if you want phrases, you have to store ordering information about your
words. In this case you can use fts3 for the search, and the only drawback is
that fts3 will keep a copy of your texts. In my experience, though, the fts3
index is implemented very efficiently. I have my own implementation of
full-text search built on ordinary sqlite tables, and I compared the two on
real data: even leaving the stored texts out of the fts3 figure, the fts3
index took about half the space of mine for the same pool of articles. So
there's a real chance that even if you implement something that doesn't store
the texts, you will end up with a bigger index.

Max Vlasov
maxerist.net


Re: [sqlite] Full text search FTS3 of files

2010-10-17 Thread P Kishor
On Sun, Oct 17, 2010 at 2:54 PM, pipilu wrote:
> Hi:
> I am trying to build a sqlite3 database to index files. What I want to do is
> to keep the files in the file system on the disk (not in the database) and
> index the files with keywords such that when a search is performed, the
> right file names are returned.
>
> My question is:
> Is it possible to use FTS3 for search without storing the actual file
> contents/search terms/keywords in a row? In other words, can I create an
> FTS3 table whose rows contain only an ID, and populate the B-Tree with
> keywords for search?

No.

Use something like Swish-e or ht://Dig.

>
> Thanks a lot
> John



-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
---
Assertions are politics; backing up assertions with evidence is science
===


[sqlite] Full text search FTS3 of files

2010-10-17 Thread pipilu
Hi:
I am trying to build a sqlite3 database to index files. What I want to do is
to keep the files in the file system on the disk (not in the database) and
index the files with keywords such that when a search is performed, the
right file names are returned.

My question is:
Is it possible to use FTS3 for search without storing the actual file
contents/search terms/keywords in a row? In other words, can I create an
FTS3 table whose rows contain only an ID, and populate the B-Tree with
keywords for search?

Thanks a lot
John