Re: [sqlite] Full text search FTS3 of files
Take a look at the custom tokenizer API. I think the tokens returned don't necessarily have to be substrings of the text. So maybe the text you "tokenize" could be the file path, while the tokens could be things you pull from the contents of the file.

Just a thought,

Cheers,
Sam
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Full text search FTS3 of files
On Sun, Oct 17, 2010 at 11:13 PM, Dami Laurent (PJ) wrote:
>>Is it possible to use FTS3 for search without storing the actual file
>>contents/search terms/keywords in a row. In other words, create a FTS3
>>tables with rows that only contains an ID and populate the B-Tree with
>>keywords for search.
>
> Each FTS3 table t is stored internally within three regular tables:
> t_content, t_segments and t_segdir. The last two tables contain the
> fulltext index. The first table t_content stores the complete documents
> being indexed, and is only used when you call the offsets() or
> snippet() functions. So if you don't need those functions, you can
> cheat: a) call FTS3 to index your document as usual; b) do an update on
> the t_content table to remove the document text.

If you do this, it is probably safest to replace the column values in the content table with empty strings, rather than deleting them entirely. It won't remove all the untested edge cases, of course!

Doing this will prevent various things from working, and nobody is likely to have ready answers for how it breaks. For instance, updating or deleting from the fts3 table will have unexpected results (it needs the original document to update the index), phrase or NEAR searches won't work (but might claim to work, with empty results), and the snippet/offsets code won't work (again, it will probably just show empty results).

-scott
Re: [sqlite] Full text search FTS3 of files
>Is it possible to use FTS3 for search without storing the actual file
>contents/search terms/keywords in a row. In other words, create a FTS3
>tables with rows that only contains an ID and populate the B-Tree with
>keywords for search.
>

Each FTS3 table t is stored internally within three regular tables: t_content, t_segments and t_segdir. The last two tables contain the fulltext index. The first table, t_content, stores the complete documents being indexed, and is only used when you call the offsets() or snippet() functions. So if you don't need those functions, you can cheat:

a) call FTS3 to index your document as usual;
b) do an update on the t_content table to remove the document text.

I did play with that scenario, and gained quite a lot of disk space; however, it's really a hack and might not work in future versions of SQLite.

More on http://search.cpan.org/dist/DBD-SQLite/lib/DBD/SQLite/Cookbook.pod#Sparing_database_disk_space
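Steps a) and b) above can be sketched roughly as follows, using Python's standard sqlite3 module. This is only an illustration under several assumptions: the Python build must have FTS3 compiled in, the table name docs, the column name body and the sample texts are made up, and the shadow-table column name c0body relies on how FTS3 currently names the columns of its content table (an internal detail that may change). The text is replaced with empty strings rather than deleted, as suggested elsewhere in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step a) index the documents as usual (requires an FTS3-enabled build).
conn.execute("CREATE VIRTUAL TABLE docs USING fts3(body)")
texts = {
    1: "sqlite full text index",
    2: "notes about tokenizers",
    3: "rebuilding the index after a crash",
}
conn.executemany("INSERT INTO docs(docid, body) VALUES (?, ?)", texts.items())

# Step b) blank the stored text in the shadow table docs_content.
# 'c0body' is an assumption about FTS3's internal column naming.
conn.execute("UPDATE docs_content SET c0body = ''")

# Plain term queries are answered from the index, so they still find rows.
hits = [row[0] for row in conn.execute(
    "SELECT docid FROM docs WHERE docs MATCH ? ORDER BY docid", ("index",))]

# The document text itself is gone.
stored = conn.execute("SELECT body FROM docs WHERE docid = 1").fetchone()[0]

print(hits, repr(stored))
```

After step b) the FTS3 table should be treated as read-only: later updates or deletes through it need the original text and are not expected to behave correctly.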
Re: [sqlite] Full text search FTS3 of files
On Sun, Oct 17, 2010 at 11:54 PM, pipilu wrote:
>
> My question is:
> Is it possible to use FTS3 for search without storing the actual file
> contents/search terms/keywords in a row. In other words, create a FTS3
> tables with rows that only contains an ID and populate the B-Tree with
> keywords for search.
>
>

John, technically, if you ask "without storing", the answer is no. But how you could implement this depends on what you want from your search. If you only need keyword search (without phrases or complex queries), then it's a simple task: create two tables (keywords and index) and write a simple parser (you don't really need the power of fts3 here).

But if you want phrases, you have to provide ordering information about your words. In this case you can use fts3 for the search, and the only drawback is that fts3 will keep a copy of your texts. But in my experience the fts3 index is implemented very efficiently. I have my own implementation of full-text search built on ordinary sqlite tables, and I compared both on real data: even with the stored texts excluded from the fts3 figure, the fts3 index took about half the space of mine for the same pool of articles. So there's a real chance that even if you implement something that doesn't store the texts, you will end up with a bigger index.

Max Vlasov
maxerist.net
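A minimal sketch of the two-table keyword approach described above, in Python with the standard sqlite3 module. The table names (keywords, file_index; "index" itself is an SQL keyword), the regex-based parser and the sample paths and texts are all assumptions for illustration. Only words and file paths are stored, never the file text, and there is no phrase or ordering information.

```python
import re
import sqlite3


def tokenize(text):
    """Very simple parser: the set of lowercase ASCII letter/digit runs."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(conn, documents):
    """documents maps a file path to its text; only paths and words are stored."""
    conn.executescript("""
        CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
        CREATE TABLE file_index (
            keyword_id INTEGER REFERENCES keywords(id),
            path TEXT,
            PRIMARY KEY (keyword_id, path)
        );
    """)
    for path, text in documents.items():
        for word in tokenize(text):
            conn.execute(
                "INSERT OR IGNORE INTO keywords(word) VALUES (?)", (word,))
            conn.execute(
                "INSERT INTO file_index(keyword_id, path) "
                "SELECT id, ? FROM keywords WHERE word = ?",
                (path, word))


def search(conn, word):
    """Return the paths of all files containing the given keyword."""
    rows = conn.execute(
        "SELECT path FROM file_index "
        "JOIN keywords ON keywords.id = keyword_id "
        "WHERE word = ? ORDER BY path",
        (word.lower(),))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
# In real use the text would be read from each file on disk.
build_index(conn, {
    "/docs/a.txt": "SQLite full text search",
    "/docs/b.txt": "Searching files on disk",
    "/docs/c.txt": "Full backup of the disk",
})
print(search(conn, "full"))
```

There is no stemming here, so "search" and "searching" are different keywords; a real parser would need to decide how to handle that.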
Re: [sqlite] Full text search FTS3 of files
On Sun, Oct 17, 2010 at 2:54 PM, pipilu wrote:
> Hi:
> I am trying to build a sqlite3 database to index files. What I want to do is
> to keep the files in the file system on the disk (not in the database) and
> index the files with keywords such that when a search is performed, the
> right file names are returned.
>
> My question is:
> Is it possible to use FTS3 for search without storing the actual file
> contents/search terms/keywords in a row. In other words, create a FTS3
> tables with rows that only contains an ID and populate the B-Tree with
> keywords for search.

No. Use something like Swish-e or htdig.

>
> Thanks a lot
> John

--
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
---
Assertions are politics; backing up assertions with evidence is science
===
[sqlite] Full text search FTS3 of files
Hi:

I am trying to build a sqlite3 database to index files. What I want to do is keep the files in the file system on disk (not in the database) and index them with keywords, so that when a search is performed, the right file names are returned.

My question is:
Is it possible to use FTS3 for search without storing the actual file contents/search terms/keywords in a row? In other words, can I create an FTS3 table whose rows contain only an ID, and populate the B-Tree with keywords for search?

Thanks a lot
John