[sqlite] Porter Stemmer
Hi all,

Is the algorithm used in the stemming tokenizer in SQLite's FTS extension equivalent to the C implementation found at http://tartarus.org/~martin/PorterStemmer/ ? I am asking because some sources say that improved versions of this algorithm were released well after 2000/2001. Does SQLite's implementation differ in any significant way from the C implementation at the above URL?

Kind regards, Philip Bennefall
___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Porter Stemmer
On Fri, Jun 15, 2012 at 5:51 AM, Philip Bennefall phi...@blastbay.com wrote:
> Is the algorithm used in the stemming tokenizer in SQLite's FTS extension equivalent to the C implementation found at http://tartarus.org/~martin/PorterStemmer/ ?

The built-in Porter stemmer is a copy/paste from the above link.

-- D. Richard Hipp d...@sqlite.org
Re: [sqlite] Porter Stemmer
Thanks, Richard. That's good to know, because I am trying to decide whether to add a new tokenizer with some custom processing rather than use the built-in stemmer.

Kind regards, Philip Bennefall
Re: [sqlite] Porter Stemmer
I had another quick question. If I have built an FTS table using the stemmer tokenizer and later decide that I want to change to the simple one, is there an easy way to do this? I see the rebuild command; can I somehow tell it to change the tokenizer as well? I see the reference to custom tokenizers, but what about the internal implementations?

Kind regards, Philip Bennefall
Re: [sqlite] Porter Stemmer
On Fri, Jun 15, 2012 at 9:00 AM, Philip Bennefall phi...@blastbay.com wrote:
> If I have built an FTS table using the stemmer tokenizer, and then I later decide that I want to change to the simple one, is there an easy way to do this? I see the rebuild command, can I somehow tell that to change the tokenizer as well?

If you change your tokenizer, you need to retokenize all of the source text.

-- D. Richard Hipp d...@sqlite.org
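For later readers, a minimal sketch of why the source text has to be retokenized: the index stores whatever terms the tokenizer emitted, so an index built with the Porter tokenizer holds stems rather than the original words. This assumes Python's bundled `sqlite3` module was built with FTS3 (which also provides the `fts4aux` helper table). The helper `indexed_terms` and the table names `docs` and `docs_terms` are made up for illustration and are not from this thread.

```python
import sqlite3


def indexed_terms(tokenizer, texts):
    """Return the sorted terms stored in an FTS index built with `tokenizer`.

    `tokenizer` is interpolated into the SQL, so pass only a trusted built-in
    name such as "porter" or "simple".
    """
    # Autocommit mode: each INSERT is committed, so the terms are written to
    # the index before fts4aux reads it.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    try:
        conn.execute(
            f"CREATE VIRTUAL TABLE docs USING fts3(body, tokenize={tokenizer})"
        )
        conn.executemany(
            "INSERT INTO docs(body) VALUES (?)", [(t,) for t in texts]
        )
        # fts4aux exposes the terms held in the full-text index of `docs`.
        conn.execute("CREATE VIRTUAL TABLE docs_terms USING fts4aux(docs)")
        rows = conn.execute(
            "SELECT term FROM docs_terms WHERE col = '*' ORDER BY term"
        )
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    sample = ["running shoes", "he runs daily"]
    print("porter:", indexed_terms("porter", sample))
    print("simple:", indexed_terms("simple", sample))
```

With the Porter tokenizer the list should contain `run` but not `running`, while the simple tokenizer keeps `running` and `runs` as separate terms. A query tokenized one way will therefore miss entries indexed the other way.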
Re: [sqlite] Porter Stemmer
I understand that, but let's say I already have an FTS virtual table that I set up to use the Porter tokenizer. How would I then go about rebuilding and retokenizing this table with the simple tokenizer at a later time? Would I need to create an entirely new table? What I'm wondering is basically how I might take an existing FTS virtual table, change its tokenizer, and then rebuild the index.

Kind regards, Philip Bennefall
Re: [sqlite] Porter Stemmer
On Fri, Jun 15, 2012 at 9:26 AM, Philip Bennefall phi...@blastbay.com wrote:
> Would I need to create an entirely new table? What I'm wondering is basically how I might take an existing FTS virtual table, change its tokenizer and then rebuild the index?

Yes. You'll need to DROP or RENAME the original table, then CREATE the new one.

-- D. Richard Hipp d...@sqlite.org
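The drop-and-recreate approach described above could look roughly like the sketch below. It is only an illustration under stated assumptions: Python's `sqlite3` module built with FTS3, a single-column FTS table, and hypothetical names (`switch_tokenizer`, `search`, `docs`, `body`, `_fts_backup`) that do not come from this thread. The text is copied to an ordinary temporary table first so that it survives the DROP.

```python
import sqlite3


def switch_tokenizer(conn, table, column, new_tokenizer):
    """Recreate the FTS table `table` with `new_tokenizer`, keeping docids.

    Names are interpolated into the SQL, so pass only trusted identifiers.
    """
    # 1. Save the source text (and docids) in an ordinary temporary table.
    conn.execute(
        f"CREATE TEMP TABLE _fts_backup AS "
        f"SELECT docid AS docid, {column} AS body FROM {table}"
    )
    # 2. Drop the old FTS table, which also discards its index.
    conn.execute(f"DROP TABLE {table}")
    # 3. Create the table again with the new tokenizer.
    conn.execute(
        f"CREATE VIRTUAL TABLE {table} "
        f"USING fts3({column}, tokenize={new_tokenizer})"
    )
    # 4. Re-insert the text; this is the step that retokenizes everything.
    conn.execute(
        f"INSERT INTO {table}(docid, {column}) "
        f"SELECT docid, body FROM _fts_backup"
    )
    conn.execute("DROP TABLE _fts_backup")
    conn.commit()


def search(conn, term):
    """Return the docids in the demo table `docs` that match `term`."""
    rows = conn.execute(
        "SELECT docid FROM docs WHERE docs MATCH ? ORDER BY docid", (term,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE VIRTUAL TABLE docs USING fts3(body, tokenize=porter)")
    conn.executemany(
        "INSERT INTO docs(body) VALUES (?)",
        [("running shoes",), ("he runs daily",)],
    )
    print("porter, 'run':", search(conn, "run"))
    switch_tokenizer(conn, "docs", "body", "simple")
    print("simple, 'run':", search(conn, "run"))
    print("simple, 'running':", search(conn, "running"))
```

On a real database it is probably worth running the four steps inside one transaction so that a failure partway through does not leave the table missing. A table with several columns would need each of them listed in the copy and re-insert steps.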
Re: [sqlite] Porter Stemmer
Understood. Thank you very much for your quick help. Now I have all the information I need to get coding. And thanks once again for a great library!

Kind regards, Philip Bennefall