> On 7 Jun 2023, at 12:13 AM, Peter Eisentraut <pe...@eisentraut.org> wrote:
>
> On 03.06.23 19:47, Florents Tselai wrote:
>> There’s another previous relevant patch [0] but was never merged. I’ve
>> included these stop words and added some more (info in README.md).
>> For my personal projects looks like it yields much better results.
>> I’d like some feedback on the extension ; particularly on the installation
>> infra (I’m not sure I’ve handled properly the permissions in the .sql files)
>> I’ll then try to make a .patch for this.
>
> The open question at the previous attempt was that it wasn't clear what the
> upstream source or long-term maintenance of the stop words list would be. If
> it's just a personally composed list, then it's okay if you use it yourself,
> but for including it into PostgreSQL it ought to come from a reputable
> non-individual source like snowball.

I’ve used the NLTK list [0] as my base set of stop words. Wouldn’t this be considered reputable enough?

[0] https://github.com/nltk/nltk_data/blob/gh-pages/packages/corpora/stopwords.zip (see the greek.stop file in the archive)