Re: Default stop word list

2016-09-09 Thread Emir Arnautovic
I would partially agree with Walter - having more resources allows us to include stopwords in the index and let the scoring model do its job. However, there are other Solr features that can suffer from that approach: e.g. if you use edismax and mm=80%, in case of a query with stopwords, you can end up wi
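
Archive note: the snippet above is cut off, so what follows is only one possible illustration of how stopwords and mm can interact, not a reconstruction of the missing text. It is a simplified sketch that assumes every query term becomes one optional clause and that a percentage mm is rounded down; `min_should_match` and `satisfies_mm` are hypothetical helper names, not Solr APIs.

```python
def min_should_match(num_clauses: int, percent: int) -> int:
    """Number of optional clauses that must match for a percentage mm.

    Assumption: the percentage is applied to the clause count and rounded down.
    """
    return num_clauses * percent // 100


def satisfies_mm(query_terms: list[str], doc_terms: set[str], percent: int) -> bool:
    """True if enough query clauses are found in the document (toy model)."""
    matched = sum(1 for term in query_terms if term in doc_terms)
    return matched >= min_should_match(len(query_terms), percent)


# Stopwords kept at query time: five clauses, three of them very common words.
query = ["the", "cat", "in", "the", "hat"]

# A document that never mentions "cat" but contains the common words.
doc = {"the", "man", "in", "a", "hat"}

required = min_should_match(len(query), 80)
print(f"clauses required: {required} of {len(query)}")
print(f"document without 'cat' matches: {satisfies_mm(query, doc, 80)}")
```

In this toy model the common words alone supply three of the four required clauses, so the document passes the mm=80% check without containing "cat".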

Re: Default stop word list

2016-09-08 Thread Walter Underwood
I recommend that you remove StopFilterFactory from every analysis chain. In the tf.idf scoring model, rare words are automatically weighted more than common words. I have an index with 11.6 million documents. “the” occurs in 9.9 million of those documents. “cat” occurs in 16,000 of those documen
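
Archive note: the document counts quoted above can be plugged into a textbook inverse document frequency formula to see how far apart the two weights land. This is a minimal sketch that assumes the plain form idf = ln(N / df); Lucene's similarity classes use their own variants of this formula, so the exact numbers in a real index would differ.

```python
import math


def idf(total_docs: int, doc_freq: int) -> float:
    """Textbook inverse document frequency: ln(N / df).

    Assumption: this is the plain textbook form, not Lucene's exact formula.
    """
    return math.log(total_docs / doc_freq)


# Figures taken from the message above.
TOTAL_DOCS = 11_600_000
DOC_FREQ = {"the": 9_900_000, "cat": 16_000}

weights = {term: idf(TOTAL_DOCS, df) for term, df in DOC_FREQ.items()}

for term, weight in weights.items():
    print(f"{term}: idf = {weight:.2f}")

print(f"'cat' is weighted about {weights['cat'] / weights['the']:.0f}x more than 'the'")
```

Under this assumption "the" gets a weight of about 0.16 and "cat" about 6.6, so the common word contributes very little to the score even though it stays in the index.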

Re: Default stop word list

2016-09-08 Thread Steven White
Hi Walter and all. Sorry for the late reply, I was out of town. Are you saying the list of stop words should be removed from the stop word file? I understand the issues I will run into because of the stop word list, but all along, my understanding of the stop word list being in the stop word file is -- to e

Re: Default stop word list

2016-08-29 Thread Walter Underwood
Do not remove stop words. Want to search for “vitamin a”? That won’t work. Stop word removal is a hack left over from when we were running search engines in 64 kbytes of memory. Yes, common words are less important for search, but removing them is a brute force approach with severe side effects

Re: Default stop word list

2016-08-29 Thread Steven White
Thanks Shawn. This is the best answer I have seen, much appreciated. A follow-up question: I want to remove stop words from the list, but if I do, then search quality will degrade (and index size will grow, which is less of an issue). For example, if I remove "a", then if someone searches for "For a F

Re: Default stop word list

2016-08-27 Thread Shawn Heisey
On 8/27/2016 12:39 PM, Shawn Heisey wrote: > I personally think that stopword removal is more of a problem than a > solution. There actually is one thing that a stopword filter can do that has little to do with the purpose it was designed for. You can make it impossible to search for certain words

Re: Default stop word list

2016-08-27 Thread Shawn Heisey
On 8/26/2016 7:13 AM, Steven White wrote: > But what about the current "default" list that comes with Solr? How was > that list, for all supported languages, determined? That list of stopwords was created from years of history with Lucene, taking the expertise of many people and the wisdom of the

Re: Default stop word list

2016-08-26 Thread Steven White
But what about the current "default" list that comes with Solr? How was that list, for all supported languages, determined? What I fear is this: when someone puts Solr into production, no one makes a change to that list, so if the list is not "valid" this will impact search, but if the list is

RE: Default stop word list

2016-08-25 Thread Srinivasa Meenavalli
Hi Steven, The list of stopwords for a language is not fixed; there is no single universal list of stop words used by all natural language processing tools. Ideally, stop words should be defined by search merchandisers based on their domain instead of relying on the default. https://en.wikipedia.org/wiki/S