Solr can not index F**K!
One of our clients (a hot girl!) brought this to our attention: this document contains many f* words: http://sec.gov/Archives/edgar/data/1474227/00014742271032/d424b3.htm

We have indexed it with the latest version of Solr (3.3), but if we search for F**K, the document is not returned. We have tried indexing it with different text field types, but it still does not work. Any idea why F* cannot be indexed? Is it being censored by the government? :D

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-can-not-index-F-K-tp3214246p3214246.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr can not index F**K!
That seems a little far-fetched, have you checked your analysis?

François

On Jul 31, 2011, at 4:58 PM, randohi wrote:
> Any idea why F* can not be indexed - being censored by the government? :D
Re: Solr can not index F**K!
Check your stop words list.

On Jul 31, 2011 6:25 PM, François Schiettecatte fschietteca...@gmail.com wrote:
> That seems a little far-fetched, have you checked your analysis?
Re: Solr can not index F**K!
Indeed. The analysis will show whether the term is a stop word: if it is, the stop filter removes it, and turning on verbose output shows that.

François

On Jul 31, 2011, at 6:27 PM, Shashi Kant wrote:
> Check your Stop words list
Re: Solr can not index F**K!
It is not in the stopwords list. The file was indexed with the following settings:

Field Type: text_en
Properties: Indexed, Tokenized, Stored, Multivalued
Schema: Indexed, Tokenized, Stored, Multivalued
Index: Indexed, Tokenized, Stored
Position Increment Gap: 100
Index Analyzer: org.apache.solr.analysis.TokenizerChain
Tokenizer Class: org.apache.solr.analysis.StandardTokenizerFactory
Filters:
1. org.apache.solr.analysis.StopFilterFactory args:{words: stopwords_en.txt ignoreCase: true enablePositionIncrements: true luceneMatchVersion: LUCENE_33 }
2. org.apache.solr.analysis.LowerCaseFilterFactory args:{luceneMatchVersion: LUCENE_33 }
3. org.apache.solr.analysis.EnglishPossessiveFilterFactory args:{luceneMatchVersion: LUCENE_33 }
4. org.apache.solr.analysis.KeywordMarkerFilterFactory args:{protected: protwords.txt luceneMatchVersion: LUCENE_33 }
5. org.apache.solr.analysis.PorterStemFilterFactory args:{luceneMatchVersion: LUCENE_33 }

There are 1376 tokens indexed, and f**k is not one of them.
Re: Solr can not index F**K!
On 7/31/2011 7:29 PM, randohi wrote:
> org.apache.solr.analysis.KeywordMarkerFilterFactory args:{protected: protwords.txt luceneMatchVersion: LUCENE_33 }

Could something be going on here? What's in your protwords.txt?

-Mike
Re: Solr can not index F**K!
The content of protwords.txt:

dontstems
zwhacky
Re: Solr can not index F**K!
I hate to be the PC guy, but seriously, did this have to be said?

On Mon, Aug 1, 2011 at 6:58 AM, randohi rand...@lawyer.com wrote:
> One of our clients (a hot girl!)

--
E: mark.man...@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com
cf.Objective(ANZ) + Flex - Nov 17, 18 - Melbourne Australia
http://www.cfobjective.com.au
Re: Solr can not index F**K!
: http://sec.gov/Archives/edgar/data/1474227/00014742271032/d424b3.htm

My money is on maxFieldLength in solrconfig.xml...

http://www.lucidimagination.com/search/?q=maxFieldLength#/p:solr

-Hoss
Re: Solr can not index F**K!
Yes, that is where the problem comes from! Many, many thanks to everyone!
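
For anyone finding this thread later, here is a rough sketch of why a token limit such as maxFieldLength can make a term unsearchable even though the analyzer chain is fine. It is a toy model in Python, not Solr code: the `index_terms` helper, the regex tokenizer, and the synthetic document are all made up for illustration, and the limit of 10000 is only an assumed example value, so check the actual setting in your own solrconfig.xml.

```python
import re


def index_terms(text, max_field_length=10000):
    """Toy model of a per-field token limit (hypothetical helper, not a Solr API).

    Tokenizes the text, keeps only the first `max_field_length` tokens,
    and returns the set of distinct terms that would end up in the index.
    """
    tokens = re.findall(r"\w+", text.lower())
    return set(tokens[:max_field_length])


# Synthetic stand-in for a long filing: the rare term only appears
# after more tokens than the limit allows.
doc = " ".join(["filler"] * 12000 + ["rareword"])

truncated = index_terms(doc)                            # assumed limit of 10000 tokens
full = index_terms(doc, max_field_length=2**31 - 1)     # effectively no limit

print("rareword" in truncated)  # False: the term sits past the cutoff
print("rareword" in full)       # True: raising the limit makes it indexable
```

If this matches what you see, the usual remedy is the one Hoss points to: raise maxFieldLength in solrconfig.xml and reindex the document, since tokens dropped at index time cannot be recovered by changing the query.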