Solr can not index F**K!

2011-07-31 Thread randohi
One of our clients (a hot girl!) brought this to our attention: 
In this document there are many f* words:

http://sec.gov/Archives/edgar/data/1474227/00014742271032/d424b3.htm

and we have indexed it with the latest version of Solr (ver 3.3). But if we
search for F**K, it does not return the document!

We have tried to index it with different text types, but it still does not work.

Any idea why F* cannot be indexed? Is it being censored by the government? :D


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-can-not-index-F-K-tp3214246p3214246.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr can not index F**K!

2011-07-31 Thread François Schiettecatte
That seems a little far-fetched. Have you checked your analysis?

François


Re: Solr can not index F**K!

2011-07-31 Thread Shashi Kant
Check your stop words list.


Re: Solr can not index F**K!

2011-07-31 Thread François Schiettecatte
Indeed, the analysis will show whether the term is a stop word. If it is, the
stop filter removes it, and turning on verbose output shows that.

François
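
As a rough illustration of what verbose analysis output reveals, here is a minimal Python sketch that runs text through a few stages and records which tokens survive each one. It is not Solr code: the function names, the tiny stop word set, and the simplified tokenizer are all assumptions made for the example, and the stage order (stop filter before lowercasing) simply mirrors the common text_en-style chain.

```python
import re

# Assumed sample stop words for illustration; NOT the contents of stopwords_en.txt.
STOP_WORDS = {"a", "an", "the", "of", "to"}


def tokenize(text):
    # Crude word tokenizer; Solr's StandardTokenizer is more elaborate.
    return re.findall(r"[A-Za-z0-9]+", text)


def stop_filter(tokens, ignore_case=True):
    # Drop every token that appears in the stop word set.
    def is_stop(token):
        return (token.lower() if ignore_case else token) in STOP_WORDS

    return [t for t in tokens if not is_stop(t)]


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def analyze_verbose(text):
    # Return (stage name, tokens after that stage) pairs, so it is easy
    # to see at which stage a given term disappears.
    stages = [("tokenizer", tokenize(text))]
    stages.append(("stop", stop_filter(stages[-1][1])))
    stages.append(("lowercase", lowercase_filter(stages[-1][1])))
    return stages


if __name__ == "__main__":
    for name, tokens in analyze_verbose("The Price of an Apple"):
        print(f"{name:10} {tokens}")
```

If a term is present after the tokenizer stage but missing after the stop stage, the stop word list is responsible. If it survives every stage, the cause lies elsewhere.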


Re: Solr can not index F**K!

2011-07-31 Thread randohi
It is not in the stopwords list. The file was indexed with the following settings:

Field Type: text_en

Properties: Indexed, Tokenized, Stored, Multivalued

Schema: Indexed, Tokenized, Stored, Multivalued

Index: Indexed, Tokenized, Stored

Position Increment Gap: 100

Index Analyzer: org.apache.solr.analysis.TokenizerChain

Tokenizer Class: org.apache.solr.analysis.StandardTokenizerFactory

Filters:

   1. org.apache.solr.analysis.StopFilterFactory args:{words: stopwords_en.txt ignoreCase: true enablePositionIncrements: true luceneMatchVersion: LUCENE_33 }
   2. org.apache.solr.analysis.LowerCaseFilterFactory args:{luceneMatchVersion: LUCENE_33 }
   3. org.apache.solr.analysis.EnglishPossessiveFilterFactory args:{luceneMatchVersion: LUCENE_33 }
   4. org.apache.solr.analysis.KeywordMarkerFilterFactory args:{protected: protwords.txt luceneMatchVersion: LUCENE_33 }
   5. org.apache.solr.analysis.PorterStemFilterFactory args:{luceneMatchVersion: LUCENE_33 }

There are 1376 tokens indexed, and f**k is not one of them. 


Re: Solr can not index F**K!

2011-07-31 Thread Michael Sokolov

On 7/31/2011 7:29 PM, randohi wrote:

org.apache.solr.analysis.KeywordMarkerFilterFactory args:{protected:
protwords.txt luceneMatchVersion: LUCENE_33 }

Could something be going on here? What's in your protwords.txt?

-Mike


Re: Solr can not index F**K!

2011-07-31 Thread randohi
The content of protwords.txt:

dontstems
zwhacky



Re: Solr can not index F**K!

2011-07-31 Thread Mark Mandel
I hate to be the PC guy, but seriously, did this have to be said?

On Mon, Aug 1, 2011 at 6:58 AM, randohi rand...@lawyer.com wrote:

 One of our clients (a hot girl!)




-- 
E: mark.man...@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com



Re: Solr can not index F**K!

2011-07-31 Thread Chris Hostetter

: http://sec.gov/Archives/edgar/data/1474227/00014742271032/d424b3.htm

My money is on maxFieldLength in solrconfig.xml...

http://www.lucidimagination.com/search/?q=maxFieldLength#/p:solr



-Hoss
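
To illustrate why a per-field token cap would produce this symptom, here is a minimal Python sketch. It is not Solr code: `index_field` and its `max_field_length` parameter are hypothetical names, and the tokenizer is deliberately simplified. The idea is that only the first N tokens of a field reach the index, so a term that appears late in a long document is never indexed and cannot be found. The example solrconfig.xml of that era set maxFieldLength to 10000, as far as I recall, so check the value in your own configuration.

```python
import re


def tokenize(text):
    # Simplified lowercase word tokenizer (an assumption for this sketch).
    return re.findall(r"[a-z0-9]+", text.lower())


def index_field(text, max_field_length):
    # Hypothetical indexer: keep only the first max_field_length tokens.
    # Everything after the cap is silently dropped and never indexed.
    tokens = tokenize(text)
    return set(tokens[:max_field_length])


def matches(index, term):
    return term.lower() in index


if __name__ == "__main__":
    # A long document whose interesting term appears only at the very end.
    filler = " ".join(f"word{i}" for i in range(20))
    document = filler + " needle"

    small_cap = index_field(document, max_field_length=10)
    large_cap = index_field(document, max_field_length=100)

    print("cap 10 finds needle: ", matches(small_cap, "needle"))
    print("cap 100 finds needle:", matches(large_cap, "needle"))
```

With the small cap the late term is absent from the index even though no stop word or protected word list touches it. Raising the cap (or reindexing with a larger limit) makes it searchable.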


Re: Solr can not index F**K!

2011-07-31 Thread randohi
Yes, that is where the problem comes from! Many, many thanks to everyone!
