RE: Using Solr Analyzers in Lucene
Hi Max,

why don't you use WordDelimiterFilterFactory directly? I'm doing the same stuff inside my own analyzer:

final Map<String, String> args = new HashMap<String, String>();
args.put("generateWordParts", "1");
args.put("generateNumberParts", "1");
args.put("catenateWords", "0");
args.put("catenateNumbers", "0");
args.put("catenateAll", "0");
args.put("splitOnCaseChange", "1");
args.put("splitOnNumerics", "1");
args.put("preserveOriginal", "1");
args.put("stemEnglishPossessive", "0");
args.put("language", "English");

wordDelimiter = new WordDelimiterFilterFactory();
wordDelimiter.init(args);
stream = wordDelimiter.create(stream);

--
Kind regards,
Mathias

-----Original Message-----
From: Max Lynch [mailto:ihas...@gmail.com]
Sent: Tuesday, October 05, 2010 1:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Using Solr Analyzers in Lucene

I have made progress on this by writing my own Analyzer. I basically added the TokenFilters that are under each of the Solr factory classes. I had to copy and paste the WordDelimiterFilter because, of course, it was package-protected.

On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch <ihas...@gmail.com> wrote:
> Hi, I asked this question a month ago on lucene-user and was referred here. [...]
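For readers following the thread: the flags Mathias passes (splitOnCaseChange, splitOnNumerics, and so on) control how WordDelimiterFilter breaks a token apart. Below is a minimal, self-contained sketch of that splitting behavior only; it is an illustration, not Solr's actual implementation (the real filter also handles catenation, preserveOriginal, offsets, and position increments), and the class name is invented for this example:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of WordDelimiterFilter's splitting rules with
// splitOnCaseChange=1 and splitOnNumerics=1. Not Solr's implementation.
public class WordDelimiterSketch {
    public static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (cur.length() > 0) {
                char prev = cur.charAt(cur.length() - 1);
                // splitOnCaseChange: lower-to-upper boundary starts a new part
                boolean caseChange = Character.isLowerCase(prev) && Character.isUpperCase(c);
                // splitOnNumerics: letter/digit boundary starts a new part
                boolean numericChange = Character.isLetter(prev) != Character.isLetter(c)
                        && (Character.isDigit(prev) || Character.isDigit(c));
                if (caseChange || numericChange) {
                    parts.add(cur.toString());
                    cur.setLength(0);
                }
            }
            if (Character.isLetterOrDigit(c)) {
                cur.append(c);
            } else if (cur.length() > 0) {
                // any non-alphanumeric character acts as a delimiter
                parts.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) parts.add(cur.toString());
        return parts;
    }
}
```

So a token like "PowerShot2000" yields three word/number parts, and "wi-fi" splits on the hyphen.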
Re: Using Solr Analyzers in Lucene
I guess I missed the init() method. I was looking at the factory and thought I saw config-loading code (like getInt), which I assumed meant it needed to have schema.xml available.

Thanks!
-Max

On Tue, Oct 5, 2010 at 2:36 PM, Mathias Walter <mathias.wal...@gmx.net> wrote:
> Hi Max,
>
> why don't you use WordDelimiterFilterFactory directly? I'm doing the same stuff inside my own analyzer:
> [...]
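The point Max concedes here is that init(Map) consumes plain string arguments, so no schema.xml is involved: flags arrive as "0"/"1" strings and are parsed on demand by getInt-style helpers. A hypothetical mini-factory illustrating that pattern (MiniFactory and its methods are invented for this sketch; only the init-from-a-Map idea comes from the thread):

```java
import java.util.Map;

// Sketch of how a factory can be configured from a plain Map<String,String>
// without any schema.xml: flags are "0"/"1" strings parsed lazily.
public class MiniFactory {
    private Map<String, String> args;
    private boolean generateWordParts;

    public void init(Map<String, String> args) {
        this.args = args;
        this.generateWordParts = getInt("generateWordParts", 0) != 0;
    }

    // getInt-style helper: fall back to a default when the key is absent.
    int getInt(String name, int defaultVal) {
        String s = args.get(name);
        return s == null ? defaultVal : Integer.parseInt(s);
    }

    public boolean isGenerateWordParts() { return generateWordParts; }
}
```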
Using Solr Analyzers in Lucene
Hi,

I asked this question a month ago on lucene-user and was referred here.

I have content being analyzed in Solr using these tokenizers and filters:

<fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Basically, I want to be able to search against this index in Lucene with one of my background searching applications. My main reason for using Lucene over Solr here is that I use the highlighter to keep track of exactly which terms were found, which I feed into my own scoring system, and I always collect the whole set of found documents. I've messed around with using boosts, but that wasn't fine-grained enough, and I wasn't able to effectively create a score threshold (would creating my own scorer be a better idea?).

Is it possible to use this analyzer from Lucene, or at least re-create it in code?

Thanks.
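Conceptually, the field type above is just a tokenizer followed by an ordered list of filters, applied the same way at index and query time. A rough plain-Java model of that composition, assuming string-to-string filters (real Lucene TokenStreams carry attributes and let one token expand into several, which this sketch deliberately ignores; the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Conceptual model of an analyzer chain: a tokenizer followed by an
// ordered list of token filters. Real Lucene/Solr analyzers stream
// tokens with attributes; this sketch uses plain strings.
public class AnalyzerChainSketch {
    // Stand-in for WhitespaceTokenizerFactory.
    static List<String> whitespaceTokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Stand-in for LowerCaseFilterFactory.
    static final UnaryOperator<String> LOWER_CASE = t -> t.toLowerCase(Locale.ROOT);

    @SafeVarargs
    static List<String> analyze(String text, UnaryOperator<String>... filters) {
        List<String> tokens = whitespaceTokenize(text);
        for (UnaryOperator<String> f : filters) {
            tokens = tokens.stream().map(f).collect(Collectors.toList());
        }
        return tokens;
    }
}
```

Because the index and query analyzers in the field type are identical, the same chain would be reused for both sides.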
Re: Using Solr Analyzers in Lucene
I have made progress on this by writing my own Analyzer. I basically added the TokenFilters that are under each of the Solr factory classes. I had to copy and paste the WordDelimiterFilter because, of course, it was package-protected.

On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch <ihas...@gmail.com> wrote:
> Hi, I asked this question a month ago on lucene-user and was referred here. [...]