RE: Using Solr Analyzers in Lucene

2010-10-05 Thread Mathias Walter
Hi Max,

Why don't you use WordDelimiterFilterFactory directly? I'm doing the same
thing inside my own analyzer:

final Map<String, String> args = new HashMap<String, String>();

args.put("generateWordParts", "1");
args.put("generateNumberParts", "1");
args.put("catenateWords", "0");
args.put("catenateNumbers", "0");
args.put("catenateAll", "0");
args.put("splitOnCaseChange", "1");
args.put("splitOnNumerics", "1");
args.put("preserveOriginal", "1");
args.put("stemEnglishPossessive", "0");
args.put("language", "English");

wordDelimiter = new WordDelimiterFilterFactory();
wordDelimiter.init(args);
stream = wordDelimiter.create(stream);
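For context, a complete Analyzer built around the factory might look like the sketch below. This assumes the Solr 1.4 / Lucene 2.9-era APIs discussed in this thread; the class name WordDelimiterAnalyzer is illustrative, not anything shipped with Solr:

```java
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.solr.analysis.WordDelimiterFilterFactory;

// Illustrative: reuses Solr's WordDelimiterFilterFactory inside a plain
// Lucene Analyzer, configured from a Map instead of schema.xml.
public class WordDelimiterAnalyzer extends Analyzer {

    private final WordDelimiterFilterFactory wordDelimiter;

    public WordDelimiterAnalyzer() {
        final Map<String, String> args = new HashMap<String, String>();
        args.put("generateWordParts", "1");
        args.put("generateNumberParts", "1");
        args.put("splitOnCaseChange", "1");
        args.put("preserveOriginal", "1");
        wordDelimiter = new WordDelimiterFilterFactory();
        // init() takes the args directly; no schema.xml is involved
        wordDelimiter.init(args);
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new WhitespaceTokenizer(reader);
        return wordDelimiter.create(stream);
    }
}
```

The point of the sketch is just that the factory is configured from an ordinary Map, so it can live entirely outside a Solr deployment.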

--
Kind regards,
Mathias

 -Original Message-
 From: Max Lynch [mailto:ihas...@gmail.com]
 Sent: Tuesday, October 05, 2010 1:03 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Using Solr Analyzers in Lucene
 
 I have made progress on this by writing my own Analyzer. I basically added
 the TokenFilters that sit under each of the Solr factory classes. I had to
 copy and paste the WordDelimiterFilter because, of course, it was
 package-private.
 
 
 
 On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch ihas...@gmail.com wrote:
 



Re: Using Solr Analyzers in Lucene

2010-10-05 Thread Max Lynch
I guess I missed the init() method. I was looking at the factory and
thought I saw config-loading code (like getInt), which I assumed meant it
needed to have schema.xml available.

Thanks!

-Max

On Tue, Oct 5, 2010 at 2:36 PM, Mathias Walter mathias.wal...@gmx.net wrote:




Using Solr Analyzers in Lucene

2010-10-04 Thread Max Lynch
Hi,
I asked this question a month ago on lucene-user and was referred here.

I have content being analyzed in Solr using these tokenizers and filters:

<fieldType name="text_standard" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="0" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="0" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
</fieldType>

Basically I want to be able to search against this index in Lucene with one
of my background searching applications.

My main reason for using Lucene over Solr for this is that I use the
highlighter to keep track of exactly which terms were found, which I use for
my own scoring system, and I always collect the whole set of found
documents. I've messed around with using boosts, but they weren't
fine-grained enough and I wasn't able to effectively create a score
threshold (would creating my own scorer be a better idea?)

Is it possible to use this analyzer from Lucene, or at least re-create it in
code?

Thanks.
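The index-time chain above can be re-created in plain Lucene by wiring the same Solr factories together by hand. A minimal sketch against the Solr 1.4-era factory API; the class name TextStandardAnalyzer is my own, and the protwords.txt wiring (which requires a ResourceLoader via inform()) is omitted for brevity:

```java
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.SnowballPorterFilterFactory;
import org.apache.solr.analysis.WhitespaceTokenizerFactory;
import org.apache.solr.analysis.WordDelimiterFilterFactory;

// Sketch: mirrors the text_standard index-time analyzer chain outside Solr.
public class TextStandardAnalyzer extends Analyzer {

    private final WhitespaceTokenizerFactory tokenizer = new WhitespaceTokenizerFactory();
    private final WordDelimiterFilterFactory wordDelimiter = new WordDelimiterFilterFactory();
    private final LowerCaseFilterFactory lowerCase = new LowerCaseFilterFactory();
    private final SnowballPorterFilterFactory snowball = new SnowballPorterFilterFactory();

    public TextStandardAnalyzer() {
        // Same attributes as the WordDelimiterFilterFactory element in schema.xml.
        Map<String, String> wdArgs = new HashMap<String, String>();
        wdArgs.put("generateWordParts", "0");
        wdArgs.put("generateNumberParts", "1");
        wdArgs.put("catenateWords", "1");
        wdArgs.put("catenateNumbers", "1");
        wdArgs.put("catenateAll", "0");
        wdArgs.put("splitOnCaseChange", "1");
        wordDelimiter.init(wdArgs);

        Map<String, String> snowballArgs = new HashMap<String, String>();
        snowballArgs.put("language", "English");
        // SnowballPorterFilterFactory is ResourceLoaderAware; loading
        // protwords.txt would additionally need inform(ResourceLoader).
        snowball.init(snowballArgs);

        tokenizer.init(new HashMap<String, String>());
        lowerCase.init(new HashMap<String, String>());
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = tokenizer.create(reader);
        stream = wordDelimiter.create(stream);
        stream = lowerCase.create(stream);
        return snowball.create(stream);
    }
}
```

An index built with this analyzer should then tokenize the same way as the text_standard field, modulo the missing protected-words list.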


Re: Using Solr Analyzers in Lucene

2010-10-04 Thread Max Lynch
I have made progress on this by writing my own Analyzer. I basically added
the TokenFilters that sit under each of the Solr factory classes. I had to
copy and paste the WordDelimiterFilter because, of course, it was
package-private.



On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch ihas...@gmail.com wrote:
