Re: Lucene Analyzer that can handle C++ vs C#

2009-12-11 Thread Erick Erickson
This type of question is not appropriate for the developers list, which
is devoted to development. Please post this kind of question on the
user's list.

As it happens, this very topic is being discussed in the thread
"Recover special terms from StandardTokenizer", which should give
you some ideas.

Erick

On Fri, Dec 11, 2009 at 11:19 AM, maxSchlein  wrote:

>
> Can someone please point me in the right direction?
>
> We are creating an application that needs to be able to search on C++ and
> get back docs that contain C++. The StandardAnalyzer does not seem to
> index the "+", so a search for "C++" will bring back docs that contain C++,
> C, C#, etc. The WhitespaceAnalyzer will index the "+", but if C++ is at the
> end of a sentence, it will index the term "C++.", so a search for "C++"
> will not return the doc. I have heard a custom Analyzer might be the
> answer; however, it seems like what is actually needed is a custom
> TokenFilter or Tokenizer. I looked at:
>  - StandardAnalyzer.java
>  - StandardFilter.java
>  - StandardTokenizer.java
>  - StandardTokenizerImpl.java
>  - StandardTokenizerImpl.jflex
>
> I would guess that the StandardTokenizer is where the changes would need to
> be made to allow the "+" character, but I am unclear as to how.
>
> Any and all help is greatly appreciated.
> --
> View this message in context:
> http://old.nabble.com/Lucene-Analyzer-that-can-handle-C%2B%2B-vs-C--tp26747079p26747079.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>


Lucene Analyzer that can handle C++ vs C#

2009-12-11 Thread maxSchlein

Can someone please point me in the right direction?

We are creating an application that needs to be able to search on C++ and get
back docs that contain C++. The StandardAnalyzer does not seem to index
the "+", so a search for "C++" will bring back docs that contain C++, C,
C#, etc. The WhitespaceAnalyzer will index the "+", but if C++ is at the
end of a sentence, it will index the term "C++.", so a search for "C++"
will not return the doc. I have heard a custom Analyzer might be the
answer; however, it seems like what is actually needed is a custom
TokenFilter or Tokenizer. I looked at:
  - StandardAnalyzer.java 
  - StandardFilter.java 
  - StandardTokenizer.java 
  - StandardTokenizerImpl.java 
  - StandardTokenizerImpl.jflex 

I would guess that the StandardTokenizer is where the changes would need to
be made to allow the "+" character, but I am unclear as to how.

Any and all help is greatly appreciated.
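
To make the desired behavior concrete, here is a minimal plain-Java sketch of
the tokenization being asked for. It does not use Lucene at all: the class
name SymbolAwareTokens, the tokenize method, and the set of kept characters
("+" and "#") are all hypothetical and chosen only for illustration. It splits
on whitespace, lowercases, and trims punctuation from the ends of each token
while leaving a trailing "+" or "#" in place. A real solution would have to
express the same rule inside a Lucene Tokenizer or TokenFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration only; this is NOT a Lucene Analyzer. */
final class SymbolAwareTokens {

    /** Characters allowed to remain at the end of a token (an assumed set). */
    private static final String KEEP_TRAILING = "+#";

    /**
     * Splits on whitespace, lowercases, and trims punctuation from both ends
     * of each token, except that trailing '+' and '#' are kept.
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            int start = 0;
            int end = raw.length();
            // Drop leading characters that are not letters or digits, e.g. '(' or '"'.
            while (start < end && !Character.isLetterOrDigit(raw.charAt(start))) {
                start++;
            }
            // Drop trailing punctuation such as '.', ',' or ')', but stop at '+' or '#'.
            while (end > start
                    && !Character.isLetterOrDigit(raw.charAt(end - 1))
                    && KEEP_TRAILING.indexOf(raw.charAt(end - 1)) < 0) {
                end--;
            }
            if (start < end) {
                tokens.add(raw.substring(start, end).toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [we, use, c, c#, and, c++]
        System.out.println(tokenize("We use C, C# and C++."));
    }
}
```

With this rule, "C++." at the end of a sentence and "C++" in the middle of one
both produce the token "c++", while "C" and "C#" remain distinct tokens, so a
search for "C++" would match only the docs that contain it.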
-- 
View this message in context: 
http://old.nabble.com/Lucene-Analyzer-that-can-handle-C%2B%2B-vs-C--tp26747079p26747079.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

