I figured "c++." would be a problem. Here's what I did to get around it:
value.toLowerCase().replaceAll("\\.( ?\t?\n?\r?)+", " ")
I'm not escaping +'s from the query, so I should be good there.
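A minimal sketch of this kind of normalization in plain Java, assuming the goal is only to drop periods that end a sentence. One caveat about the pattern above: the group "( ?\t?\n?\r?)" can match the empty string, so a lone period also matches, and a token such as "asp.net" would be split into "asp net". The hypothetical normalize helper below only replaces a run of periods that is followed by whitespace or by the end of the text.

```java
import java.util.Locale;

public class SentencePeriods {

    // Hypothetical helper: lowercase the text, then replace any run of periods
    // followed by whitespace (or by the end of the input) with a single space.
    // Periods inside a token, as in "asp.net" or "1.5", are left alone.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT).replaceAll("\\.+(\\s+|$)", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("I dislike C++.") + "]");          // [i dislike c++ ]
        System.out.println("[" + normalize("Used ASP.NET. Then Java") + "]"); // [used asp.net then java]
    }
}
```

The same function would have to run on both the indexed text and the query text so the terms line up.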
Thanks a lot.
Sincerely,
Chris Salem
Development Team
Main Sequence Technologies, Inc.
PCRecruiter.net - PCRecruiter Support
[email protected]
P: 440.946.5214 ext 5458
F: 440.856.0312
This email and any files transmitted with it may contain confidential
information intended solely for the use of the individual or entity to whom
they are addressed. If you have received this email in error please notify the
sender. Please note that any views or opinions presented in this email are
solely those of the author and do not necessarily represent those of the
company. Finally, the recipient should check this email and any attachments for
the presence of viruses. The company accepts no liability for any damage caused
by any virus transmitted by this email. Main Sequence Technologies, Inc. 4420
Sherwin Rd. Willoughby OH 44094 www.pcrecruiter.net
----- Original Message -----
To: [email protected], Chris Salem <[email protected]>
From: John Wang <[email protected]>
Sent: 7/16/2009 12:09:05 PM
Subject: Re: searching for c++, c#, etc...
If you escape the characters + or #, the sentence
"I know java + c++" would not skip the +; furthermore, it breaks query parsing,
where + is reserved.
-John
On Thu, Jul 16, 2009 at 9:04 AM, John Wang <[email protected]> wrote:
> This runs into problems when you have a sentence such as:
> "I dislike c++."
>
> If you use WSA (WhitespaceAnalyzer), the last token is "c++.", not "c++", so
> the query would not find this document.
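To make the mismatch concrete, here is a stdlib-only stand-in for whitespace tokenization. It is not Lucene's WhitespaceAnalyzer, only a split on runs of whitespace, assumed to behave the same for this input: the trailing period stays attached to the last token, so an exact lookup for the term "c++" misses.

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceTokens {

    // Stand-in for whitespace tokenization: split on runs of whitespace and keep
    // every other character, including '+', '#' and '.', as part of the token.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("I dislike c++.");
        System.out.println(tokens);                 // [I, dislike, c++.]
        System.out.println(tokens.contains("c++")); // false: the indexed term is "c++."
    }
}
```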
>
> -John
>
>
> On Thu, Jul 16, 2009 at 8:29 AM, Chris Salem <[email protected]> wrote:
>
>> That seems to be working. You don't have to escape the pluses, though.
>> Also, it appears that the WhitespaceAnalyzer is case sensitive, but I guess
>> I could lowercase everything that gets indexed.
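Lowercasing only the indexed text is not enough on its own: the query text has to go through the same step, or "C++" typed by a user will not match "c++" in the index. In Lucene this would typically be an analyzer chaining a WhitespaceTokenizer with a LowerCaseFilter; the sketch below uses only the JDK, with a hypothetical analyze helper (whitespace split plus toLowerCase) standing in for that chain and a simple all-terms-must-match rule.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class LowercaseBothSides {

    // Hypothetical analysis step used for BOTH indexing and querying:
    // split on whitespace, then lowercase each token.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<String>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A document matches if every query term is among its indexed terms.
    static boolean matches(String document, String query) {
        return analyze(document).containsAll(analyze(query));
    }

    public static void main(String[] args) {
        System.out.println(matches("Knows C++ and C#", "c++ c#")); // true
        System.out.println(matches("Knows C++ and C#", "C++"));    // true
        System.out.println(matches("Knows Java", "C++"));          // false
    }
}
```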
>> Thanks a lot for your help.
>> Sincerely,
>> Chris Salem
>>
>> ----- Original Message -----
>> To: [email protected], Chris Salem <[email protected]>
>> From: Danil TORIN <[email protected]>
>> Sent: 7/16/2009 10:28:37 AM
>> Subject: Re: searching for c++, c#, etc...
>>
>>
>> Try WhitespaceAnalyzer for both indexing and searching.
>> At search time you may also need to escape "+", "(", and ")" with "\".
>> "#" shouldn't need escaping.
>>
>> On Thu, Jul 16, 2009 at 17:23, Chris Salem <[email protected]> wrote:
>> > I'm using the StandardAnalyzer for both searching and indexing.
>> > Here's the code to parse the query:
>> > Searcher searcher = new IndexSearcher(reader);
>> > Analyzer analyzer = new StandardAnalyzer(stopwords);
>> > System.out.println(queryString);
>> > QueryParser qp = new QueryParser(searchField,analyzer);
>> > Query query = qp.parse(queryString);
>> > queryString = query.toString();
>> > System.out.println(queryString);
>> > And here's the output from the println's:
>> > r2_resume_text:c\+\+ AND r2_resume_text: c\#
>> > +r2_resume_text:c +r2_resume_text:c
>> > Also the documentation doesn't say anything about # having to be escaped.
>> > Do I have to escape during indexing too?
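The printed query suggests the analyzer, not the parser, is discarding the symbols: the escaped "c\+\+" and "c\#" get through parsing, but each ends up as the single term "c". The sketch below only imitates that effect with a hypothetical letters-and-digits tokenizer; StandardAnalyzer's real grammar is more involved (it recognizes things like e-mail addresses and acronyms). Because the same analyzer runs at index time, the index presumably never contains "c++" either, which is why escaping the query alone cannot help.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AlphanumericTokens {

    // Rough imitation (NOT StandardAnalyzer itself): keep runs of letters and
    // digits, lowercase them, and drop every other character such as '+' or '#'.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("c++"));          // [c]
        System.out.println(tokenize("c#"));           // [c]
        System.out.println(tokenize("Java and C++")); // [java, and, c]
    }
}
```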
>> > Sincerely,
>> > Chris Salem
>> >
>> >
>> >
>> > ----- Original Message -----
>> > To: [email protected], Chris Salem <[email protected]>
>> > From: Ian Lea <[email protected]>
>> > Sent: 7/16/2009 5:12:53 AM
>> > Subject: Re: searching for c++, c#, etc...
>> >
>> >
>> > Hi
>> >
>> >
>> > Escaping should work. See
>> > http://lucene.apache.org/java/2_4_1/queryparsersyntax.html and
>> > QueryParser.escape(). And you need to be sure that your analyzer
>> > isn't removing the plus signs and that you use the same analyzer for
>> > indexing and searching.
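For reference, a stdlib-only sketch of what backslash-escaping for the query parser looks like. The character set is an assumption taken from the special characters listed on the query parser syntax page (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \); in real code, call QueryParser.escape() rather than this hypothetical escapeQuery helper. Note that '#' is not in the set, and that escaping only shields characters from the parser; the analyzer still decides what ends up in the term.

```java
public class QueryEscaper {

    // Characters treated as query syntax (assumed from the query parser syntax
    // page); '&' and '|' are included because "&&" and "||" are operators.
    private static final String SPECIAL = "\\+-!(){}[]^\"~*?:&|";

    // Hypothetical helper: prefix every special character with a backslash.
    static String escapeQuery(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeQuery("c++"));        // c\+\+
        System.out.println(escapeQuery("c#"));         // c#
        System.out.println(escapeQuery("java + c++")); // java \+ c\+\+
    }
}
```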
>> >
>> > Googling for something like "lucene escape" will find you more info.
>> >
>> > Luke will tell you what is actually in your index.
>> >
>> >
>> > --
>> > Ian.
>> >
>> >
>> > On Wed, Jul 15, 2009 at 5:19 PM, Chris Salem <[email protected]> wrote:
>> >> Hello,
>> >> I'm trying to search for terms like c++, but the parser is stripping off
>> >> the ++. I tried escaping the ++ with backslashes, but it's still stripping
>> >> it off. I could replace + with "plus"; is that the best way to do it? Why
>> >> isn't escaping working?
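A sketch of the substitution idea raised here, assuming a hypothetical fixed mapping (the replacement words "cplusplus" and "csharp" are illustrative only). Whatever the mapping, the same rewrite has to run on the document text before indexing and on the query text before parsing, otherwise the terms will not line up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SymbolRewriter {

    // Hypothetical mapping from symbol-bearing terms to plain-word tokens.
    private static final Map<Pattern, String> REWRITES = new LinkedHashMap<Pattern, String>();
    static {
        // (?i) ignores case; (?<![\p{L}\p{N}]) stops "abc++" from being rewritten.
        REWRITES.put(Pattern.compile("(?i)(?<![\\p{L}\\p{N}])c\\+\\+"), "cplusplus");
        REWRITES.put(Pattern.compile("(?i)(?<![\\p{L}\\p{N}])c#"), "csharp");
    }

    // Apply every rewrite in order; call this on documents AND on queries.
    static String rewrite(String text) {
        String result = text;
        for (Map.Entry<Pattern, String> e : REWRITES.entrySet()) {
            result = e.getKey().matcher(result).replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("I know Java and C++.")); // I know Java and cplusplus.
        System.out.println(rewrite("C# or c++"));            // csharp or cplusplus
        System.out.println(rewrite("abc++"));                // abc++
    }
}
```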
>> >> thanks
>> >> Sincerely,
>> >> Chris Salem
>> >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [email protected]
>> > For additional commands, e-mail: [email protected]
>> >
>> (The following links were included with this email:)
>> http://www.pcrecruiter.net/
>>
>> http://www.pcrecruiter.net/support.htm
>>
>> mailto:[email protected]