I think all you have to do is write your own Analyzer.
You can copy one of the supplied ones and remove the piece that calls
isLetter(char) or some similar function.  That call may be in
StandardTokenizer; I can't look at the code right now to confirm.
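
To illustrate why the choice of "token character" test matters, here is
a minimal sketch in plain Java with no Lucene dependency.  The class and
method names (TokenCharSketch, tokenize) are made up for this example
and are not Lucene API; the sketch only mimics the idea of a tokenizer
that keeps runs of accepted characters and drops everything else.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only: not a Lucene class.
public class TokenCharSketch {

    /** Splits text into maximal runs of characters accepted by isTokenChar. */
    public static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(c);               // still inside a token
            } else if (current.length() > 0) {
                tokens.add(current.toString());  // rejected char ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());      // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "1102/a55407\\-2002nov2.xml"; // contains one literal backslash

        // Letter-only rule: digits, '/', '\', '-' and '.' all end a token.
        System.out.println(tokenize(input, Character::isLetter));

        // Whitespace-only rule: every non-whitespace character is kept.
        System.out.println(tokenize(input, c -> !Character.isWhitespace(c)));
    }
}
```

With the letter-only rule the backslash and the dash never reach the
index or the query; with the whitespace-only rule the value survives as
a single token.  A real Analyzer would sit somewhere between the two.
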
If you want to treat certain fields differently (e.g. as exceptions to
the rule), you can see an example of such an Analyzer in jGuru's Lucene
FAQ.
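
The per-field idea can be sketched the same way, again in plain Java
without Lucene.  Everything here (PerFieldSketch, keyword, words,
except, analyze) is a hypothetical name chosen for the illustration; it
is not the FAQ's code.  The assumption is simply that the analysis step
is told the field name and can pick a different rule for it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only: not a Lucene class.
public class PerFieldSketch {

    /** Keeps the whole value as a single token, punctuation included. */
    public static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    /** Lower-cases the text and splits on anything that is not a letter or digit. */
    public static List<String> words(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    private final Map<String, Function<String, List<String>>> exceptions = new HashMap<>();
    private final Function<String, List<String>> fallback;

    public PerFieldSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    /** Registers a special rule for one field; all other fields use the fallback. */
    public PerFieldSketch except(String field, Function<String, List<String>> rule) {
        exceptions.put(field, rule);
        return this;
    }

    public List<String> analyze(String field, String text) {
        return exceptions.getOrDefault(field, fallback).apply(text);
    }

    public static void main(String[] args) {
        PerFieldSketch sketch = new PerFieldSketch(PerFieldSketch::words)
                .except("path", PerFieldSketch::keyword);
        System.out.println(sketch.analyze("path", "1102/a55407-2002nov2.xml"));
        System.out.println(sketch.analyze("body", "Does Escaping Really Work?"));
    }
}
```

The point of the design is that the same rule set is applied at indexing
and at query time, so a field like 'path' is never split in one place
and left whole in the other.
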

Good luck,
Otis

--- Terry Steichen <[EMAIL PROTECTED]> wrote:
> Yes, Otis - that does help.  But a little more advice would help even
> more.
> 
> For example, I'm currently using the standard Lucene code without any
> customization.  That means I am using StandardAnalyzer.  Internally,
> what StandardAnalyzer does is create (1) a StandardTokenizer, (2) a
> StandardFilter, (3) a LowerCaseFilter, and (4) a StopFilter.
> StandardTokenizer is generated from StandardTokenizer.jj, but when
> generated, it extends Tokenizer.
> 
> Now WhitespaceAnalyzer (which you've mentioned several times) creates
> a WhitespaceTokenizer (which in turn extends CharTokenizer, which
> extends Tokenizer).
> 
> This all makes me a bit dizzy, since I don't really understand (and
> hope I don't have to learn) all the internal Lucene architecture.  It
> would help enormously if you could tell me precisely what I have to do
> to make the escape character work with all the functionality of
> StandardAnalyzer retained.  Should the WhitespaceAnalyzer be used in
> lieu of the StandardTokenizer?  If so, would any functionality be
> lost?  (It seems to me like it would lose a ton of functionality.)
> Would it be better to modify StandardTokenizer.jj, and if so,
> where/how?
> 
> TIA,
> 
> Terry
> 
> ----- Original Message -----
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Tuesday, November 26, 2002 6:45 PM
> Subject: Re: Does Escaping Really Work?
> 
> 
> > The documentation is not detailed enough.
> > Analyzers analyze their input (at indexing and searching time).
> > They are just Java classes that do not know about QueryParser.jj,
> > which is the only place where '\' is defined as an escape character
> > (plus the .java files generated by running QueryParser.jj through
> > JavaCC).
> > Hence, I believe that if your Analyzer is not explicitly instructed
> > to leave '\' alone, you will think that escaping doesn't work.
> > WhitespaceAnalyzer, I believe, works because it doesn't throw out
> > characters like '\', as I think it only splits tokens on whitespace.
> >
> > HTH.
> > Otis
> >
> >
> > --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > > Dave,
> > >
> > > I would say you seem to be right.  But this is getting very
> > > frustrating.
> > > Here is what the Lucene docs say:
> > >
> > > <docs quote>
> > > Lucene supports escaping special characters that are part of the
> > > query
> > > syntax. The current list special characters are
> > >
> > > + - && || ! ( ) { } [ ] ^ " ~ * ? : \
> > >
> > > > To escape these character use the \ before the character. For
> > > > example to search for (1+1):2 use the query:
> > >
> > >  \(1\+1\)\:2
> > >
> > > </docs quote>
> > >
> > > > Is the Lucene documentation in error?  Does it work, but only
> > > > using something other than the standard configuration?  If so,
> > > > precisely what non-standard configuration is necessary?
> > >
> > > Why can't these questions be answered simply and clearly?
> > >
> > > Terry
> > >
> > >
> > > ----- Original Message -----
> > > From: "Spencer, Dave" <[EMAIL PROTECTED]>
> > > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > > Sent: Tuesday, November 26, 2002 5:02 PM
> > > Subject: RE: Does Escaping Really Work?
> > >
> > >
> > > > My understanding is that "escaping may not work (as Terry and I
> > > > believe); however, a workaround for most 'reasonable' cases is
> > > > to use WhitespaceAnalyzer when parsing a query".
> > >
> > >
> > > -----Original Message-----
> > > From: Terry Steichen [mailto:[EMAIL PROTECTED]]
> > > Sent: Tuesday, November 26, 2002 1:48 PM
> > > To: Lucene Users List
> > > Subject: Re: Does Escaping Really Work?
> > >
> > >
> > > Well, pardon me for breathing, Otis.
> > >
> > > > I didn't make the connection (partly 'cause you changed the
> > > > subject line).  But anyway, I don't understand your rather
> > > > oblique answer - does escaping work or not?  Are you saying
> > > > that, in order for it to work (the way the docs say it does), I
> > > > need to insert this module in the chain? Or what?
> > >
> > > Terry
> > >
> > > ----- Original Message -----
> > > From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> > > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > > Sent: Tuesday, November 26, 2002 3:07 PM
> > > Subject: Re: Does Escaping Really Work?
> > >
> > >
> > > > Didn't I just answer this last night?
> > > > WhitespaceAnalyzer?
> > > >
> > > > Otis
> > > >
> > > > --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > > > > > I'm confused about how to use escape characters in Lucene.
> > > > > > My Lucene configuration is 1.3-dev1 and I use the
> > > > > > StandardAnalyzer and QueryParser.
> > > > >
> > > > > > My documents have a field called 'path' with a value like
> > > > > > "1102/a55407-2002nov2.xml".  This field is indexed but not
> > > > > > tokenized.  Here are the various queries I've tried and
> > > > > > their results:
> > > > >
> > > > > > 1) When a dash is included in the query, Lucene interprets
> > > > > > this as a space. ("path:1102/a55402-2002nov2.xml" is
> > > > > > interpreted as "path:1102/a55402 -body:2002nov2.xml")
> > > > > >
> > > > > > 2) When a backslash is inserted before the dash (and the
> > > > > > query does *not* contain a wildcard), Lucene interprets this
> > > > > > by inserting a space in lieu of the next character.
> > > > > > ('path:1102/a55402\-2002nov2.xml' interpreted as
> > > > > > 'path:"1102/a55402 2002nov2.xml" [note the space where the
> > > > > > dash was]')
> > > > > >
> > > > > > 3) When a backslash is inserted before the dash (and the
> > > > > > query *does* contain a wildcard), Lucene interprets this
> > > > > > literally, without any conversion.
> > > > > > ("path:1102/55407\-2002nov*" is interpreted literally).
> > > > > >
> > > > > > 4) When a backslash is inserted before the dash and
> > > > > > immediately followed by a wildcard, Lucene reports an error.
> > > > > > ('path:1102/a55407-*'    causes lexical error: Encountered
> > > > > > <EOF> after :"")
> > > > >
> > > > > > My overall observation is that it appears it is not possible
> > > > > > to escape a dash - is this true?
> > > > > >
> > > > > > A previous post (yesterday) suggests that it is also not
> > > > > > possible to escape a backslash.  If that's also true, what
> > > > > > characters can be escaped?
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Terry
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > __________________________________________________
> > > > Do you Yahoo!?
> > > > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > > > http://mailplus.yahoo.com
> > > >
> > > > --
> > > > To unsubscribe, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > > > For additional commands, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > > >
> > > >
> > >
> > >
> > > --
> > > To unsubscribe, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > > For additional commands, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > >
> > >
> > >
> > > --
> > > To unsubscribe, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > > For additional commands, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > >
> > >
> > >
> > >
> > > --
> > > To unsubscribe, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > > For additional commands, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > >
> >
> >

