Thanks Mark.
Take the comma out of: | <#P: ("_"|"-"|"/"|"."|",") > in the .jj file
It's working for me.
- Bhavin Pandya
----- Original Message -----
From: "Mark Miller" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Monday, September 17, 2007 8:34 PM
Subject: Re: How to tokenize with comma in standard tokenizer
Take the comma out of: | <#P: ("_"|"-"|"/"|"."|",") > in the .jj file
(around line 92). Keep in mind that this will affect your ability to find
tokens that were previously indexed with the comma there (obviously). I
believe the javacc target in the build file will rebuild it. You need to get
JavaCC and put a properties file called build.properties next to the build
file that contains: javacc.home=c:/javacc (or wherever you put JavaCC).
Also, you could consider trying to pre-process the strings (replace the
comma with a space or something).
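A minimal sketch of the pre-processing idea above, assuming the text can be modified as a plain Java string before it is handed to the analyzer. The class and method names here are hypothetical and not part of Lucene, and the whitespace split plus lowercasing is only a rough stand-in for what the analyzer would do afterwards:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CommaPreprocessor {

    // Hypothetical helper: replace every comma with a space so that a
    // downstream tokenizer sees the comma-joined words as separate words.
    public static String replaceCommas(String text) {
        return text.replace(',', ' ');
    }

    // Rough stand-in for analysis (NOT the Lucene StandardTokenizer):
    // split on whitespace and lowercase each piece.
    public static List<String> roughTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : replaceCommas(text).trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [thering6, proposal6, guyandgirl6]
        System.out.println(roughTokens("TheRing6,Proposal6,GuyandGirl6"));
    }
}
```

This leaves the grammar file untouched, but the same replacement would have to be applied to query strings as well as to indexed text so that both sides produce matching tokens.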
- Mark
Bhavin Pandya wrote:
Hi,
The standard tokenizer works pretty well for me, but I found one problem
with my usage.
I want to tokenize "TheRing6,Proposal6,GuyandGirl6" as three separate
tokens, while the standard analyzer treats it as one word because the
token contains a digit.
Expected three tokens:
1. thering6
2. proposal6
3. guyandgirl6
I want to change this behaviour of the standard tokenizer for this
purpose, but I don't know where to change it.
Do I need to comment out some rule in the StandardTokenizer.jj file? I am
confused by this file.
Any pointers?
- Bhavin
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]