Take the comma out of: | <#P: ("_"|"-"|"/"|"."|",") > in the .jj file (around line 92), so that it reads: | <#P: ("_"|"-"|"/"|".") >. Keep in mind that this will affect your ability to find tokens that were previously indexed with the comma rule in place, so you will probably want to re-index.

I believe the javacc target in the build file will regenerate the tokenizer. You need to get javacc and put a properties file called build.properties next to the build file, containing: javacc.home=c:/javacc (or wherever you put javacc).

Also, you could consider trying to pre-process the strings (replace the comma with a space or something).
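
A minimal sketch of that pre-processing idea, assuming the text is available as a plain String before it is handed to the analyzer. The class and method names (CommaPreprocessor, replaceCommas) are made up for illustration and are not part of Lucene:

```java
// Hypothetical helper, not part of Lucene: swaps commas for spaces so the
// tokenizer sees whitespace-separated words instead of one comma-joined token.
class CommaPreprocessor {

    // Replace every comma with a single space. Assumption: commas never need
    // to survive tokenization in this data.
    static String replaceCommas(String text) {
        return text.replace(',', ' ');
    }

    public static void main(String[] args) {
        String raw = "TheRing6,Proposal6,GuyandGirl6";
        // Pass the cleaned string to the analyzer instead of the raw one.
        System.out.println(replaceCommas(raw));
    }
}
```

Note that this replaces every comma, so values such as "1,000" would also be split; if that matters, the replacement would need to be more selective.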

- Mark

Bhavin Pandya wrote:
Hi,

The standard tokenizer works pretty well for me, but I found one problem with my
usage.

I want to tokenize "TheRing6,Proposal6,GuyandGirl6" as three separate tokens,
but the standard analyzer treats it as one word because the token contains a
digit.

Expected three tokens:
1. thering6
2. proposal6
3. guyandgirl6

I want to change this behaviour of the standard tokenizer, but I don't know
where to change it.
Do I need to comment out some rule in the StandardTokenizer.jj file? I am
confused by this file.

Any pointers?

- Bhavin


