Re: How to tokenize with comma in standard tokenizer

Bhavin Pandya Tue, 18 Sep 2007 03:58:43 -0700

Thanks, Mark.

Take the comma out of: | <#P: ("_"|"-"|"/"|"."|",") > in the .jj file


It's working for me.

- Bhavin Pandya

----- Original Message -----
From: "Mark Miller" <[EMAIL PROTECTED]>

To: <[email protected]>
Sent: Monday, September 17, 2007 8:34 PM
Subject: Re: How to tokenize with comma in standard tokenizer

Take the comma out of: | <#P: ("_"|"-"|"/"|"."|",") > in the .jj file (around line 92). Keep in mind that this will affect your ability to find tokens that were previously indexed with the comma in them (obviously). I believe the javacc target in the build file will rebuild it. You need to get JavaCC and put a properties file called build.properties next to the build file that contains: javacc.home=c:/javacc (or wherever you put JavaCC).
Also, you could consider pre-processing the strings (replacing the comma with a space or something).
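The pre-processing route can be done without touching the grammar at all. A minimal sketch, assuming your text reaches the tokenizer through a java.io.Reader: the hypothetical CommaToSpaceReader class below (not part of Lucene) wraps that Reader and turns every comma into a space, so the unmodified StandardTokenizer should see three separate words.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical helper (not a Lucene class): wraps any Reader and replaces
 * every comma with a space before the tokenizer sees the text.
 */
public class CommaToSpaceReader extends FilterReader {

    public CommaToSpaceReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        // Map a single comma to a space; pass everything else (including -1) through.
        return c == ',' ? ' ' : c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = super.read(cbuf, off, len);
        // Rewrite only the characters actually read (loop is skipped when n == -1).
        for (int i = off; i < off + n; i++) {
            if (cbuf[i] == ',') {
                cbuf[i] = ' ';
            }
        }
        return n;
    }

    /** Reads a whole Reader into a String, for demonstration only. */
    public static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        int n;
        while ((n = r.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader r = new CommaToSpaceReader(new StringReader("TheRing6,Proposal6,GuyandGirl6"));
        System.out.println(readAll(r)); // prints "TheRing6 Proposal6 GuyandGirl6"
    }
}
```

The same wrapping has to be applied to both the indexed text and the query text; otherwise, as with the grammar change, the tokens will not match.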
- Mark

Bhavin Pandya wrote:
Hi,
Standard tokenizer works pretty well for me, but I found one problem with my usage.
I want to tokenize "TheRing6,Proposal6,GuyandGirl6" as three separate tokens, but the standard analyzer treats it as one word because the token contains digits.
Expected three tokens:
1. thering6
2. proposal6
3. guyandgirl6
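The three expected tokens can be pinned down as a small plain-Java check. This is a hypothetical sketch, not Lucene code: it only states the target behaviour (split on commas, then lowercase each piece, as StandardAnalyzer's lowercasing would), which a modified tokenizer could be tested against.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ExpectedTokens {

    /** Splits on commas, drops empty pieces and lowercases each token. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split(",")) {
            String t = piece.trim();
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [thering6, proposal6, guyandgirl6]
        System.out.println(tokenize("TheRing6,Proposal6,GuyandGirl6"));
    }
}
```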
I want to change this behaviour of the standard tokenizer for this purpose, but I don't know where to change it. Do I need to comment out some rule in the StandardTokenizer.jj file? I am confused by this file.
Any pointers?

- Bhavin
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


