On 4/9/07, Reza Yeganeh wrote:
> David Balmain wrote:
> > I'm afraid I have no experience with Persian text. If you send me an
> > example of some text I'll have a look and see what I can do.
>
> Hi David,
> This is not specific to Persian as I tested with more languages (Hebrew,
> Japanese...). By the way, this is a Persian sample:
> شکرشکن شوند همه طوطیان هند. زین قند پارسی که به بنگاله میرود.
Hi Reza,
Here is my test code:
require 'rubygems'
require 'ferret'
include Ferret::Analysis

text = "شکرشکن شوند همه طوطیان هند. زین قند پارسی که به بنگاله میرود."

# Run the sample through StandardAnalyzer and print each token it emits.
tokenizer = StandardAnalyzer.new.token_stream(:field, text)
while token = tokenizer.next
  puts token
end
And this is what I got as the output:
token["شکرشکن":0:12:1]
token["شوند":13:21:1]
token["همه":22:28:1]
token["طوطیان":29:41:1]
token["هند":42:48:1]
token["زین":50:56:1]
token["قند":57:63:1]
token["پارسی":64:74:1]
token["که":75:79:1]
token["به":80:84:1]
token["بنگاله":85:97:1]
token["میرود":98:108:1]
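One note on reading this output: the two numbers after each token look like byte offsets into the UTF-8 string rather than character offsets, since each Persian letter here takes two bytes. Here is a minimal plain-Ruby sketch to check that reading. It assumes Ruby 1.9.3 or later (for String#byteslice) and a UTF-8 source file; the helper name token_from_byte_offsets is made up for illustration and is not part of Ferret.

```ruby
# encoding: utf-8

SAMPLE = "شکرشکن شوند همه طوطیان هند. زین قند پارسی که به بنگاله میرود."

# Hypothetical helper (not part of Ferret): cut a token out of the text,
# treating start and end as byte offsets rather than character offsets.
def token_from_byte_offsets(text, start_byte, end_byte)
  text.byteslice(start_byte, end_byte - start_byte)
end

first = token_from_byte_offsets(SAMPLE, 0, 12)
puts first            # the first word of the sample
puts first.length     # 6 characters
puts first.bytesize   # 12 bytes, matching the 0:12 in the first token
```

If the offsets were character positions instead, the first token would have been reported as 0:6.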
I guess this is probably the same as what you got, but I'm not sure
what is wrong with it. If you can explain what the output should be,
I may be able to work out what is going wrong.
Cheers,
Dave
--
Dave Balmain
http://www.davebalmain.com/