Re: [Ferret-talk] Ferret and non latin characters support

David Balmain Mon, 16 Apr 2007 06:42:30 -0700

On 4/9/07, Reza Yeganeh <[EMAIL PROTECTED]> wrote:
> David Balmain wrote:
> > I'm afraid I have no experience with Persian text. If you send me an
> > example of some text I'll have a look and see what I can do.
>
> Hi David,
> This is not specific to Persian; I tested with other languages as well
> (Hebrew, Japanese...). By the way, this is a Persian sample:
> شکرشکن شوند همه طوطیان هند. زین قند پارسی که به بنگاله میرود.


Hi Reza,

Here is my test code:

    require 'rubygems'
    require 'ferret'
    include Ferret::Analysis

    text = "شکرشکن شوند همه طوطیان هند. زین قند پارسی که به بنگاله میرود."

    # Run the sample through the StandardAnalyzer and print every token.
    tokenizer = StandardAnalyzer.new.token_stream(:field, text)
    while token = tokenizer.next
      puts token
    end

And this is what I got as the output:

    token["شکرشکن":0:12:1]
    token["شوند":13:21:1]
    token["همه":22:28:1]
    token["طوطیان":29:41:1]
    token["هند":42:48:1]
    token["زین":50:56:1]
    token["قند":57:63:1]
    token["پارسی":64:74:1]
    token["که":75:79:1]
    token["به":80:84:1]
    token["بنگاله":85:97:1]
    token["میرود":98:108:1]

I guess this is probably the same as what you got, but I'm not sure
what is wrong with it. If you could explain what the output should
look like, I may be able to work out what is going wrong.
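
A note for reading the output: the two numbers after each token look
like start and end offsets counted in bytes, not characters. Each
Persian letter in the sample takes two bytes in UTF-8, so the
six-letter first word spans 0 to 12. The sketch below recomputes such
offsets in plain Ruby, without Ferret. It assumes Ruby 1.9 or later and
a UTF-8 string, and byte_offsets is a hypothetical helper that just
splits on runs of letters, which is not how the StandardAnalyzer
actually tokenizes.

```ruby
# encoding: utf-8
text = "شکرشکن شوند همه طوطیان هند. زین قند پارسی که به بنگاله میرود."

# Hypothetical helper (not part of Ferret): find each run of letters and
# return [word, start_byte, end_byte] for it, measured in UTF-8 bytes.
def byte_offsets(text)
  offsets = []
  text.scan(/\p{L}+/) do |word|
    # Everything before the current match, measured in bytes.
    start_byte = Regexp.last_match.pre_match.bytesize
    offsets << [word, start_byte, start_byte + word.bytesize]
  end
  offsets
end

byte_offsets(text).each do |word, start_byte, end_byte|
  puts "#{word}: chars=#{word.length} bytes=#{start_byte}..#{end_byte}"
end
```

If the printed byte ranges line up with the token listing, the offsets
are byte-based, and anything that treats them as character positions
will be off for non-Latin text.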

Cheers,
Dave

-- 
Dave Balmain
http://www.davebalmain.com/