On Fri, Jul 6, 2012 at 10:06 AM, Jan E. <[email protected]> wrote:
> Hi,
>
> Joao Silva wrote in post #1067618:
>> As I can implement the above?
>
> For large text you may use String#scan, which has the advantage of not
> collecting all words in an array like String#split does:
word_count = 0
input_text.scan(/\w+/) { word_count += 1 }
> input_text = 'This is a sentence.'
> word_count = input_text.strip.scan(/\s+/).size + 1
I don't think this usage of #scan is a good approach, because counting
whitespace runs yields totally wrong results when the text contains no
words at all:
irb(main):002:0> input_text = '. : & #'
=> ". : & #"
irb(main):003:0> input_text.strip.scan(/\s+/).size + 1
=> 4
Whereas positively matching sequences of word characters is much closer
to reality:
irb(main):004:0> input_text.scan(/\w+/).size
=> 0
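To put the two approaches side by side, here is a minimal sketch. The
method names count_words and count_by_whitespace are made up for this
illustration; only String#scan and String#strip come from Ruby itself.
The block form of #scan is used for the word count so that no array of
matches is built.

```ruby
# Hypothetical helper: count runs of word characters.
# With a block, String#scan yields each match instead of
# collecting all matches in an array.
def count_words(text)
  count = 0
  text.scan(/\w+/) { count += 1 }
  count
end

# Hypothetical helper: the whitespace-based count from the quoted
# suggestion, kept only for comparison.
def count_by_whitespace(text)
  text.strip.scan(/\s+/).size + 1
end

['This is a sentence.', '. : & #', ''].each do |text|
  puts "#{text.inspect}: words=#{count_words(text)} " \
       "whitespace=#{count_by_whitespace(text)}"
end
```

Both agree on the ordinary sentence, but the whitespace count reports
words for punctuation-only input and even for the empty string.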
> But like Jesus already said, this simple approach will not always work.
> If the "words" in your text may contain whitespace, then looking for
> whitespace will obviously fail. You'll have to use a dictionary in this
> case. This would also cover errors (missing or superfluous whitespace).
It's crucial to clarify the definition of "word", I agree.
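As a rough illustration of how much the definition matters, the sketch
below counts the same text with two patterns. The helper count_matches
and the second pattern (which keeps apostrophes and hyphens inside a
word) are assumptions of mine, just one possible definition and not a
recommendation.

```ruby
# Hypothetical helper: the pattern passed in *is* the definition of "word".
def count_matches(text, pattern)
  count = 0
  text.scan(pattern) { count += 1 }
  count
end

text = "Don't split well-known words."

# Plain word characters: "Don't" and "well-known" each count as two words.
plain = count_matches(text, /\w+/)

# Assumed alternative: allow ' and - between runs of word characters.
extended = count_matches(text, /\w+(?:['-]\w+)*/)

puts "plain: #{plain}, extended: #{extended}"
```

The same sentence gives 6 words with the first pattern and 4 with the
second, so the definition has to be settled before any count means
anything.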
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/