Re: Remove HTML from String? --> a better method

Robert Klemme Wed, 13 Jun 2012 02:36:10 -0700

On Tue, Jun 12, 2012 at 4:39 AM, Daniel P. C. <[email protected]> wrote:
> I hate regex.  I've written some ruby functions to remove html tags in
> blocks and not just special characters... also rules for swapping html
> code for anything else is included.  Example <br> will be swapped out
> for \n with existing rules.  My code is available at
> https://github.com/6ftDan/regex-is-evil


What a mess.  This is extremely inefficient: you create new strings
all the time and you go over the string multiple times.  You do not
pass the start and end index down to strip_seq().  There is also no
check that the start index is lower than the end index (try it with
the string ">foo<").
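
To illustrate the index point, here is a rough sketch of a single
pass scanner.  It is not taken from the repository above; the method
names first_tag_span and strip_tags_scan are made up for this example.
It looks for '>' only to the right of a '<' that was actually found,
and it appends to one buffer instead of building a new string per tag.

```ruby
# Hypothetical helper, not from the repository under discussion.
# Returns [start, stop] for the first "<...>" span at or after +from+,
# or nil when there is no '<' followed by a later '>'.
def first_tag_span(str, from = 0)
  start = str.index('<', from)
  return nil unless start

  stop = str.index('>', start + 1) # search only to the right of '<'
  return nil unless stop

  [start, stop]
end

# Single pass over the input: copy the text between tags into one
# output buffer and skip everything inside a "<...>" span.
def strip_tags_scan(str)
  out = +''
  pos = 0
  while (span = first_tag_span(str, pos))
    out << str[pos...span[0]]
    pos = span[1] + 1
  end
  out << str[pos..-1]
end
```

With this ordering the input ">foo<" has no valid span and comes back
unchanged, because the '>' sits before the only '<'.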

I'd prefer a regexp solution any time.  It's likely faster and easier
to read - for me at least.  Btw, /x goes a long way towards making a
regexp more readable - you can even include comments.  Just a simple
example:
https://gist.github.com/2923072
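
For readers who cannot open the link, a minimal sketch of the /x idea
follows.  It is my own illustration and not necessarily what the gist
contains; the constant TAG and the method strip_tags_re are names
invented here.  It swaps <br> for a newline first, as in the rules
mentioned in the quoted message, then drops anything that looks like a
tag.

```ruby
# Illustrative pattern only; it does not handle comments, CDATA,
# scripts or '>' inside attribute values.
TAG = /
  <            # opening angle bracket
  \/?          # optional slash of a closing tag
  [a-zA-Z]     # a tag name starts with a letter
  [^<>]*       # rest of the tag up to the closing bracket
  >            # closing angle bracket
/x

# Replace <br> variants with a newline, then remove remaining tags.
def strip_tags_re(str)
  str.gsub(/<br\s*\/?>/i, "\n").gsub(TAG, '')
end
```

Because the pattern insists on a letter right after '<', plain text
such as "1 < 2 and 3 > 1" is left alone.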

But the proper tool is of course an HTML parser like Nokogiri.

Kind regards

robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
