Re: Anyone got one of these?

2007-01-26 Thread Trevor DeVore

On Jan 26, 2007, at 12:13 PM, Trevor DeVore wrote:

> Well, I have one I've been working on that takes a list of things
> to strip.  You could modify it to fit your needs maybe.  The first
> version was much more compact and used matchText.  Then I stress
> tested it and it was slow and I had to call it quite often and
> with large amounts of text.  So I came up with the attached version.


Chipp - One thing I should point out about my version.  One of the
requirements was being able to strip out attributes of the font tag
such as color, size or lang but maintain the other attributes.  It
then strips out any font tags that are left as a bare <font> (because
all attributes were stripped).  The code might be overkill for what you
are trying to do.
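
For anyone who wants to try the idea outside Revolution, here is a minimal Python sketch of the two passes described above: drop the listed attributes from each font tag, then unwrap any font pair that is left with no attributes. It is not the attached str_stripHTML; the name strip_font_attributes and the regex approach are assumptions made for illustration, and nested font tags are not handled.

```python
import re


def strip_font_attributes(html, attrs):
    """Drop the named attributes from <font ...> tags, then unwrap bare <font> pairs.

    Hypothetical helper for illustration; attrs is a list such as ["color", "size"].
    """
    names = "|".join(re.escape(a) for a in attrs)
    # Leading whitespace is required, so "color" does not match inside "bgcolor".
    attr_re = re.compile(r'\s+(?:' + names + r')="[^"]*"', re.IGNORECASE)

    def clean_tag(match):
        # Remove the unwanted attributes inside this one opening font tag.
        return attr_re.sub("", match.group(0))

    html = re.sub(r"<font\b[^>]*>", clean_tag, html, flags=re.IGNORECASE)
    # A font tag with no attributes left carries no styling, so remove the pair
    # and keep the text between them.
    html = re.sub(r"<font>(.*?)</font>", r"\1", html,
                  flags=re.IGNORECASE | re.DOTALL)
    return html


sample = ('<p><font face="Arial" size="12" color="#FF0000">Hi</font> '
          '<font color="#00F">there</font></p>')
print(strip_font_attributes(sample, ["color", "size"]))
```

The first font tag keeps its face attribute, while the second loses its only attribute and is unwrapped.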


--
Trevor DeVore
Blue Mango Learning Systems - www.bluemangolearning.com
[EMAIL PROTECTED]


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution


Re: Anyone got one of these? [correction-typo]

2007-01-26 Thread Jim Ault

Obviously
> put item 2 of LNN & cr after newHtmlStr
should be 
 put item 2 to -1 of LNN & cr after newHtmlStr
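
To see why the range matters: with the itemDel set to ">", any ">" later in the line starts another item, so "item 2" alone stops at the next tag. The Python sketch below mimics that with split and join; the helper names and the sample line are made up for illustration.

```python
def item_2_only(line, delim=">"):
    """Mimics 'item 2 of LNN': just the second delimited piece (empty if none)."""
    parts = line.split(delim)
    return parts[1] if len(parts) > 1 else ""


def item_2_to_end(line, delim=">"):
    """Mimics 'item 2 to -1 of LNN': everything after the first delimiter."""
    parts = line.split(delim)
    return delim.join(parts[1:])


# A line as it looks after the protected tag has lost its leading "<".
line = 'img src="logo.png">Some <b>bold</b> text'
print(item_2_only(line))    # stops at the next ">"
print(item_2_to_end(line))  # keeps the rest of the line
```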

Jim





Re: Anyone got one of these?

2007-01-26 Thread Jim Ault
On 1/26/07 10:56 AM, "Chipp Walters" <[EMAIL PROTECTED]> wrote:

> function stripAllTagsBut pHtml,pTagsList
>   --> pTagsList IS A LIST OF TAGS NOT TO EXCLUDE FROM PARSING
>   --> EX. LINE 1 OF pTagsList CAN BE "img" AND LINE 2 CAN BE "b", etc..
> It's used to strip all tags from HTML but those in the pTagsList parameter.
> 
> IOW, it can be used to grab the HTML of a page, and strip everything but the
> img tags.
> 
Do you need them in sequential order?
assuming Yes

I would start with the idea of making the 'un-naughty bits' a list to be
used in an optional repeat loop below, then

---  start copy here
--Short, fast, sweet.
on test
  put fld 1 into htmlStr --assumes source is in fld 1
  replace Null with empty in htmlStr --clean out
  --numtochar(3) works just as well
  replace cr with "†" in htmlStr --preserve

  --repeat for each tag you want to preserve
  replace "<img" with cr & "img" in htmlStr
  set the itemDel to ">"
  repeat for each line LNN in (line 2 to -1 of htmlStr)
    put item 1 of LNN & null after newHtmlStr
    put item 2 of LNN & cr after newHtmlStr
  end repeat
  put newHtmlstr into line 2 to -1 of htmlStr
  --end repeat for each tag you want to preserve
  --
  --  Now line 2 to last start with "img" and have a null char as the end tag
  --  You can strip all the other tags, then restore the html by
  --
  --Now restore all the protected tags
  replace cr with "<" in htmlStr
  replace null with ">" in htmlStr
  replace "†" with cr in htmlStr
  put htmlStr into fld 2
end test

 -- although this last replace step may not make any sense since HTML pages
don't use cr to format or delimit anything.

This is very fast and protects/restores the targeted tags.
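
The protect, strip, restore sequence can also be sketched in Python for readers who want to test the idea outside Revolution. This is an illustration under assumptions, not the handler above: it uses regular expressions instead of cr and itemDel, and the function name strip_all_but and the two marker characters are invented here.

```python
import re

OPEN_MARK = "\x01"   # stands in for the "<" of a protected tag
CLOSE_MARK = "\x00"  # stands in for its closing ">" (the null char idea)


def strip_all_but(html, keep=("img",)):
    """Remove every tag except those named in keep (hypothetical helper)."""
    # Clean out any marker characters already present in the source.
    html = html.replace(OPEN_MARK, "").replace(CLOSE_MARK, "")
    # 1. Protect: swap the angle brackets of wanted tags for the markers.
    for tag in keep:
        pattern = re.compile(r"<(/?" + re.escape(tag) + r"\b[^>]*)>",
                             re.IGNORECASE)
        html = pattern.sub(lambda m: OPEN_MARK + m.group(1) + CLOSE_MARK, html)
    # 2. Strip: whatever still has angle brackets is an unwanted tag.
    html = re.sub(r"<[^>]*>", "", html)
    # 3. Restore: turn the markers back into angle brackets.
    return html.replace(OPEN_MARK, "<").replace(CLOSE_MARK, ">")


page = '<p>See <img src="a.png"> and <b>this</b><br></p>'
print(strip_all_but(page))
print(strip_all_but(page, keep=("img", "b")))
```

Because the protected tags no longer contain "<" or ">" during step 2, the strip pass cannot touch them.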

Jim Ault
Las Vegas




Re: Anyone got one of these?

2007-01-26 Thread Peter T. Evensen
I use a field to strip my HTML.  I just put it in the htmlText of a hidden 
field and then get the text of that field and all HTML tags are 
conveniently gone...


That doesn't help Chipp, but is a short two-liner replacement for Ken's 
algorithm.
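
Outside Revolution there is no hidden field to lean on, but the same "let something that already understands HTML do the work" idea can be sketched with Python's standard html.parser module. This is only an analogy: the class and function names are invented here, and the output will not necessarily match what a Revolution field returns.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the text content of html with all tags removed (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


print(html_to_text("<p>Fish &amp; <b>chips</b></p>"))
```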




Peter T. Evensen
http://www.PetersRoadToHealth.com
314-629-5248 or 888-682-4588 





Re: Anyone got one of these?

2007-01-26 Thread Chipp Walters

Thanks both Ken and Trevor...I'll take a look at these. :-)


Re: Anyone got one of these?

2007-01-26 Thread Trevor DeVore

On Jan 26, 2007, at 10:56 AM, Chipp Walters wrote:


> function stripAllTagsBut pHtml,pTagsList
>  --> pTagsList IS A LIST OF TAGS NOT TO EXCLUDE FROM PARSING
>  --> EX. LINE 1 OF pTagsList CAN BE "img" AND LINE 2 CAN BE "b", etc..
>
> It's used to strip all tags from HTML but those in the pTagsList parameter.
>
> IOW, it can be used to grab the HTML of a page, and strip everything but the
> img tags.
>
> I'm starting to write it, but thought I'd ask-- just in case.


Well, I have one I've been working on that takes a list of things to  
strip.  You could modify it to fit your needs maybe.  The first  
version was much more compact and used matchText.  Then I stress  
tested it and it was slow and I had to call it quite often and  
with large amounts of text.  So I came up with the attached version.



--
Trevor DeVore
Blue Mango Learning Systems - www.bluemangolearning.com
[EMAIL PROTECTED]



/**
 * Cleanses a string of the specified Revolution HTML tags.
 *
 * @param  pHTML             HTML to act on.
 * @param  pStripFilter      List of tags to strip:
 *                           p,size,face,lang,color,bgcolor,b,i,u,strike,sub,sup,box,threedbox,expanded,condensed,img,a.
 * @param  pStripTrailingCR  Pass true to strip any trailing CR from END of the pHTML.
 *
 * @return the cleansed HTML
 */
FUNCTION str_stripHTML pHTML, pStripFilter, pStripTrailingCR
    local tProp,tFontFilter,tInlineFilter,tAttributeFilter,tStart,tEnd
    local tSkip,tOffset1,tOffset2,tDeleteChars,i

    set the wholematches to true
    put 0 into tSkip

    --> PROCESS pStripFilter
    REPEAT for each item tProp in pStripFilter
        IF tProp is among the items of "p,b,i,u,strike,sub,sup,box,threedbox,expanded,condensed" THEN
            put tProp & comma after tAttributeFilter
        ELSE IF tProp is among the items of "face,size,color,bgcolor,lang" THEN
            put tProp & comma after tFontFilter
        ELSE IF tProp is among the items of "img,a" THEN
            put tProp & comma after tInlineFilter
        END IF
    END REPEAT

    --> PROCESS
    REPEAT forever --> OK, I TRIED USING MATCHCHUNK WITH THIS BUT IT WAS A GAZILLION TIMES SLOWER
        --> FIND NEXT FONT TAG (SEARCH STRING WAS LOST IN THE ARCHIVE; "<font" IS ASSUMED)
        put offset("<font", pHTML, tSkip) into tOffset1
        IF tOffset1 > 0 THEN
            put offset(">", pHTML, tSkip + tOffset1) into tOffset2 --> GET CLOSING TAG

            --> LOOP THROUGH PROPS AND ERASE
            REPEAT for each item tProp in tFontFilter
                put offset(space & tProp & "=" & quote, pHTML, tSkip + tOffset1) into tStart
                IF tStart > 0 AND tSkip + tOffset1 + tStart < tSkip + tOffset1 + tOffset2 THEN --> ONLY LOOK FOR PROPS IN CURRENT FONT TAG
                    get tSkip + tOffset1 + tStart + length(tProp) + 2
                    put offset(quote, pHTML, it) into tEnd
                    IF tEnd > 0 THEN
                        put tSkip + tStart + tOffset1 & comma & it + tEnd & cr after tDeleteChars
                    END IF
                END IF
            END REPEAT

            --> NOW MOVE BACKWARDS THROUGH LIST AND DELETE
            --> SORT BY START POSITION SO THE DELETIONS RUN FROM THE END OF THE TAG TOWARD ITS START
            sort lines of tDeleteChars numeric by item 1 of each
            REPEAT with i = the number of lines of tDeleteChars down to 1
                delete char (item 1 of line i of tDeleteChars) to (item 2 of line i of tDeleteChars) of pHTML
            END REPEAT
            put empty into tDeleteChars
        ELSE
            exit REPEAT
        END IF
        add tOffset1 + 4 to tSkip
    END REPEAT

    REPEAT for each item tProp in tAttributeFilter
        replace "<"&tProp&">" with empty in pHTML
        replace "</"&tProp&">" with empty in pHTML
    END REPEAT

    REPEAT for each item tProp in tInlineFilter
        REPEAT forever
            put offset("<"&tProp, pHTML) into tStart
            IF tStart > 0 THEN
                put offset(">", pHTML, tStart) into tEnd
                IF tEnd > 0 THEN
                    delete char tStart to (tStart+tEnd) of pHTML
                ELSE
                    exit REPEAT
                END IF
            ELSE
                exit REPEAT
            END IF
        END REPEAT
    END REPEAT

    IF "a" is among the items of tInlineFilter THEN
        replace "</a>" with empty in pHTML
    END IF

    --> REMOVE ANY LONELY <font> TAGS
    REPEAT forever
        put offset("<font>", pHTML) into tStart
        IF tStart > 0 THEN
            put offset("</font>", pHTML, tStart) into tEnd
            IF tEnd > 0 THEN
                delete char tStart+tEnd to tStart+tEnd+6 of pHTML
                delete char tStart to tStart+5 of pHTML
            ELSE
                exit REPEAT
            END IF
        ELSE
            exit REPEAT
        END IF
    END REPEAT

    --> REMOVE TRAILING RETURNS
    IF pStripTrailingCR THEN
        REPEAT forever
            IF char -8 to -1 of pHTML is cr&"<p></p>" THEN
                delete char -8 to -1 of pHTML
            ELSE
                exit REPEAT
            END IF
        END REPEAT
    END IF

    return pHTML
END str_stripHTML


Re: Anyone got one of these?

2007-01-26 Thread Ken Ray
On Fri, 26 Jan 2007 12:56:36 -0600, Chipp Walters wrote:

> function stripAllTagsBut pHtml,pTagsList
>  --> pTagsList IS A LIST OF TAGS NOT TO EXCLUDE FROM PARSING
>  --> EX. LINE 1 OF pTagsList CAN BE "img" AND LINE 2 CAN BE "b", etc..
> 
> 
> It's used to strip all tags from HTML but those in the pTagsList parameter.
> 
> IOW, it can be used to grab the HTML of a page, and strip everything but the
> img tags.
> 
> I'm starting to write it, but thought I'd ask-- just in case.

Closest thing I have is:

function stsStripHTML what
  put replaceText(what,"(?si)","") into what
  put replaceText(what,"(?si)","") into what
  put replaceText(what,"<.*?>","") into what
  put replaceText(what,tab,"") into what
  put replaceText(what,CR & "{3,}","") into what
  return what
end stsStripHTML

But this strips all tags...
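
One way to turn a strip-everything regex into the requested "all tags but these" behaviour is to let a callback decide, tag by tag, whether to keep the match. The Python sketch below is an illustration only, not Revolution script: strip_all_tags_but mirrors the name in the request, tags_list is assumed to hold one tag name per line as described there, and the two patterns are assumptions of this sketch.

```python
import re

ANY_TAG = re.compile(r"<[^>]*>")                       # anything in angle brackets
TAG_NAME = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9]*)")  # name of an open or close tag


def strip_all_tags_but(html, tags_list):
    """Strip every tag from html except those named in tags_list (one per line)."""
    keep = {line.strip().lower()
            for line in tags_list.splitlines() if line.strip()}

    def decide(match):
        name = TAG_NAME.match(match.group(0))
        if name and name.group(1).lower() in keep:
            return match.group(0)  # wanted tag: leave it untouched
        return ""                  # anything else (other tags, comments) goes

    return ANY_TAG.sub(decide, html)


page = ('<html><body><p>Logo: <IMG src="x.gif"> '
        '<b>bold</b><br></p></body></html>')
print(strip_all_tags_but(page, "img"))
print(strip_all_tags_but(page, "img\nb"))
```

Matching on the tag name rather than a prefix keeps "b" from also preserving "br" or "body".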

Ken Ray
Sons of Thunder Software, Inc.
Email: [EMAIL PROTECTED]
Web Site: http://www.sonsothunder.com/


Anyone got one of these?

2007-01-26 Thread Chipp Walters

function stripAllTagsBut pHtml,pTagsList
 --> pTagsList IS A LIST OF TAGS NOT TO EXCLUDE FROM PARSING
 --> EX. LINE 1 OF pTagsList CAN BE "img" AND LINE 2 CAN BE "b", etc..


It's used to strip all tags from HTML but those in the pTagsList parameter.

IOW, it can be used to grab the HTML of a page, and strip everything but the
img tags.

I'm starting to write it, but thought I'd ask-- just in case.

-Chipp