Re: Anyone got one of these?
On Jan 26, 2007, at 12:13 PM, Trevor DeVore wrote:

> Well, I have one I've been working on that takes a list of things to
> strip. You could modify it to fit your needs maybe. The first version
> was much more compact and used matchText. Then I stress tested it and
> it was slow, and I had to call it quite often and with large amounts
> of text. So I came up with the attached version.

Chipp - One thing I should point out about my version. One of the requirements was being able to strip out attributes of the font tag such as color, size or lang but maintain the other attributes. It then strips out any font tags that are left as just <font> (because all attributes were stripped). The code might be overkill for what you are trying to do.

--
Trevor DeVore
Blue Mango Learning Systems - www.bluemangolearning.com
[EMAIL PROTECTED]
___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Anyone got one of these? [correction-typo]
Obviously

> put item 2 of LNN & cr after newHtmlStr

should be

put item 2 to -1 of LNN & cr after newHtmlStr

Jim

On 1/26/07 1:20 PM, "Jim Ault" <[EMAIL PROTECTED]> wrote:

> repeat for each line LNN in (line 2 to -1 of htmlStr)
>   put item 1 of LNN & null after newHtmlStr
>   put item 2 of LNN & cr after newHtmlStr
> end repeat
Re: Anyone got one of these?
On 1/26/07 10:56 AM, "Chipp Walters" <[EMAIL PROTECTED]> wrote:

> function stripAllTagsBut pHtml,pTagsList
> --> pTagsList IS A LIST OF TAGS NOT TO EXCLUDE FROM PARSING
> --> EX. LINE 1 OF pTagsList CAN BE "img" AND LINE 2 CAN BE "b", etc..
>
> It's used to strip all tags from HTML but those in the pTagsList parameter.
>
> IOW, it can be used to grab the HTML of a page, and strip everything but the
> img tags.

Do you need them in sequential order?
Assuming yes:

I would start with the idea of making the 'un-naughty bits' a list to be used in an optional repeat loop below, then

--- start copy here
--Short, fast, sweet.
on test
  put fld 1 into htmlStr  --assumes source is in fld 1
  replace null with empty in htmlStr  --clean out
  --numtochar(3) works just as well
  replace cr with "†" in htmlStr  --preserve

  --repeat for each tag you want to preserve
  replace "<img" with cr & "img" in htmlStr
  set the itemDel to ">"
  repeat for each line LNN in (line 2 to -1 of htmlStr)
    put item 1 of LNN & null after newHtmlStr
    put item 2 of LNN & cr after newHtmlStr
  end repeat
  put newHtmlStr into line 2 to -1 of htmlStr
  --end repeat for each tag you want to preserve
  --
  -- Now line 2 to last start with "img" and have a null char as the end tag
  -- You can strip all then restore the html by
  ---
  --Now restore all the protected tags
  replace cr with "<" in htmlStr
  replace null with ">" in htmlStr
  replace "†" with cr in htmlStr
  put htmlStr into fld 2
end test

-- although this last replace step may not make any sense since HTML pages don't use cr to format or delimit anything.

This is very fast and protects/restores the targeted tags.

Jim Ault
Las Vegas
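The protect-then-restore idea in the message above can be tried outside Revolution. What follows is a minimal sketch in Python rather than Transcript; the names `strip_all_tags_but_img`, `OPEN_MARK` and `CLOSE_MARK` are illustrative assumptions and do not come from the handler above. It swaps the angle brackets of the tags to keep for placeholder characters, strips every tag that is still delimited by angle brackets, then swaps the placeholders back.

```python
import re

# Placeholder characters, assumed not to occur in real input
# (the Transcript handler uses cr and null for the same job).
OPEN_MARK = "\x02"
CLOSE_MARK = "\x03"


def strip_all_tags_but_img(html: str) -> str:
    """Strip every tag except <img ...> by protecting, stripping, restoring."""
    # Clean out any stray placeholder characters first.
    html = html.replace(OPEN_MARK, "").replace(CLOSE_MARK, "")
    # Protect: swap the angle brackets of each <img ...> tag for placeholders.
    html = re.sub(
        r"(?i)<(img\b[^>]*)>",
        lambda m: OPEN_MARK + m.group(1) + CLOSE_MARK,
        html,
    )
    # Strip every tag that still has angle brackets.
    html = re.sub(r"(?s)<.*?>", "", html)
    # Restore the protected tags.
    return html.replace(OPEN_MARK, "<").replace(CLOSE_MARK, ">")


if __name__ == "__main__":
    print(strip_all_tags_but_img('<p>Hi <b>there</b> <img src="a.png"> bye</p>'))
```

To keep more than one kind of tag, the protect step would be repeated once per tag name, which corresponds to the "repeat for each tag you want to preserve" comment in the handler.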
Re: Anyone got one of these?
I use a field to strip my HTML. I just put it into the htmlText of a hidden field and then get the text of that field, and all HTML tags are conveniently gone...

That doesn't help Chipp, but it is a short two-liner replacement for Ken's algorithm.

At 01:31 PM 1/26/2007, you wrote:

> On Fri, 26 Jan 2007 12:56:36 -0600, Chipp Walters wrote:
>
> > function stripAllTagsBut pHtml,pTagsList
> > --> pTagsList IS A LIST OF TAGS NOT TO EXCLUDE FROM PARSING
> > --> EX. LINE 1 OF pTagsList CAN BE "img" AND LINE 2 CAN BE "b", etc..
> >
> > It's used to strip all tags from HTML but those in the pTagsList parameter.
> >
> > IOW, it can be used to grab the HTML of a page, and strip everything but the
> > img tags.
> >
> > I'm starting to write it, but thought I'd ask-- just in case.
>
> Closest thing I have is:
>
> function stsStripHTML what
>   put replaceText(what,"(?si)","") into what
>   put replaceText(what,"(?si)","") into what
>   put replaceText(what,"<.*?>","") into what
>   put replaceText(what,tab,"") into what
>   put replaceText(what,CR & "{3,}","") into what
>   return what
> end stsStripHTML
>
> But this strips all tags...
>
> Ken Ray
> Sons of Thunder Software, Inc.
> Email: [EMAIL PROTECTED]
> Web Site: http://www.sonsothunder.com/

Peter T. Evensen
http://www.PetersRoadToHealth.com
314-629-5248 or 888-682-4588
Re: Anyone got one of these?
Thanks both Ken and Trevor...I'll take a look at these. :-)
Re: Anyone got one of these?
On Jan 26, 2007, at 10:56 AM, Chipp Walters wrote:

> function stripAllTagsBut pHtml,pTagsList
> --> pTagsList IS A LIST OF TAGS NOT TO EXCLUDE FROM PARSING
> --> EX. LINE 1 OF pTagsList CAN BE "img" AND LINE 2 CAN BE "b", etc..
>
> It's used to strip all tags from HTML but those in the pTagsList parameter.
>
> IOW, it can be used to grab the HTML of a page, and strip everything but the
> img tags.
>
> I'm starting to write it, but thought I'd ask-- just in case.

Well, I have one I've been working on that takes a list of things to strip. You could modify it to fit your needs maybe. The first version was much more compact and used matchText. Then I stress tested it and it was slow, and I had to call it quite often and with large amounts of text. So I came up with the attached version.

--
Trevor DeVore
Blue Mango Learning Systems - www.bluemangolearning.com
[EMAIL PROTECTED]

/**
 * Cleanses a string of the specified Revolution HTML tags.
 *
 * @param pHTML             HTML to act on.
 * @param pStripFilter      List of tags to strip: p,size,face,lang,color,bgcolor,b,i,u,strike,sub,sup,box,threedbox,expanded,condensed,img,a.
 * @param pStripTrailingCR  Pass true to strip any trailing CR from END of the pHTML.
 *
 * @return The cleansed HTML.
 */
FUNCTION str_stripHTML pHTML, pStripFilter, pStripTrailingCR
  local tProp,tFontFilter,tInlineFilter,tAttributeFilter,tStart,tEnd
  local tSkip,tOffset1,tOffset2,tDeleteChars,i

  set the wholematches to true

  --> PROCESS pStripFilter
  REPEAT for each item tProp in pStripFilter
    IF tProp is among the items of "p,b,i,u,strike,sub,sup,box,threedbox,expanded,condensed" THEN
      put tProp & comma after tAttributeFilter
    ELSE IF tProp is among the items of "face,size,color,bgcolor,lang" THEN
      put tProp & comma after tFontFilter
    ELSE IF tProp is among the items of "img,a" THEN
      put tProp & comma after tInlineFilter
    END IF
  END REPEAT

  --> PROCESS
  REPEAT forever
    --> OK, I TRIED USING MATCHCHUNK WITH THIS BUT IT WAS A GAZILLION TIMES SLOWER
    put offset("<font", pHTML, tSkip) into tOffset1
    IF tOffset1 > 0 THEN
      put offset(">", pHTML, tSkip + tOffset1) into tOffset2 --> GET CLOSING TAG

      --> LOOP THROUGH PROPS AND ERASE
      REPEAT for each item tProp in tFontFilter
        put offset(space & tProp & "=" & quote, pHTML, tSkip + tOffset1) into tStart
        IF tStart > 0 AND tSkip + tOffset1 + tStart < tSkip + tOffset1 + tOffset2 THEN --> ONLY LOOK FOR PROPS IN CURRENT FONT TAG
          get tSkip + tOffset1 + tStart + length(tProp) + 2
          put offset(quote, pHTML, it) into tEnd
          IF tEnd > 0 THEN
            put tSkip + tStart + tOffset1 & comma & it + tEnd & cr after tDeleteChars
          END IF
        END IF
      END REPEAT

      --> NOW MOVE BACKWARDS THROUGH LIST AND DELETE
      REPEAT with i = the number of lines of tDeleteChars down to 1
        delete char (item 1 of line i of tDeleteChars) to (item 2 of line i of tDeleteChars) of pHTML
      END REPEAT
      put empty into tDeleteChars
    ELSE
      exit REPEAT
    END IF
    add tOffset1 + 4 to tSkip
  END REPEAT

  --> STRIP SIMPLE OPENING AND CLOSING TAGS
  REPEAT for each item tProp in tAttributeFilter
    replace "<"&tProp&">" with empty in pHTML
    replace "</"&tProp&">" with empty in pHTML
  END REPEAT

  --> STRIP INLINE TAGS (img, a) INCLUDING THEIR ATTRIBUTES
  REPEAT for each item tProp in tInlineFilter
    REPEAT forever
      put offset("<"&tProp, pHTML) into tStart
      IF tStart > 0 THEN
        put offset(">", pHTML, tStart) into tEnd
        IF tEnd > 0 THEN
          delete char tStart to (tStart+tEnd) of pHTML
        ELSE
          exit REPEAT
        END IF
      ELSE
        exit REPEAT
      END IF
    END REPEAT
  END REPEAT

  IF "a" is among the items of tInlineFilter THEN
    replace "</a>" with empty in pHTML
  END IF

  --> REMOVE ANY LONELY TAGS
  REPEAT forever
    put offset("<font>", pHTML) into tStart
    IF tStart > 0 THEN
      put offset("</font>", pHTML, tStart) into tEnd
      IF tEnd > 0 THEN
        delete char tStart+tEnd to tStart+tEnd+6 of pHTML
        delete char tStart to tStart+5 of pHTML
      ELSE
        exit REPEAT
      END IF
    ELSE
      exit REPEAT
    END IF
  END REPEAT

  --> REMOVE TRAILING RETURNS
  IF pStripTrailingCR THEN
    REPEAT forever
      IF char -8 to -1 of pHTML is cr&"<p></p>" THEN
        delete char -8 to -1 of pHTML
      ELSE
        exit REPEAT
      END IF
    END REPEAT
  END IF

  return pHTML
END str_stripHTML
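The font-tag handling described above (strip selected attributes such as color or size, keep the others, then remove font tags left with no attributes) can be illustrated in a few lines. This is a hedged sketch in Python rather than Transcript, and it uses regular expressions instead of the offset scanning of the attached version; the name `strip_font_attributes` and its parameters are assumptions made for the example. It also assumes attribute values are double-quoted and that font tags are not nested.

```python
import re


def strip_font_attributes(html: str, attrs) -> str:
    """Remove the named attributes from <font ...> tags, then unwrap bare <font> tags."""

    def clean_tag(match):
        tag = match.group(0)
        for name in attrs:
            # Remove ` name="value"` from inside this one font tag only.
            # The leading whitespace keeps "color" from matching "bgcolor".
            tag = re.sub(r'\s+' + re.escape(name) + r'="[^"]*"', "", tag)
        return tag

    # Clean each opening font tag in turn.
    html = re.sub(r"<font\b[^>]*>", clean_tag, html)
    # A font tag left with no attributes is dropped, keeping its content.
    return re.sub(r"(?s)<font>(.*?)</font>", r"\1", html)


if __name__ == "__main__":
    sample = '<p><font face="Arial" size="12">Hi</font> <font color="#FF0000">red</font></p>'
    print(strip_font_attributes(sample, ["color", "size"]))
```

In the sample, the first font tag keeps its face attribute, while the second loses its only attribute and is removed entirely, leaving just its text.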
Re: Anyone got one of these?
On Fri, 26 Jan 2007 12:56:36 -0600, Chipp Walters wrote:

> function stripAllTagsBut pHtml,pTagsList
> --> pTagsList IS A LIST OF TAGS NOT TO EXCLUDE FROM PARSING
> --> EX. LINE 1 OF pTagsList CAN BE "img" AND LINE 2 CAN BE "b", etc..
>
> It's used to strip all tags from HTML but those in the pTagsList parameter.
>
> IOW, it can be used to grab the HTML of a page, and strip everything but the
> img tags.
>
> I'm starting to write it, but thought I'd ask-- just in case.

Closest thing I have is:

function stsStripHTML what
  put replaceText(what,"(?si)","") into what
  put replaceText(what,"(?si)","") into what
  put replaceText(what,"<.*?>","") into what
  put replaceText(what,tab,"") into what
  put replaceText(what,CR & "{3,}","") into what
  return what
end stsStripHTML

But this strips all tags...

Ken Ray
Sons of Thunder Software, Inc.
Email: [EMAIL PROTECTED]
Web Site: http://www.sonsothunder.com/
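One possible way to turn a strip-everything pattern like "<.*?>" into a strip-all-but pattern is a negative lookahead on the tag name. The sketch below is in Python rather than Transcript and is not taken from stsStripHTML; the function name `strip_all_tags_but` and the parameter `keep_tags` are assumptions that mirror Chipp's stripAllTagsBut and pTagsList. Whether the same pattern behaves identically under Revolution's replaceText has not been checked here.

```python
import re


def strip_all_tags_but(html: str, keep_tags) -> str:
    """Strip every tag except those whose name appears in keep_tags."""
    if not keep_tags:
        # Nothing to keep: strip all tags.
        return re.sub(r"(?s)<[^>]*>", "", html)
    keep = "|".join(re.escape(tag) for tag in keep_tags)
    # Negative lookahead: match a tag only when what follows "<" is not an
    # optional "/" plus one of the kept names as a whole word.
    pattern = r"(?is)<(?!/?(?:" + keep + r")\b)[^>]*>"
    return re.sub(pattern, "", html)


if __name__ == "__main__":
    page = '<p>Hi <b>there</b><br> <img src="a.png"></p>'
    print(strip_all_tags_but(page, ["img", "b"]))
```

The word boundary after the tag name is what keeps "b" in the keep list from also protecting "br" or "body".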
Anyone got one of these?
function stripAllTagsBut pHtml,pTagsList
--> pTagsList IS A LIST OF TAGS NOT TO EXCLUDE FROM PARSING
--> EX. LINE 1 OF pTagsList CAN BE "img" AND LINE 2 CAN BE "b", etc..

It's used to strip all tags from HTML but those in the pTagsList parameter.

IOW, it can be used to grab the HTML of a page, and strip everything but the img tags.

I'm starting to write it, but thought I'd ask-- just in case.

-Chipp