Re: HTML Tables
Here is one that used some RegEx that I sent to Eric Chatonet, and he ramped it up into a real function for removing tags.

Note that a couple of steps are commented out, since the exact page you are trying to parse may or may not need different treatment for spaces and returns. Try it this way, then tweak the "--replace" lines to see if that gives a better result.

start copy here

function StripTags pHtml
  local tRegex,tPrevText
  constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,&bull;,',&middot;,&amp;"
  constant kConvertedHtml = "é,à,ç,>,<,ê,è,©"
  put kConvertedHtml into tempp
  put "," & numtochar(165) & "," & numtochar(39) & "," & numtochar(225) & "," & numtochar(38) after tempp

  --replace return with space in pHtml
  --replace return with "" in pHtml
  replace numtochar(13) with empty in pHtml
  replace tab with empty in pHtml

  put replacetext(pHtml,"(?Usi)","") into pHtml
  put replacetext(pHtml,"(?Usi).*","") into pHtml
  put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml

  replace "&nbsp;" with space in pHtml
  replace "" with return in pHtml
  replace "" with return in pHtml

  -- strip whatever tags are left
  put "<[^><]*>" into tRegex
  put replacetext(pHtml,tRegex,"") into pHtml
  put replacetext(pHtml,tRegex,"") into pHtml

  -- collapse runs of spaces until nothing changes
  repeat until tPrevText is pHtml
    if keepRunning is "false" then exit StripTags
    put pHtml into tPrevText
    put replacetext(pHtml," +",space) into pHtml
    put replacetext(pHtml,"^ ","") into pHtml
  end repeat

  replace (space & return) with return in pHtml
  replace (return & space) with return in pHtml
  filter pHtml without empty

  replace "&quot;" with quote in pHtml
  -- tempp holds all twelve replacements; kConvertedHtml alone stops at item 8
  repeat with i = 1 to the number of items of kHtml
    replace item i of kHtml with item i of tempp in pHtml
  end repeat

  return pHtml
end StripTags

--- end copy

Jim Ault
Las Vegas

On 4/19/06 12:29 AM, "Bill Marriott" <[EMAIL PROTECTED]> wrote:

> Forgive me if this has been "asked and answered" on the list before, but I
> think it's of general enough interest for me to post, in case someone has
> already invented this mousetrap.
>
> I am wondering what the most efficient way might be to convert an HTML table
> into a Rev table.
>
> The ideal solution would
>
> - convert <tr>'s into rows and <td>'s into columns
> - correctly handle (i.e., ignore) all of the various attributes that might
>   be embedded within the table tags
> - be designed for data tables (not formatting tables)
> - work very fast
>
> Ideas? Suggestions? Pointers?
>
> - Bill

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
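For anyone who wants to check the order of the steps outside Rev, here is a minimal Python sketch of the same idea: drop returns and tabs, strip tags, decode entities, then tidy the spacing. It is an illustration only, not Jim's or Eric's code. The name strip_tags is made up for this note, and the standard library's html.unescape stands in for the kHtml lookup list.

```python
import html
import re

def strip_tags(page: str) -> str:
    """Rough Python counterpart of the StripTags steps (hypothetical helper)."""
    # Drop carriage returns and tabs, as the Rev function does.
    text = page.replace("\r", "").replace("\t", "")
    # Treat non-breaking-space entities as ordinary spaces.
    text = text.replace("&nbsp;", " ")
    # Remove any tag: "<", then anything that is not "<" or ">", then ">".
    text = re.sub(r"<[^><]*>", "", text)
    # Decode entities only after the tags are gone, so a decoded "&lt;"
    # is never mistaken for the start of a tag.
    text = html.unescape(text)
    # Collapse runs of spaces, trim each line, and drop empty lines.
    lines = (re.sub(r" +", " ", line).strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)
```

Decoding entities last mirrors the Rev function, which also replaces the kHtml items only after the tag-stripping passes.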
Re: HTML Tables
Hey, that's pretty darned good for off the top of your head! I made two changes. One is an adjustment to the regular expression to account for spaces before the < and after the > (so there is no junk around the tabs), and one is to fix a syntax error. (I hope I did it right; I'm not super familiar with regexes.)

Brian Yennie wrote:

> ## get source from a field
> put fld "htmlSource" into tHTML
>
> ## translate end of row to end of line
> replace "</tr>" with return in tHTML
>
> ## translate end of column to tab
> replace "</td>" with tab in tHTML
>
> ## delete all the rest of the html tags
put replaceText(tHTML, "\ *<[a-zA-Z/]+[^>]*>\ *", empty) into tHTML
>
> ## remove trailing tabs and trailing return
replace tab&return with return in tHTML
> delete last char of tHTML
>
> ## put it in a new field
> put tHTML into fld "tableField"
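The revised script can be traced step by step with a small Python sketch. This is only an illustration under stated assumptions, not Rev code: the function name table_to_tsv is invented here, and two details are additions that the posted script does not have (line breaks already in the source are dropped first, and </th> is treated like </td>).

```python
import re

def table_to_tsv(source: str) -> str:
    """Turn simple HTML table markup into tab- and return-delimited text."""
    # Assumption (not in the posted script): drop line breaks already in the
    # source so that only </tr> produces a new line.
    text = source.replace("\r", "").replace("\n", "")
    # End of row becomes a line break; end of cell becomes a tab.
    text = re.sub(r"</tr\s*>", "\n", text, flags=re.IGNORECASE)
    text = re.sub(r"</t[dh]\s*>", "\t", text, flags=re.IGNORECASE)
    # Delete every remaining tag, attributes included, plus spaces around it.
    text = re.sub(r" *<[a-zA-Z/]+[^>]*> *", "", text)
    # Remove the tab after the last cell of each row and the final line break.
    text = text.replace("\t\n", "\n")
    return text.rstrip("\n")
```

Because the closing cell tags are turned into tabs before the general tag pattern runs, the added " *" only trims spaces next to the tags that are still present at that point, such as the opening <td>.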
Re: HTML Tables
Bill,

If you *just* need row and column data and no formatting, how about something like this (warning: untested email code):

## get source from a field
put fld "htmlSource" into tHTML

## translate end of row to end of line
replace "</tr>" with return in tHTML

## translate end of column to tab
replace "</td>" with tab in tHTML

## delete all the rest of the html tags
put replaceText(tHTML, "<[a-zA-Z/]+[^>]*>", empty) into tHTML

## remove trailing tabs and trailing return
replace tab&return with return
delete last char of tHTML

## put it in a new field
put tHTML into fld "tableField"

> Forgive me if this has been "asked and answered" on the list before, but I
> think it's of general enough interest for me to post, in case someone has
> already invented this mousetrap.
>
> I am wondering what the most efficient way might be to convert an HTML table
> into a Rev table.
>
> The ideal solution would
>
> - convert <tr>'s into rows and <td>'s into columns
> - correctly handle (i.e., ignore) all of the various attributes that might
>   be embedded within the table tags
> - be designed for data tables (not formatting tables)
> - work very fast
>
> Ideas? Suggestions? Pointers?
>
> - Bill
HTML Tables
Forgive me if this has been "asked and answered" on the list before, but I think it's of general enough interest for me to post, in case someone has already invented this mousetrap.

I am wondering what the most efficient way might be to convert an HTML table into a Rev table.

The ideal solution would

- convert <tr>'s into rows and <td>'s into columns
- correctly handle (i.e., ignore) all of the various attributes that might be embedded within the table tags
- be designed for data tables (not formatting tables)
- work very fast

Ideas? Suggestions? Pointers?

- Bill
Dealing with HTML Tables
Okay, I'm just brainstorming here about how to parse text in HTML tables to make it neat and readable in a Rev field or fields. A couple of approaches occur to me:

- Download the html file, and basically use tabs and returns to separate the text in a coherent way. This is easy but could turn out messy.

- Parse out the table cell contents and put them into the cells of a table field. Might work, but seems like it wouldn't handle non-tabular text well.

- Dynamically create text fields and table fields as needed, based on how the html file parses out, and use the characteristics of the text (textSize, formattedHeight, formattedWidth, etc.) to set the fields to be large enough to display all of the text or table without scrolling. Then stack the fields top to bottom, group them in a group with a vertical scrollbar, and scroll the group instead of the field. Might work but might be complicated to program. Maybe performance hits here?

Has anyone tried to do something like this? I noticed Xavier in DiscreteBrowser uses the first approach. Any ideas? The idea is not to exactly duplicate html tables, but to render the text in tables in a pleasing and readable way.

Devin Asay
Humanities Technology and Research Support Center
Brigham Young University
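The second approach above (pull out the cell contents and fill a table from them) can be sketched with a real HTML parser instead of text replacement. The following is a minimal Python illustration, not Rev code. The class TableText and the function table_rows are names invented for this sketch, and it simply ignores any text outside table cells, which is exactly the non-tabular text the second approach is said to handle poorly.

```python
from html.parser import HTMLParser

class TableText(HTMLParser):
    """Collects cell text row by row; attributes on the tags are ignored."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the open cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell is dropped in this sketch.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace runs so each cell is one clean string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def table_rows(source):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableText()
    parser.feed(source)
    parser.close()
    return parser.rows
```

Joining each row with tabs and the rows with returns gives the same tab- and return-delimited text that the first approach produces, so the result could be dropped into a table field either way.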