Re: HTML Tables

2006-04-19 Thread Jim Ault
Here is one that uses some RegEx I sent to Eric Chatonet, which he ramped
up into a real function for removing tags.  Note that a couple of steps are
commented out, since the exact page you are trying to parse may or may not
need different treatment for spaces and returns.

Try it this way, then tweak the "--replace" lines to see if that gives a
better result.
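--- start copy here
function StripTags pHtml
  local tRegex,tPrevText
  -- entity names restored to pair with the characters listed below;
  -- "&#39;" for the apostrophe is an assumption
  constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,&bull;,&#39;,&middot;,&amp;"
  constant kConvertedHtml = "é,à,ç,>,<,ê,è,©"
  put kConvertedHtml into tempp
  put "," & numtochar(165) & "," & numtochar(39) & "," & numtochar(225) & \
      "," & numtochar(38) after tempp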
  --
  --replace return with space in pHtml
  --replace return with "" in pHtml
  
  replace numtochar(13) with empty in pHtml
  replace tab with empty in pHtml
  --
  -- the tag text in the first two patterns and in the two "with return"
  -- replaces did not survive; comments, scripts, <br> and </p> are assumed
  put replacetext(pHtml,"(?Usi)<!--.*-->","") into pHtml
  put replacetext(pHtml,"(?Usi)<script[^>]*>.*</script>","") into pHtml
  put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml
  --
  replace "&nbsp;" with space in pHtml
  replace "<br>" with return in pHtml
  replace "</p>" with return in pHtml
  --
  put  "<[^><]*>" into tRegex
  put replacetext(pHtml,tRegex,"") into pHtml
  put replacetext(pHtml,tRegex,"") into pHtml
  --
  repeat until tPrevText is pHtml
    if keepRunning is "false" then exit StripTags
    put pHtml into tPrevText
    put replacetext(pHtml," +",space) into pHtml
    put replacetext(pHtml,"^ ","") into pHtml
  end repeat
  --
  replace (space & return) with return in pHtml
  replace (return & space) with return in pHtml
  filter pHtml without empty
  --
  replace "&quot;" with quote in pHtml
  -- tempp holds the full replacement list; kConvertedHtml alone is four items short
  repeat with i = 1 to the number of items of kHtml
    replace item i of kHtml with item i of tempp in pHtml
  end repeat
  --
  return pHtml
end StripTags
--- end copy
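
A minimal usage sketch for the function above, assuming the page source sits in a field named "htmlSource" and the cleaned text should land in a field named "plainText". Both field names are placeholders chosen for illustration, and the handler is meant for a button script in the same stack as StripTags.

```livecode
on mouseUp
  -- hypothetical field names; change them to match your stack
  put StripTags(field "htmlSource") into field "plainText"
end mouseUp
```

StripTags works on its own copy of the text, so the source field is left unchanged.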

Jim Ault
Las Vegas

On 4/19/06 12:29 AM, "Bill Marriott" <[EMAIL PROTECTED]> wrote:

> Forgive me if this has been "asked and answered" on the list before, but I
> think it's of general enough interest for me to post, in case someone has
> already invented this mousetrap.
> 
> I am wondering what the most efficient way might be to convert an HTML table
> into a Rev table.
> 
> The ideal solution would
> 
> - convert <tr>'s into rows and <td>'s into columns
> - correctly handle (i.e., ignore) all of the various attributes that might
> be embedded within the table tags.
> - designed for data tables (not formatting tables)
> - work very fast
> 
> Ideas? Suggestions? Pointers?
> 
> - Bill 
> 


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution


Re: HTML Tables

2006-04-19 Thread Bill Marriott
Hey that's pretty darned good for off the top of your head! I made two 
changes. One is an adjustment to the regular expression to account for 
spaces before the < and after the > (so there is no junk around the tabs), 
and one is to fix a syntax error. (I hope I did it right, I'm not super 
familiar with regex's.)

Brian Yennie wrote

> ## get source from a field
> put fld "htmlSource" into tHTML
>
> ## translate end of row to end of line
> replace "</tr>" with return in tHTML
>
> ## translate end of column to tab
> replace "</td>" with tab in tHTML
>
> ## delete all the rest of the html tags
put replaceText(tHTML, "\ *<[a-zA-Z/]+[^>]*>\ *", empty) into tHTML
>
> ## remove trailing tabs and trailing return
replace tab&return with return in tHTML
> delete last char of tHTML
>
> ## put it in a new field
> put tHTML into fld "tableField"





Re: HTML Tables

2006-04-19 Thread Brian Yennie

Bill,

If you *just* need row and column data and no formatting, how about 
something like:

(warning: untested email code)

## get source from a field
put fld "htmlSource" into tHTML

## translate end of row to end of line
replace "</tr>" with return in tHTML

## translate end of column to tab
replace "</td>" with tab in tHTML

## delete all the rest of the html tags
put replaceText(tHTML, "<[a-zA-Z/]+[^>]*>", empty) into tHTML

## remove trailing tabs and trailing return
replace tab&return with return
delete last char of tHTML

## put it in a new field
put tHTML into fld "tableField"
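
The same steps can be wrapped in a function so they work on any string rather than on a fixed field. This is only a sketch under a few assumptions: the handler name htmlTableToText is made up, header cells (</th>) are treated like data cells, and line breaks already present in the source are discarded so they cannot split a row.

```livecode
-- Sketch only: turns the rows and cells of an HTML table into
-- return- and tab-delimited text. The handler name is hypothetical.
function htmlTableToText pHTML
  -- drop line breaks already in the source so they do not split rows
  replace return with empty in pHTML
  -- end of row -> end of line ("replace" ignores case by default)
  replace "</tr>" with return in pHTML
  -- end of cell -> tab; header cells are handled like data cells
  replace "</td>" with tab in pHTML
  replace "</th>" with tab in pHTML
  -- remove every remaining tag, including any attributes inside it
  put replaceText(pHTML, "<[^>]*>", empty) into pHTML
  -- remove the tab left after the last cell of each row
  replace (tab & return) with return in pHTML
  -- remove the return left after the last row
  if last char of pHTML is return then delete last char of pHTML
  return pHTML
end htmlTableToText
```

It could be called as: put htmlTableToText(fld "htmlSource") into fld "tableField". Because only closing tags are turned into delimiters, attributes on the opening <table>, <tr> and <td> tags are removed along with the tags themselves.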




HTML Tables

2006-04-19 Thread Bill Marriott
Forgive me if this has been "asked and answered" on the list before, but I 
think it's of general enough interest for me to post, in case someone has 
already invented this mousetrap.

I am wondering what the most efficient way might be to convert an HTML table 
into a Rev table.

The ideal solution would

- convert <tr>'s into rows and <td>'s into columns
- correctly handle (i.e., ignore) all of the various attributes that might 
be embedded within the table tags.
- designed for data tables (not formatting tables)
- work very fast

Ideas? Suggestions? Pointers?

- Bill 





Dealing with HTML Tables

2004-10-08 Thread Devin Asay
Okay, I'm just brainstorming here about how to parse text in HTML 
tables to make it neat and readable in a Rev field or fields. A couple 
of approaches occur to me:

- Download the html file, and basically use tabs and returns to 
separate the text in a coherent way. This is easy but could turn out 
messy.

- Parse out the table cell contents and put it into the cells of a 
table field. Might work, but seems like it wouldn't handle non-tabular 
text well.

- Dynamically create text fields and table fields as needed, based on 
how the html file parses out, use the characteristics of the text 
(textSize, formattedHeight, formattedWidth, etc.) to set the fields to 
be large enough to display all of the text or table without 
scrolling. Then stack the fields top to bottom and group them in a 
group with a vertical scrollbar and scroll the group instead of the 
field. Might work but might be complicated to program. Maybe 
performance hits here?
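
For the third approach, the sizing step might look like the sketch below. The handler name fitFieldToText and its parameter are made up for illustration; height, formattedHeight and topLeft are standard field properties. It only covers resizing one field, not creating the fields or grouping them.

```livecode
-- Sketch only: grow (or shrink) a field so all of its text shows
-- without scrolling. Handler and parameter names are hypothetical.
on fitFieldToText pFieldName
  local tTopLeft
  lock screen
  put the topLeft of field pFieldName into tTopLeft
  set the height of field pFieldName to the formattedHeight of field pFieldName
  -- changing the height resizes around the center, so put the top-left corner back
  set the topLeft of field pFieldName to tTopLeft
  unlock screen
end fitFieldToText
```

Calling this for each field in turn, and setting the top of each field to the bottom of the one before it, would give the top-to-bottom stack described above.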

Has anyone tried to do something like this? I noticed Xavier in 
DiscreteBrowser uses the first approach. Any ideas? The idea is not to 
exactly duplicate html tables, but to render the text in tables in a 
pleasing and readable way.

Devin Asay
Humanities Technology and Research Support Center
Brigham Young University