Re: reading and converting web page HTML text

2010-03-08 Thread Richard Gaskin


Peter Brigham MD wrote:

> On Mar 7, 2010, at 10:51 PM, Ken Ray wrote:
>
>>> Apparently numbers less than 8 are interpreted as HTML relative size
>>> and larger numbers specify point size.
>>>
>>> Could this have something to do with the recently mentioned problems
>>> with font sizes on Unix platforms? If somehow the rev unix engine is
>>> mixing these up, then something intended to be size 14 could display
>>> at size 4.  But I know very little about this stuff, it's just a
>>> thought.
>>
>> Actually it's the way web browsers handled it a long time ago when "real"
>> font sizes were introduced; they had to remain backwards compatible with the
>> previous method but also adopt the new one.
>
> OK. But could this be what isn't working right in Rev on the unix
> platforms, or is that unlikely?


The main issue as far as Rev goes seems to be that Rev is doing what 
Firefox still does, calculating point sizes with an assumption of 72-dpi 
resolution, while modern OSes are largely resolution-independent.


But check out the links in these posts and you'll find that only tells 
part of the story:





As so many in the Linux world have noted, Gnome simply renders too 
big.  From the bug reports in the Gnome and Ubuntu bug databases it 
appears that there's an extra level of translation happening in Gnome, 
but before they jump on a fix they need to figure out what to do for 
backward compatibility.  Sticky issue; glad it's theirs and not mine. :)


--
 Richard Gaskin
 Fourth World
 Rev training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com
 revJournal blog: http://revjournal.com/blog.irv
___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: reading and converting web page HTML text

2010-03-08 Thread Peter Brigham MD


On Mar 7, 2010, at 10:51 PM, Ken Ray wrote:

>> Apparently numbers less than 8 are interpreted as HTML relative size
>> and larger numbers specify point size.
>>
>> Could this have something to do with the recently mentioned problems
>> with font sizes on Unix platforms? If somehow the rev unix engine is
>> mixing these up, then something intended to be size 14 could display
>> at size 4.  But I know very little about this stuff, it's just a
>> thought.
>
> Actually it's the way web browsers handled it a long time ago when "real"
> font sizes were introduced; they had to remain backwards compatible with the
> previous method but also adopt the new one.

OK. But could this be what isn't working right in Rev on the unix
platforms, or is that unlikely?


-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig




Re: reading and converting web page HTML text

2010-03-07 Thread Ken Ray


> Apparently numbers less than 8 are interpreted as HTML relative size
> and larger numbers specify point size.
> 
> Could this have something to do with the recently mentioned problems
> with font sizes on Unix platforms? If somehow the rev unix engine is
> mixing these up, then something intended to be size 14 could display
> at size 4.  But I know very little about this stuff, it's just a
> thought.

Actually it's the way web browsers handled it a long time ago when "real"
font sizes were introduced; they had to remain backwards compatible with the
previous method but also adopt the new one.

Ken Ray
Sons of Thunder Software, Inc.
Email: k...@sonsothunder.com
Web Site: http://www.sonsothunder.com/



Re: reading and converting web page HTML text

2010-03-07 Thread Peter Brigham MD


On Mar 6, 2010, at 7:13 PM, Jim Ault wrote:

> On Mar 6, 2010, at 2:35 PM, Mark Stuart wrote:
>
>> Hi François,
>>
>> Thanx for your quick reply.
>> I added Sarah's script into my application and ran it.
>>
>> The function halted with an error on (&quot;), because it is not a number. I
>> think Sarah's function is looking for a number after the ampersand, correct?
>> So I'm handling the (&quot;) as an exception for now by using this script:
>>
>> if theText contains "&quot;" then
>>   replace "&quot;" with quote in theText
>> end if
>>
>> and then call the decodeEntities(theText) function.
>>
>> I'm sure I'll come across other HTML text like this, but don't know how to
>> handle it really.
>
> Basically, I would go to a site that shows all html entities, make a
> list of those, and do a replace using a repeat loop.
>
> Google 'html entities' to get the possibilities.
>
> Jim Ault
> Las Vegas


I was curious about this so I looked in the dictionary entry for  
"HTMLtext", in which there is a list of named HTML entities that Rev  
is supposed to recognize. In my version of the dictionary this list is  
missing the ampersand before almost all the entries and also mostly  
doesn't show the characters referred to (I'm submitting a user note on  
this). I cleaned up this list meanwhile and it is available at:

http://home.comcast.net/%7Epmbrig/HTMLcharEncoding.dmg (Mac)
http://home.comcast.net/%7Epmbrig/HTMLcharEncoding.rev.zip (Windows)

The data is stored also in stack custom properties -- to find the HTML
encoding for a character c, use:

   the chartoHTML[c] of this stack

but you have to set the casesensitive to true first, or it won't
recognize the difference between "É" and "é".


On another note, in perusing the dictionary entry for HTMLtext, I  
noticed that it says:



  
<font> </font>
Encloses text whose textFont, textSize, foregroundColor, or
backgroundColor is different from the field's default. These five
properties are represented as attributes of the <font> tag.

* face="fontName" appears in the <font> tag if the textFont is not
  the default.

* size="pointSize" appears if the textSize is not the default.
  In standard HTML, the size attribute normally takes a value between 1
  and 7, representing a relative text size, with 3 being the normal text
  size for the web page. To accommodate this convention, when setting
  the HTMLText of a field, if the pointSize is between 1 and 7, the
  textSize of the text is set to a standard value:

  pointSize   textSize
  1           8 point
  2           10 point
  3           12 point
  4           14 point
  5           17 point
  6           20 point
  7           25 point


and further down it says:


* The size attribute of the font tag can encode the font's point size,  
in addition to the standard 7 HTML sizes.



Apparently numbers less than 8 are interpreted as HTML relative size  
and larger numbers specify point size.
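
Read that way, the rule can be sketched as a small lookup. This is Python rather than Rev, purely for illustration: the function and table names are made up, the seven values are copied from the dictionary table quoted above, and treating every value of 8 or more as a literal point size is only my reading of the dictionary text, not confirmed engine behaviour.

```python
# Relative HTML sizes 1-7 mapped to the point sizes listed in the
# dictionary excerpt quoted above.
RELATIVE_TO_POINTS = {1: 8, 2: 10, 3: 12, 4: 14, 5: 17, 6: 20, 7: 25}


def text_size_from_font_size(size):
    """Return the point size for a <font size="..."> value.

    Values 1-7 are looked up as relative sizes. Any other value is
    returned unchanged, on the assumption that it is already a point
    size (the source does not say what happens below 1).
    """
    size = int(size)
    return RELATIVE_TO_POINTS.get(size, size)
```

With this reading, text_size_from_font_size(4) gives 14 and text_size_from_font_size(14) also gives 14, which is why a mix-up between the two interpretations would be easy to miss at some sizes and obvious at others.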


Could this have something to do with the recently mentioned problems  
with font sizes on Unix platforms? If somehow the rev unix engine is  
mixing these up, then something intended to be size 14 could display  
at size 4.  But I know very little about this stuff, it's just a  
thought.


-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig



Re: reading and converting web page HTML text

2010-03-06 Thread J. Landman Gay


Mark Stuart wrote:

> Hi all,
>
> Richard - the html entity that didn't "convert" was the quot, starting with
> & and ending with semi-colon ;


The htmltext can only be applied to fields; it won't work in variables. 
So you need to do what Richard suggested -- he's using the templateField 
(which is a sort of artificial temporary construct) but you can use a 
real field directly. For example, assuming a variable that contains your 
html:


  set the htmlText of fld 1 to myWebText

For the most part I've had very good luck with htmlText, and it is much 
easier to let the engine do the work.


--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software   | http://www.hyperactivesw.com

Re: reading and converting web page HTML text

2010-03-06 Thread Richard Gaskin


Mark Stuart wrote:

> Richard - the html entity that didn't "convert" was the quot, starting with
> & and ending with semi-colon ;
> (if I typed that into the email, you would only see ", as you may see in the
> following).
>
> Jim - so you are suggesting a function to convert all possible entities for
> a text chunk:
>
> function convertHTMLEntities theText
>  replace """ with quote in theText
>  replace "whatever" with "Ç" in theText
>  ...
>  ...
>  return theText
> end convertHTMLEntities



Actually I tried this with the text of your original email, using two 
fields and a button with this script:



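on mouseUp
  put htmlTextToText(fld 1) into fld 2
end mouseUp

-- Let the engine decode the entities: set the htmlText of the
-- templateField, then read back its plain text.
function htmlTextToText pHtml
  set the htmlText of the templateField to pHtml
  return the text of the templateField
end htmlTextToText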


In field 1 I had:

---
I'm reading the HTML text of a web page and parsing it. Some of the text
that I'm parsing contains (&quot;) - braces not included.

What runrev function do I use to convert that HTML text to the double quote
(") character?
There will be other characters that I also need to convert, such as
(Björnke).
After reading and parsing the text, I'll be loading a DataGrid.
--


After running it through the function I get:

--
I'm reading the HTML text of a web page and parsing it. Some of the text 
that I'm parsing contains (") - braces not included.  What runrev 
function do I use to convert that HTML text to the double quote (") 
character? There will be other characters that I also need to convert, 
such as (Björnke). After reading and parsing the text, I'll be loading a 
DataGrid.

--


The htmlText property is designed not to be true HTML, but to be the one 
way you can represent the contents of fields using ASCII characters with 
complete fidelity.  HTML conventions were adopted for this because of 
their simple, extensible nature, so while the name "htmlText" often 
conjures up all sorts of web expectations it wasn't designed to fulfill, 
when it comes to providing an SGML-like representation of anything you 
can do with Rev fields it generally works like a champ.


As Jim noted, there are some things you can do in HTML that aren't 
supported by Rev fields currently, so those will fail when attempting to 
use htmlText as a generic HTML-to-text converter.  But you'd be 
surprised at what you can do with it, often including many Unicode 
entities as well now that Rev supports Unicode.


Try out the htmlTextToText function above and let me know where it 
doesn't work for you for anything you can display in a Rev field.


--
 Richard Gaskin
 Fourth World
 Rev training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com
 revJournal blog: http://revjournal.com/blog.irv

Re: reading and converting web page HTML text

2010-03-06 Thread Mark Stuart


Hi all,

Richard - the html entity that didn't "convert" was the quot, starting with
& and ending with semi-colon ;
(if I typed that into the email, you would only see ", as you may see in the
following).

Jim - so you are suggesting a function to convert all possible entities for
a text chunk:

function convertHTMLEntities theText
 replace """ with quote in theText
 replace "whatever" with "Ç" in theText
 ...
 ...
 return theText
end convertHTMLEntities


Regards,
Mark Stuart
-- 
View this message in context: 
http://n4.nabble.com/reading-and-converting-web-page-HTML-text-tp1583130p1583305.html
Sent from the Revolution - User mailing list archive at Nabble.com.

Re: reading and converting web page HTML text

2010-03-06 Thread Jim Ault


On Mar 6, 2010, at 7:07 PM, J. Landman Gay wrote:

> Mark Stuart wrote:
>
>> Hi all,
>> Looking up runrev Dictionary for "HTML", does result in a find (HTMLtext),
>> which mentions the following near the very end.
>> Special characters (whose ASCII value is greater than 127) are encoded as
>> HTML entities. Revolution recognizes the following named entities:
>> <<
>> and then it lists all the entities it currently supports.
>> I'd say that using the HTMLtext function would handle all these html
>> entities. But it appears to not do that.
>> Therefore, should I submit a bug report on this?
>
> The "&quot;" entity works. I'm not sure why it isn't in the list.



The list is only for high ASCII.  Quote is ascii 34, unless you are  
thinking curly quotes.
My belief is that if you are doing work for clients, you should use  
the actual html entities yourself, rather than trust that Rev will  
always do this properly.  The reason is that Rev is not a robust html  
tool nor is it designed to be.  This is especially true if you need to  
work with Unicode and entities (a very complex adventure)


Hope this helps


Jim Ault
Las Vegas




Re: reading and converting web page HTML text

2010-03-06 Thread Richard Gaskin


Mark Stuart wrote:

> Looking up runrev Dictionary for "HTML", does result in a find (HTMLtext),
> which mentions the following near the very end.
>
> Special characters (whose ASCII value is greater than 127) are encoded as
> HTML entities. Revolution recognizes the following named entities:
> <<
> and then it lists all the entities it currently supports.
>
> I'd say that using the HTMLtext function would handle all these html
> entities. But it appears to not do that.
> Therefore, should I submit a bug report on this?


Which ones have you found that don't work?

The one from your original post (Björnke) does.

--
 Richard Gaskin
 Fourth World
 Rev training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com
 revJournal blog: http://revjournal.com/blog.irv

Re: reading and converting web page HTML text

2010-03-06 Thread J. Landman Gay


Mark Stuart wrote:

> Hi all,
> Looking up runrev Dictionary for "HTML", does result in a find (HTMLtext),
> which mentions the following near the very end.
> Special characters (whose ASCII value is greater than 127) are encoded as
> HTML entities. Revolution recognizes the following named entities:
> <<
> and then it lists all the entities it currently supports.
>
> I'd say that using the HTMLtext function would handle all these html
> entities. But it appears to not do that.
> Therefore, should I submit a bug report on this?

The "&quot;" entity works. I'm not sure why it isn't in the list.

--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software   | http://www.hyperactivesw.com

Re: reading and converting web page HTML text

2010-03-06 Thread Mark Stuart


Hi all,
Looking up runrev Dictionary for "HTML", does result in a find (HTMLtext),
which mentions the following near the very end.
>>
Special characters (whose ASCII value is greater than 127) are encoded as
HTML entities. Revolution recognizes the following named entities:
<<
and then it lists all the entities it currently supports.

I'd say that using the HTMLtext function would handle all these html
entities. But it appears to not do that.
Therefore, should I submit a bug report on this?

Regards,
Mark Stuart
-- 
View this message in context: 
http://n4.nabble.com/reading-and-converting-web-page-HTML-text-tp1583130p1583224.html
Sent from the Revolution - User mailing list archive at Nabble.com.

Re: reading and converting web page HTML text

2010-03-06 Thread Jim Ault


On Mar 6, 2010, at 2:35 PM, Mark Stuart wrote:

> Hi François,
>
> Thanx for your quick reply.
> I added Sarah's script into my application and ran it.
>
> The function halted with an error on (&quot;), because it is not a number. I
> think Sarah's function is looking for a number after the ampersand, correct?
> So I'm handling the (&quot;) as an exception for now by using this script:
>
> if theText contains "&quot;" then
>   replace "&quot;" with quote in theText
> end if
>
> and then call the decodeEntities(theText) function.
>
> I'm sure I'll come across other HTML text like this, but don't know how to
> handle it really.



Basically, I would go to a site that shows all html entities, make a  
list of those, and do a replace using a repeat loop.


Google 'html entities' to get the possibilities.
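
The list-and-replace idea above could look roughly like the following. It is a sketch in Python rather than Rev, just to show the shape of the loop; the table is a deliberately tiny, hypothetical sample and the function name is made up. A real list would carry every named entity you expect to meet.

```python
# Hypothetical, deliberately small entity table; a real list would be
# much longer (see any "html entities" reference).
ENTITIES = {
    "&quot;": '"',
    "&lt;": "<",
    "&gt;": ">",
    "&ouml;": "ö",
    "&Ccedil;": "Ç",
    # Keep "&amp;" last so that "&amp;quot;" ends up as the literal
    # text "&quot;" instead of being decoded twice into a quote.
    "&amp;": "&",
}


def convert_html_entities(text):
    """Replace each known named entity with its character, one entity per pass."""
    for entity, char in ENTITIES.items():
        text = text.replace(entity, char)
    return text
```

The order of the table matters only for "&amp;", which has to be handled after everything else. For what it's worth, Python's standard library already ships html.unescape, which covers the full named and numeric set, so the hand-built table is only worth it in an environment that lacks such a helper.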

Jim Ault
Las Vegas




Re: reading and converting web page HTML text

2010-03-06 Thread Mark Stuart


Hi François,

Thanx for your quick reply.
I added Sarah's script into my application and ran it.

The function halted with an error on (&quot;), because it is not a number. I
think Sarah's function is looking for a number after the ampersand, correct?
So I'm handling the (&quot;) as an exception for now by using this script:

if theText contains "&quot;" then
   replace "&quot;" with quote in theText
end if

and then call the decodeEntities(theText) function.

I'm sure I'll come across other HTML text like this, but don't know how to
handle it really.

Merci,
Mark Stuart
-- 
View this message in context: 
http://n4.nabble.com/reading-and-converting-web-page-HTML-text-tp1583130p1583155.html
Sent from the Revolution - User mailing list archive at Nabble.com.

Re: reading and converting web page HTML text

2010-03-06 Thread François Chaplais


On 6 March 2010 at 23:01, Mark Stuart wrote:

> Hi all,
> I'm reading the HTML text of a web page and parsing it. Some of the text
> that I'm parsing contains (&quot;) - braces not included.
> 
> What runrev function do I use to convert that HTML text to the double quote
> (") character?
> There will be other characters that I also need to convert, such as
> (Björnke).
> After reading and parsing the text, I'll be loading a DataGrid.
> 
> I've tried some functions, but with no success.
> 
> Regards,
> Mark Stuart
> 
digging in my mail archive I found this post from Sarah (it puts unicode text 
into a field from an HTML source, if I am correct)
HTH


On Sun, Jul 26, 2009 at 7:18 AM, Sivakatirswami wrote:
> Is there a way to get htmlEntities
> 
> "&#8220;Kanwar&#8221;
> 
> The rest of their lifestyle — names, marriage rituals, dressing styles
> — continued to be the same"
> 
> to appear correctly in a field where such entities are part of the html used
> to set the htmltext of a field?


I had to wrestle with this recently and after numerous attempts with
uniencode, unidecode, macToISO etc., I ended up writing my own
function to do it:

function decodeEntities pText
  if pText contains "&#" is false then return pText

  set the useunicode to true
  put empty into tNew
  repeat until pText is empty
    put char 1 of pText into c
    if c <> "&" then
      put c after tNew
      delete char 1 of pText
    else
      -- assumes a numeric entity ("&#" ... ";"): skip the "&#"
      put empty into tCode
      delete char 1 to 2 of pText
      repeat until char 1 of pText = ";"
        put char 1 of pText after tCode
        delete char 1 of pText
        if pText is empty then exit repeat
      end repeat
      delete char 1 of pText
      put numtochar(tCode) into tChar
      set the unicodetext of the templatefield to tChar
      put the text of the templatefield after tNew
    end if
  end repeat

  set the useunicode to false

  return tNew
end decodeEntities

Use it like this: put decodeEntities("&#8220;Kanwar&#8221;")
which returns: “Kanwar” (curly opening & closing quotes which
may not show in the email).

I feel sure that there must be a better method, but until someone
discovers it, this function seems to do the job.
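
For comparison, the same numeric-reference idea can be written with a regular expression. This is a Python sketch, not Rev, and the names are made up for illustration. It only touches decimal references of the form "&#" digits ";" and deliberately leaves anything else, such as a named entity, exactly as it found it.

```python
import re

# Decimal numeric character references only, e.g. "&#8220;".
# Named entities such as "&quot;" do not match and are left alone.
_NUMERIC_REF = re.compile(r"&#(\d+);")


def _replace(match):
    code = int(match.group(1))
    # Leave references beyond the Unicode range untouched instead of raising.
    return chr(code) if code <= 0x10FFFF else match.group(0)


def decode_numeric_entities(text):
    """Turn every decimal numeric reference in text into its character."""
    return _NUMERIC_REF.sub(_replace, text)
```

Because the pattern requires digits after the "&#", input containing a bare "&" or a named entity passes through unchanged rather than stopping with an error.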

Cheers,
Sarah
