Re: Parsing??

Bill Davidson Sat, 16 Dec 2000 17:21:13 -0800
To be really efficient at this, learn regular expressions and use CF's
regular expression functions.  Otherwise you'll end up writing a bulky
parser to handle many different cases, unless the page you are parsing
always has pretty much the same layout with just some different data.
Not knowing REs as well as I should, I usually do the stupid thing and write
a parser that is "dumber" -- looking for specific data, rather than data
that fits a rule.  Do yourself a favor, and don't follow my lead. ;)
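
For a taste of the regular-expression route, here is a rough sketch rather than
a definitive recipe. It assumes the page text is already in a variable (the
sample sentence, the pattern, and the names page_body, match and sent_day are
made up for illustration). REFind() is called with its optional fourth
argument so that it returns the positions and lengths of the match and of the
parenthesized subexpression, which Mid() then cuts out.

```cfml
<!--- Stand-in for cfhttp.filecontent; the sentence is an illustrative sample --->
<cfset page_body = "Your message was sent on Thursday">

<!--- With returnsubexpressions set, REFind returns a structure holding two
      arrays, pos and len. Element 1 describes the whole match, element 2 the
      first parenthesized subexpression. --->
<cfset match = REFind("sent on ([A-Za-z]+)", page_body, 1, "True")>

<!--- pos[1] is 0 when the pattern did not match at all --->
<cfif match.pos[1] GT 0>
    <cfset sent_day = Mid(page_body, match.pos[2], match.len[2])>
<cfelse>
    <cfset sent_day = "">
</cfif>

<cfoutput>#sent_day#</cfoutput>
```

The pattern describes a rule ("a word after 'sent on'") rather than one
specific value, which is the difference from the "dumber" approach.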

Anyhow....

A general parsing algorithm that can find a value that is a "word" --
without RE -- could be something like this (NOTE: this is a pseudo-code
algorithm, not at all syntactically correct for CF... it's up to you to
translate it to CF, that's the fun part anyhow...)

.....
/* find the known string's position in the page, using space as the
delimiter */
List_location = FindInList("known_string_data", "page_body", " ")

/* Retrieve the string that lives at the location after the "known page
component" */
String_You_were_looking_for = ListGetAt("page_body", List_location+1, " ")
.....

That would be useful in a page like:
"Your message was sent on Thursday"

and you were trying to figure out when the message was sent.
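
One possible CFML translation of the pseudo-code above, offered as a sketch
only. It assumes the built-in ListFindNoCase() can stand in for the
pseudo-code's FindInList, that the known word is "on", and that the page text
is in page_body; the variable names are illustrative.

```cfml
<!--- Stand-in for cfhttp.filecontent, using the sample sentence above --->
<cfset page_body = "Your message was sent on Thursday">

<!--- ListFindNoCase(list, value, delimiters) returns the 1-based position of
      the list element equal to "on", or 0 when there is no such element --->
<cfset list_location = ListFindNoCase(page_body, "on", " ")>

<!--- Read the next word only if the marker was found and is not the last word --->
<cfif list_location GT 0 AND list_location LT ListLen(page_body, " ")>
    <cfset sent_day = ListGetAt(page_body, list_location + 1, " ")>
<cfelse>
    <cfset sent_day = "">
</cfif>

<!--- Displays: Thursday --->
<cfoutput>#sent_day#</cfoutput>
```

The guard matters because ListGetAt() throws an error when asked for a
position past the end of the list.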

If you decide to go this route, pay special attention to the order in which
ColdFusion's functions expect their arguments.
Can't remember which one always gets me, but one of the functions or sets of
functions (arrays/List/RE) is a little different than the rest...
Maybe it's this:

Find(substring, string [, start ])
&
ListContains(list, substring [, delimiters ])

Yeah, that's probably it - Find wants the substring first, and ListContains
wants the substring second... From their naming it kinda makes sense, but
it's an easy thing to forget when you're whipping out code.... Maybe someone
has a custom tag function, FindInList --  Allaire really should add this,
since it would increase consistency in this little language we all use &
love/hate/_____. ;)
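
To make the argument-order difference concrete, here is a small side-by-side
sketch. The sample sentence and the variable names are assumptions for
illustration only.

```cfml
<cfset page_body = "Your message was sent on Thursday">

<!--- Find: substring FIRST, string second.
      Returns the character position of the match (18 here), or 0 if absent. --->
<cfset char_pos = Find("sent", page_body)>

<!--- ListContains: list FIRST, substring second.
      Returns the index of the first list element that contains the substring
      (4 here, since "sent" is the fourth space-delimited word), or 0 if none. --->
<cfset item_pos = ListContains(page_body, "sent", " ")>

<cfoutput>Find: #char_pos#, ListContains: #item_pos#</cfoutput>
```

Note that ListContains() matches a substring inside an element, so a short
marker like "on" could also hit a longer word that happens to contain it.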

HTH,

-Bill
/intraget

----- Original Message -----
From: pan <[EMAIL PROTECTED]>
To: CF-Talk <[EMAIL PROTECTED]>
Sent: Saturday, December 16, 2000 12:50 PM
Subject: Re: Parsing??


>
>
> > I have successfully used the cfhttp tag to pull a webpage and then display
> > it with CFHTTP.FileContent.  But how do I just display specific parts?
> >
> > Rich
>
> Craig already showed you one method.
>
> Basic info is that cfhttp.filecontent is one_big_string.
> You use the various cf string functions to extract
> what you want from one_big_string.
> Regular expressions help a lot (see function list
> in docs).
> When you grab a web page's content with cfhttp
> all the content is stored in the variable cfhttp.filecontent.
> If you need to extract more than one item from
> one_big_string, you might want to copy one_big_string
> to additional variables - one variable for each extraction.
> That keeps you from having to refresh cfhttp.filecontent.
>
> Thus,
>
> <cfset tmp1=cfhttp.filecontent>
> ..... do extractions
> ..... ending with content_1 you want to use
>
> <cfset tmp2=cfhttp.filecontent>
> .... do extractions
> ..... ending with content_2 you want to use
>
> repeat
>
> The simplest extraction method is using the
> mid() function. If the web page is static,
> never changes and you count the char positions
> to set the attributes of mid(), then you need to
> do nothing more.
>
> A little more complex is using Find() to set
> the beginning and end values of mid(). If the
> web page is dynamic, but the info you want
> always begins with a specific char sequence
> and ends with a specific char sequence, then
> you can use Find() or FindNoCase() to determine
> where to begin and end your mid() extraction.
>
> In some cases you can use the suite of
> List functions to do your extractions.
> In particular when the cfhttp.filecontent is
> a collection of well formed sentences you
> can use chr(32),chr(10),chr(13) as list
> delimiters and go from there. I would classify
> this as a technique for a special case.
>
> Better methods involve using regular expressions
> via the REFind(),REFindNoCase(),REReplace(),
> and REReplaceNoCase() functions along with
> several other string manipulation functions
> and techniques.
>
> Beyond string manipulation, wddx and xml
> can be your solution.
>
> Beyond common packet formation and
> transmission formats (like xml, wddx)
> there is AI based pattern recognition
> and cognitive artificial ideation, but
> for all practical purposes this amounts
> to nothing more than employing an
> esp() function  :) .
>
> Pan
>
>
>
>
>
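
The Find()/Mid() technique described in the quoted message could look roughly
like the sketch below. The HTML fragment, the two marker strings, and all
variable names are invented for illustration; a real page would need its own
markers.

```cfml
<!--- Stand-in for cfhttp.filecontent; this fragment is made up --->
<cfset page_body = "<b>Sent:</b> Thursday<br>">

<!--- Character sequences assumed to always surround the wanted value --->
<cfset start_marker = "<b>Sent:</b> ">
<cfset end_marker = "<br>">

<cfset extracted = "">
<cfset start_pos = FindNoCase(start_marker, page_body)>

<cfif start_pos GT 0>
    <!--- The value begins right after the start marker --->
    <cfset value_start = start_pos + Len(start_marker)>

    <!--- Search for the end marker only from the value onward --->
    <cfset end_pos = FindNoCase(end_marker, page_body, value_start)>

    <!--- Mid(string, start, count) needs a positive count --->
    <cfif end_pos GT value_start>
        <cfset extracted = Mid(page_body, value_start, end_pos - value_start)>
    </cfif>
</cfif>

<cfoutput>#HTMLEditFormat(extracted)#</cfoutput>
```

Leaving extracted empty when either marker is missing keeps the page from
erroring out if the remote site changes its layout.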
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Structure your ColdFusion code with Fusebox. Get the official book at 
http://www.fusionauthority.com/bkinfo.cfm

Archives: http://www.mail-archive.com/cf-talk@houseoffusion.com/
Unsubscribe: http://www.houseoffusion.com/index.cfm?sidebar=lists