Re: Another RegEx question...
> "]+([a-z0-9_]\.(?:jpg|jpeg|gif|png))[^>]*>", "", "ONE")>

In addition to Ben's comment, it doesn't look like you're finding the beginning of the src attribute. See: ]+ and then straight into the regex for the URL, so there's no mention of src=". There's also no allowance for the protocol (https?://), and the : in (?:jpg| seems erroneous...

I might suggest a slightly different tack. Instead of trying to get it with one expression, how about this?

]+>).*$","\1")> ","/>")>

Usually an image tag doesn't contain any characters that aren't XML safe, since it's just got the src (usually no URL parameters), class, style and potentially an alt or title attribute. It's really the alt or title attribute that might have non-XML-safe characters, but still not very often. The 2nd rereplace checks whether the image tag has an XML closing and, if not, closes it so that xmlparse() will work properly.

Of course, if you don't want to do the XML thing, once you have the tag by itself it's a lot easier to just get the src attribute via regex. That runs some other risks, though, such as getting content from a site where the author used single quotes instead of double quotes (a minor change to the above regex, I admit). I'm just saying: six of one, half a dozen of the other. There are a number of potential points of failure either way.

hth,
ike

--
s. isaac dealey ^ new epoch
isn't it time for a change?
ph: 503.236.3691
http://onTap.riaforge.org

~|
Adobe® ColdFusion® 8 software 8 is the most important and dramatic release to date
Get the Free Trial http://ad.doubleclick.net/clk;160198600;22374440;w
Archive: http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295227
Subscription: http://www.houseoffusion.com/groups/CF-Talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=89.70.4
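For what it's worth, the two-step idea (isolate the first image tag, then pull src out of that tag alone) can be sketched roughly like this. It is in Python rather than CFML, and the names `IMG_TAG`, `SRC_ATTR` and `first_img_src` are made up for illustration; it is not the code from the message. The backreference `\1` lets the value be wrapped in either single or double quotes.

```python
import re

# Step 1: any <img ...> tag, case-insensitively.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# Step 2: src = "value" or src = 'value'; \1 must match the opening quote.
SRC_ATTR = re.compile(r"""\bsrc\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def first_img_src(html):
    """Return the src value of the first <img> tag, or None if there is none."""
    tag = IMG_TAG.search(html)
    if tag is None:
        return None
    src = SRC_ATTR.search(tag.group(0))
    return src.group(2) if src else None
```

Because the second expression only ever sees the text of one tag, it cannot wander into a neighbouring script or anchor tag. Like any regex approach it can still be fooled, for example by a `>` inside an attribute value.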
Re: Another RegEx question...
>>Claude, that brings back the javascript script tag

Ah ok, then you have to include the

>I just want the RegEx though.

I wrote this tag because it is sometimes much easier to find what's between two very simple expressions than to describe what you want in a single, more complicated expression, and I was tired of designing complex expressions. So now, do not count on me to do it for others ;-)

--
___
REUSE CODE! Use custom tags;
See http://www.contentbox.com/claude/customtags/tagstore.cfm
(Please send any spam to this address: [EMAIL PROTECTED])
Thanks.

Archive: http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295219
RE: Another RegEx question...
Claude, that brings back the javascript script tag. Neat custom tag, I just want the RegEx though.

-----Original Message-----
From: Claude Schneegans [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 20, 2007 3:36 PM
To: CF-Talk
Subject: Re: Another RegEx question...

>>I am cfhttp-ing a page and searching for the first image tag.

This is a job for CF_REextract:
http://www.contentbox.com/claude/customtags/REextract/testREextract.cfm

You can even test it online:
1) set INPUTMODE = to http
2) go to http://www.contentbox.com/claude/customtags/REextract/testingREextract.cfm
3) enter src=" in RE1
4) enter " in RE2
5) set EXTRACT = to first
6) enter the address of your page in the URL field
click on Test, ... et voilà !

Archive: http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295213
Re: Another RegEx question...
>>I am cfhttp-ing a page and searching for the first image tag.

This is a job for CF_REextract:
http://www.contentbox.com/claude/customtags/REextract/testREextract.cfm

You can even test it online:
1) set INPUTMODE = to http
2) go to http://www.contentbox.com/claude/customtags/REextract/testingREextract.cfm
3) enter src=" in RE1
4) enter " in RE2
5) set EXTRACT = to first
6) enter the address of your page in the URL field
click on Test, ... et voilà !

--
___
REUSE CODE! Use custom tags;
See http://www.contentbox.com/claude/customtags/tagstore.cfm
(Please send any spam to this address: [EMAIL PROTECTED])
Thanks.

Archive: http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295212
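The "text between two simple expressions" idea behind the RE1/RE2 steps can be approximated in a few lines. This is a Python sketch, not the CF_REextract implementation, and `extract_between` is a hypothetical helper name. It returns whatever lies between the first match of a start expression and the next match of an end expression.

```python
import re


def extract_between(text, start_re, end_re):
    """Return the text between the first start_re match and the next end_re match."""
    start = re.search(start_re, text, re.IGNORECASE)
    if start is None:
        return None
    # Look for the end expression only after the start match.
    end = re.compile(end_re, re.IGNORECASE).search(text, start.end())
    if end is None:
        return None
    return text[start.end():end.start()]


page = '<script src="app.js"></script><img alt="x" src="pic.jpg">'
script_src = extract_between(page, r'src="', r'"')               # first src= on the page
image_src = extract_between(page, r'<img\b[^>]*?src="', r'"')    # first src= inside an <img>
```

With only `src="` as the start expression, the first hit on this sample page is the script tag. Anchoring the start expression to `<img` is one way around that, under the assumption that the src attribute sits inside the same tag.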
Re: Another RegEx question...
I think you need ReFindNoCase, otherwise it will return an empty string.

-- Josh

----- Original Message -----
From: "Todd" <[EMAIL PROTECTED]>
To: "CF-Talk"
Sent: Thursday, December 20, 2007 11:58 AM
Subject: Re: Another RegEx question...

> listLast(URL,'/') ? You're just complicating something simple with Regex
> here in this case.

Archive: http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295210
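The case-sensitivity point is easy to see with a small test. The sketch below is Python, with `re.IGNORECASE` standing in for the NoCase variant of the CF function; the sample tag and pattern are invented for the illustration.

```python
import re

html = '<IMG SRC="HTTP://EXAMPLE.COM/PHOTO.JPG">'
pattern = r'src="[^"]*\.jpg"'

# The lower-case pattern finds nothing in upper-case markup...
case_sensitive = re.search(pattern, html)

# ...unless the search is told to ignore case.
case_insensitive = re.search(pattern, html, re.IGNORECASE)
```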
Re: Another RegEx question...
The first thing I see is that you are only allowing for single-character names. You don't have a + or * (+ would be better) after [a-z0-9_].

There may be more, but that's what I see at first glance.

--Ben Doom

Che Vilnonis wrote:
> In a sea of data, I need to pull the first image tag. It looks like this...
> src="http://images.craigslist.org/01010001150701030720071219cd6f3ea36b5b712d2f00d0d9.jpg">
>
> I am using...
> "]+(\.(?:jpg|jpeg|gif|png))[^>]*>", "", "ONE")>
>
> But it does not work. Can anyone help? Thanks, Che

Archive: http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295208
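A quick way to check the quantifier point is to test a file name against the class with and without the +. This is a Python sketch with made-up names (`EXT`, `ONE_CHAR`, `ONE_OR_MORE`, `is_image_name`). As a side note, in Perl-style engines such as Python's, `(?:...)` is a non-capturing group, so the extension alternation creates no backreference.

```python
import re

# Extension alternation in a non-capturing group: grouped, but not saved as \1.
EXT = r"\.(?:jpg|jpeg|gif|png)"

ONE_CHAR = re.compile(r"[a-z0-9_]" + EXT)      # exactly one name character
ONE_OR_MORE = re.compile(r"[a-z0-9_]+" + EXT)  # one or more name characters


def is_image_name(pattern, name):
    """True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None
```

`is_image_name(ONE_CHAR, "photo_01.jpg")` is false while `is_image_name(ONE_OR_MORE, "photo_01.jpg")` is true, which is the single-character limitation described above.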
RE: Another RegEx question...
Todd, not sure that that would work. If you look at my code, I am cfhttp-ing a page and searching for the first image tag. I simply gave an example of the format of the image tag in the returned HTML. RegEx would be best for that, right?

-----Original Message-----
From: Todd [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 20, 2007 2:58 PM
To: CF-Talk
Subject: Re: Another RegEx question...

listLast(URL,'/') ? You're just complicating something simple with Regex here in this case.

Archive: http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295207
Re: Another RegEx question...
listLast(URL,'/') ? You're just complicating something simple with Regex here in this case.

On Dec 20, 2007 2:55 PM, Che Vilnonis <[EMAIL PROTECTED]> wrote:

> In a sea of data, I need to pull the first image tag. It looks like this...
> src="http://images.craigslist.org/01010001150701030720071219cd6f3ea36b5b712d2f00d0d9.jpg">
>
> I am using...
> "]+([a-z0-9_]\.(?:jpg|jpeg|gif|png))[^>]*>", "", "ONE")>
>
> But it does not work. Can anyone help? Thanks, Che

Archive: http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295206
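For comparison, the list-based suggestion needs no regex at all once the URL is already in hand. The Python sketch below is a rough analogue of listLast with "/" as the delimiter; `last_segment` is an illustrative name, and unlike the CF list functions it does not skip an empty trailing element.

```python
def last_segment(url):
    """Return everything after the final '/' in the URL."""
    return url.rsplit("/", 1)[-1]


url = "http://images.craigslist.org/01010001150701030720071219cd6f3ea36b5b712d2f00d0d9.jpg"
file_name = last_segment(url)
```

This only yields the file name from a known URL. Finding the image tag inside a fetched page is a separate step.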
RE: Another RegEx question
This snippet will parse a variable called file and build a pipe-delimited list called lURL, holding the last path segment of every href value it finds:

    regexp = 'href[[:space:]]*=[[:space:]]*"([^"]+)"';
    lURL = "";
    cnt = REFindNoCase(regexp, file, 1, "yes");
    while (cnt.pos[1] GT 0) {
        // pos[2] and len[2] describe the first subexpression, i.e. the href value
        text = Mid(file, cnt.pos[2], cnt.len[2]);
        text = ListLast(text, "/");
        // the next if is only required if you don't want duplicate URLs
        if (NOT ListFind(lURL, text, "|")) lURL = ListAppend(lURL, text, "|");
        // resume the search just past the current match
        cnt = REFindNoCase(regexp, file, cnt.pos[1] + cnt.len[1], "yes");
    }

-----Original Message-----
From: Shawn Grover [mailto:[EMAIL PROTECTED]]
Sent: dinsdag 29 januari 2002 19:10
To: CF-Talk
Subject: Another RegEx question

I find myself needing to find all the HREF values on a given page. Regular Expressions are the answer, but I haven't nailed the pattern yet.

__
Get Your Own Dedicated Windows 2000 Server
PIII 800 / 256 MB RAM / 40 GB HD / 20 GB MO/XFER
Instant Activation · $99/Month · Free Setup
http://www.pennyhost.com/redirect.cfm?adcode=coldfusionb
FAQ: http://www.thenetprofits.co.uk/coldfusion/faq
Archives: http://www.mail-archive.com/cf-talk@houseoffusion.com/
Unsubscribe: http://www.houseoffusion.com/index.cfm?sidebar=lists
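The same loop can be sketched in Python, for anyone who wants to try the pattern outside ColdFusion. `HREF` and `unique_href_tails` are made-up names. The behaviour mirrors the snippet as described: take each double-quoted href value, keep only the part after the last "/", skip duplicates, and join with "|".

```python
import re

# href, optional whitespace around '=', then a double-quoted value (captured).
HREF = re.compile(r'href\s*=\s*"([^"]+)"', re.IGNORECASE)


def unique_href_tails(html):
    """Pipe-delimited last path segments of all href values, without duplicates."""
    seen = []
    for match in HREF.finditer(html):
        tail = match.group(1).rsplit("/", 1)[-1]
        if tail not in seen:
            seen.append(tail)
    return "|".join(seen)
```

Note that two different URLs ending in the same file name collapse into one entry, since only the last segment is compared.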
RE: Another RegEx question
Ah, well then try this.

Rereplacenocase(string, "href=""([^""]*)""", "\1", "ALL")

I'm not sure if you have to escape the quotations with a \ or not. But if so it would be like this.

Rereplacenocase(string, "href=\""([^\""]*)\""", "\1", "ALL")

__
steve oliver
cresco technologies, inc.
http://www.crescotech.com

-----Original Message-----
From: Shawn Grover [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 29, 2002 3:43 PM
To: CF-Talk
Subject: RE: Another RegEx question

Thanks, but if I have an anchor tag in this format, it won't return the HREF alone:

<a href="http://someplace.com/page.cfm" name="SomeName">

Your regular expression will include the name attribute as well.
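Why the negated class [^"]* stops at the end of the href value while a greedy .* runs on into the name attribute can be seen in a short test. This is a Python sketch using the anchor tag from the quoted message; the variable names are illustrative. In most regex dialects the double quote is not a metacharacter, so it needs no backslash in the pattern itself, only whatever quoting the host language's string literals require.

```python
import re

tag = '<a href="http://someplace.com/page.cfm" name="SomeName">'

# Greedy .* backtracks only as far as the LAST double quote in the tag.
greedy = re.search(r'href="(.*)"', tag, re.IGNORECASE).group(1)

# [^"]* cannot cross a double quote, so it ends at the href value's closing quote.
bounded = re.search(r'href="([^"]*)"', tag, re.IGNORECASE).group(1)
```

Here `greedy` still carries the name attribute along, while `bounded` is just the URL.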
RE: Another RegEx question
Thanks, but if I have an anchor tag in this format, it won't return the HREF alone:

<a href="http://someplace.com/page.cfm" name="SomeName">

Your regular expression will include the name attribute as well.

-----Original Message-----
From: Steve Oliver [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 29, 2002 11:20 AM
To: CF-Talk
Subject: RE: Another RegEx question

If you just want the value of href, do something like:

Rereplacenocase(string, "]*)>", "\1", "ALL")

__
steve oliver
cresco technologies, inc.
http://www.crescotech.com
RE: Another RegEx question
If you just want the value of href, do something like:

Rereplacenocase(string, "]*)>", "\1", "ALL")

__
steve oliver
cresco technologies, inc.
http://www.crescotech.com

-----Original Message-----
From: Shawn Grover [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 29, 2002 1:10 PM
To: CF-Talk
Subject: Another RegEx question

I find myself needing to find all the HREF values on a given page. Regular Expressions are the answer, but I haven't nailed the pattern yet. Any help???

I've tried the following:

"\w+:\/\/[^/:]+:\d*?[^# ]*"
"(\w+)://([^/:]+)(:\d+)?/(.*)[\s|>]"
"href=\S.*[\S|>]"

They tend to give me more than just the href value, like the next 3 tags and the text between them. The first two patterns are directly from a Microsoft article on this (I'm working in VB too), with the first being the same as the second but no