Re: Another RegEx question...

2007-12-20 Thread s. isaac dealey
>  "]+([a-z0-9_]\.(?:jpg|jpeg|gif|png))[^>]*>", "", "ONE")>

In addition to Ben's comment, it doesn't look like you're finding the
beginning of the src attribute, see: 

]+ and then straight into the regex for the url ... so there's no
mention of src=" ... there's also no explicit allowance for the protocol
(https?://). The ?: in (?:jpg| is fine, though: it just makes that group
non-capturing.

I might suggest a slightly different tack ... instead of trying to get
it with one expression, how about this?

]+>).*$","\1")>
","/>")>



Usually an image tag doesn't contain any characters that aren't xml safe,
since it's just got the src (no url parameters usually), class, style
and potentially an alt or title attribute. It's really the alt or title
attribute that might have non-xml safe characters, but still not very
often. The 2nd rereplace checks to see if the image has an xml closing
(/>) and if not then it closes it so that the xmlparse() will work
properly. Of course, if you don't want to do the xml thing, once you have
the tag by itself, then it's a lot easier to just get the src attribute
via regex, i.e.



Although that runs some other potential risks, such as getting content
from a site that's using xml where the author used single quotes instead
of double quotes (a minor change to the above regex, I admit). I'm just
saying, six of one, half a dozen of the other: there are a number of
potential points of failure either way.
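
A minimal sketch of the two-step idea described above: first isolate the first image tag, then pull the src out of that tag alone, accepting either quote style. It is written in Python rather than CFML purely so it can be run standalone; the function name, the patterns and the sample page are assumptions for illustration, not the code from the original post.

```python
import re

# Step 1: isolate the first <img ...> tag (case-insensitive).
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Step 2: pull src out of that tag, allowing single or double quotes.
SRC_ATTR = re.compile(r"""\bsrc\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def first_img_src(html):
    """Return the src of the first img tag in html, or None if there is none."""
    tag = IMG_TAG.search(html)
    if tag is None:
        return None
    src = SRC_ATTR.search(tag.group(0))
    return src.group(2) if src else None


# Hypothetical page fragment: a script tag with its own src comes first.
page = (
    '<script src="/js/site.js"></script>'
    "<p>intro</p>"
    "<IMG class='thumb' SRC='http://example.com/pics/cat_01.jpg'>"
)
print(first_img_src(page))  # http://example.com/pics/cat_01.jpg
```

One known point of failure, in the spirit of the caveats above: [^>]* stops early if an attribute value (an alt text, say) itself contains a > character.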

hth,
ike

-- 
s. isaac dealey  ^  new epoch
 isn't it time for a change? 
 ph: 503.236.3691

http://onTap.riaforge.org



~|
Adobe® ColdFusion® 8 software 8 is the most important and dramatic release to 
date
Get the Free Trial
http://ad.doubleclick.net/clk;160198600;22374440;w

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295227
Subscription: http://www.houseoffusion.com/groups/CF-Talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=89.70.4


Re: Another RegEx question...

2007-12-20 Thread Claude Schneegans
 >>Claude, that brings back the javascript script tag

Ah ok, then you have to include the

 >>I just want the RegEx though.

I wrote this tag because it is sometimes much easier to find what's 
between two very simple expressions
than to describe what you want in only one more complicated expression,
and I was tired of designing complex expressions.

So now, do not count on me to do it for others ;-)

-- 
___
REUSE CODE! Use custom tags;
See http://www.contentbox.com/claude/customtags/tagstore.cfm
(Please send any spam to this address: [EMAIL PROTECTED])
Thanks.


Archive:
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295219


RE: Another RegEx question...

2007-12-20 Thread Che Vilnonis
Claude, that brings back the javascript script tag. Neat custom tag, I just
want the RegEx though.

-Original Message-
From: Claude Schneegans [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 20, 2007 3:36 PM
To: CF-Talk
Subject: Re: Another RegEx question...

 >>I am cfhttp-ing a page and searching for the first image tag.

This is a job for CF_REextract :
http://www.contentbox.com/claude/customtags/REextract/testREextract.cfm

You can even test it on line :
1) set INPUTMODE = to http
2) go to
http://www.contentbox.com/claude/customtags/REextract/testingREextract.cfm
3) enter src=" in RE1
4) enter " in RE2
5) set EXTRACT = to first
6) enter the address of your page in the URL field click on Test, ... et
voilà !

--
___
REUSE CODE! Use custom tags;
See http://www.contentbox.com/claude/customtags/tagstore.cfm
(Please send any spam to this address: [EMAIL PROTECTED]) Thanks.





Archive:
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295213


Re: Another RegEx question...

2007-12-20 Thread Claude Schneegans
 >>I am cfhttp-ing a page and searching for the first image tag.

This is a job for CF_REextract :
http://www.contentbox.com/claude/customtags/REextract/testREextract.cfm

You can even test it on line :
1) set INPUTMODE = to http
2) go to 
http://www.contentbox.com/claude/customtags/REextract/testingREextract.cfm
3) enter src=" in RE1
4) enter " in RE2
5) set EXTRACT = to first
6) enter the address of your page in the URL field
click on Test, ... et voilà !
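
The "text between two simple expressions" idea behind those steps can be sketched roughly as follows. This is Python rather than CFML so it runs standalone, and extract_first_between is a hypothetical helper written for illustration; it is not the CF_REextract implementation, whose internals are not shown in this thread.

```python
import re


def extract_first_between(text, re1, re2):
    """Return the text between the first match of re1 and the next match of re2.

    Hypothetical helper for illustration; returns None when either is missing.
    """
    start = re.search(re1, text, re.IGNORECASE)
    if start is None:
        return None
    # Look for the closing expression only after the opening one ends.
    end = re.compile(re2, re.IGNORECASE).search(text, start.end())
    if end is None:
        return None
    return text[start.end():end.start()]


page = '<p>hello</p><img src="http://example.com/a.jpg" alt="a">'
print(extract_first_between(page, r'src="', r'"'))  # http://example.com/a.jpg

# If a script tag with a src comes first, the plain src=" opener finds it instead.
page2 = '<script src="/js/app.js"></script>' + page
print(extract_first_between(page2, r'src="', r'"'))           # /js/app.js
print(extract_first_between(page2, r'<img[^>]*src="', r'"'))  # http://example.com/a.jpg
```

The last call shows one possible way (an assumption, not something taken from the tag's documentation) to make the opening expression specific to image tags.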

-- 
___
REUSE CODE! Use custom tags;
See http://www.contentbox.com/claude/customtags/tagstore.cfm
(Please send any spam to this address: [EMAIL PROTECTED])
Thanks.



Archive:
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295212


Re: Another RegEx question...

2007-12-20 Thread Josh Nathanson
I think you need ReFindNoCase, otherwise it will return empty string.

-- Josh

- Original Message - 
From: "Todd" <[EMAIL PROTECTED]>
To: "CF-Talk" 
Sent: Thursday, December 20, 2007 11:58 AM
Subject: Re: Another RegEx question...


> listLast(URL,'/') ?  You're just complicating something simple with Regex
> here in this case.
>
> On Dec 20, 2007 2:55 PM, Che Vilnonis <[EMAIL PROTECTED]> wrote:
>
>> In a sea of data, I need to pull the first image tag. It looks like
>> this...
>> > src="http://images.craigslist.org/01010001150701030720071219cd6f3ea36b5b712d2f00d0d9.jpg">
>>
>> I am using...
>> > "]+([a-z0-9_]\.(?:jpg|jpeg|gif|png))[^>]*>", "", "ONE")>
>>
>> But it does not work. Can anyone help? Thanks, Che
>>
>>
>
>
> 

Archive:
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295210


Re: Another RegEx question...

2007-12-20 Thread Ben Doom
The first thing I see is that you are only allowing for single-character 
names.  You don't have a + or * (+ would be better) after [a-z0-9_]. 
There may be more, but that's what I see at first glance.
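
To illustrate the quantifier point, here is a small sketch in Python (used only so it can be run standalone; the regex syntax involved is the same Perl style). The URL is a hypothetical stand-in for the one in the question, and only the file-name part of the pattern is reproduced.

```python
import re

# Hypothetical URL standing in for the one in the question.
url = "http://example.com/pics/cat_01.jpg"

# Without a quantifier the class matches exactly one character, so only the
# last character of the name is captured along with the extension.
one_char = re.compile(r"([a-z0-9_]\.(?:jpg|jpeg|gif|png))", re.IGNORECASE)
# With + the class matches the whole run of name characters.
whole_name = re.compile(r"([a-z0-9_]+\.(?:jpg|jpeg|gif|png))", re.IGNORECASE)

print(one_char.search(url).group(1))    # 1.jpg
print(whole_name.search(url).group(1))  # cat_01.jpg
```

The (?:...) around the extensions is a non-capturing group, which is why the whole file name is group 1 in both patterns.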

--Ben Doom


Che Vilnonis wrote:
> In a sea of data, I need to pull the first image tag. It looks like this...
>  src="http://images.craigslist.org/01010001150701030720071219cd6f3ea36b5b712d2f00d0d9.jpg">
> 
> I am using...
>  "]+(\.(?:jpg|jpeg|gif|png))[^>]*>", "", "ONE")>
> 
> But it does not work. Can anyone help? Thanks, Che
> 
> 
> 

Archive:
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295208


RE: Another RegEx question...

2007-12-20 Thread Che Vilnonis
Todd, not sure that that would work. If you look at my code, I am cfhttp-ing
a page and searching for the first image tag. I simply gave an example of
the format of the image tag in the returned HTML. RegEx would be best for
that, right?

-Original Message-
From: Todd [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 20, 2007 2:58 PM
To: CF-Talk
Subject: Re: Another RegEx question...

listLast(URL,'/') ?  You're just complicating something simple with Regex
here in this case.

On Dec 20, 2007 2:55 PM, Che Vilnonis <[EMAIL PROTECTED]> wrote:

> In a sea of data, I need to pull the first image tag. It looks like 
> this...
>  src="http://images.craigslist.org/01010001150701030720071219cd6f3ea36b5b712d2f00d0d9.jpg">
>
> I am using...
>  "]+([a-z0-9_]\.(?:jpg|jpeg|gif|png))[^>]*>", "", "ONE")>
>
> But it does not work. Can anyone help? Thanks, Che
>
>




Archive:
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295207


Re: Another RegEx question...

2007-12-20 Thread Todd
listLast(URL,'/') ?  You're just complicating something simple with Regex
here in this case.
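
For reference, a rough Python equivalent of that listLast call (Python only so it runs standalone). list_last is a hypothetical helper, and it skips empty elements on the assumption that CFML list functions do the same by default.

```python
def list_last(value, delimiter="/"):
    """Return the last non-empty element of a delimited string, or ""."""
    parts = [p for p in value.split(delimiter) if p]
    return parts[-1] if parts else ""


# Hypothetical URL; this only helps once the URL itself has been isolated.
print(list_last("http://example.com/pics/cat_01.jpg"))  # cat_01.jpg
```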

On Dec 20, 2007 2:55 PM, Che Vilnonis <[EMAIL PROTECTED]> wrote:

> In a sea of data, I need to pull the first image tag. It looks like
> this...
>  src="http://images.craigslist.org/01010001150701030720071219cd6f3ea36b5b712d2f00d0d9.jpg">
>
> I am using...
>  "]+([a-z0-9_]\.(?:jpg|jpeg|gif|png))[^>]*>", "", "ONE")>
>
> But it does not work. Can anyone help? Thanks, Che
>
>


Archive:
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:295206


RE: Another RegEx question

2002-01-30 Thread Pascal Peters

This snippet will parse a variable called file and create a list with
all URLs called lURL (pipe delimited)


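<cfscript>
// capture the quoted href value as subexpression 1 (pos[2] / len[2])
regexp = 'href[[:space:]]*=[[:space:]]*"([^"]+)"';
lURL = "";
cnt = REFindNoCase(regexp, file, 1, "yes");
while (cnt.pos[1] GT 0) {
    text = Mid(file, cnt.pos[2], cnt.len[2]);
    // keep only the last path segment of the URL
    text = ListLast(text, "/");
    // the next if is only required if you don't want duplicate urls
    if (NOT ListFind(lURL, text, "|"))
        lURL = ListAppend(lURL, text, "|");
    // resume the search just past the end of the current match
    cnt = REFindNoCase(regexp, file, cnt.pos[1] + cnt.len[1], "yes");
}
</cfscript>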


-Original Message-
From: Shawn Grover [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 29 January 2002 19:10
To: CF-Talk
Subject: Another RegEx question


I find myself needing to find all the HREF values on a given page. Regular
Expressions are the answer, but I haven't nailed the pattern yet.
__
Get Your Own Dedicated Windows 2000 Server
  PIII 800 / 256 MB RAM / 40 GB HD / 20 GB MO/XFER
  Instant Activation · $99/Month · Free Setup
  http://www.pennyhost.com/redirect.cfm?adcode=coldfusionb
FAQ: http://www.thenetprofits.co.uk/coldfusion/faq
Archives: http://www.mail-archive.com/cf-talk@houseoffusion.com/
Unsubscribe: http://www.houseoffusion.com/index.cfm?sidebar=lists



RE: Another RegEx question

2002-01-29 Thread Steve Oliver

Ah, well then try this.

Rereplacenocase(string, "href=""([^""]*)""", "\1", "ALL")

I'm not sure if you have to escape the quotations with a \ or not. But
if so it would be like this.

Rereplacenocase(string, "href=\""([^\""]*)\""", "\1", "ALL")
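
On the escaping question: a double quote is not a regex metacharacter, so the pattern itself needs no backslash; only the host language's string quoting matters (doubled quotes inside a CFML string, as in the first version above). A sketch of collecting just the href values, in Python so it can be run standalone; the sample markup is hypothetical.

```python
import re

# A double quote is not a regex metacharacter, so no backslash is needed;
# the pattern sits in a raw single-quoted Python string here.
HREF = re.compile(r'href\s*=\s*"([^"]*)"', re.IGNORECASE)

# Hypothetical markup, including the anchor-with-name case from the thread.
html = (
    '<a href="http://someplace.com/page.cfm" name="SomeName">one</a> '
    '<A HREF="/local/page.cfm">two</A>'
)
print(HREF.findall(html))  # ['http://someplace.com/page.cfm', '/local/page.cfm']
```

Because [^"]* stops at the closing quote, the name attribute is never part of the capture. Note that a replace call with "\1" swaps each match for its captured value in place and leaves the rest of the page text around it, whereas collecting the captures as here yields only the values.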
__
steve oliver
cresco technologies, inc.
http://www.crescotech.com


-Original Message-
From: Shawn Grover [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, January 29, 2002 3:43 PM
To: CF-Talk
Subject: RE: Another RegEx question


Thanks, but if I have an anchor tag in this format, it won't return the HREF
alone:
<a href="http://someplace.com/page.cfm" name="SomeName">

Your regular expression will include the name attribute as well.


-Original Message-
From: Steve Oliver [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 29, 2002 11:20 AM
To: CF-Talk
Subject: RE: Another RegEx question


If you just want the value of href, do something like:

Rereplacenocase(string, "]*)>", "\1", "ALL")

__
steve oliver
cresco technologies, inc.
http://www.crescotech.com


-Original Message-
From: Shawn Grover [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, January 29, 2002 1:10 PM
To: CF-Talk
Subject: Another RegEx question


I find myself needing to find all the HREF values on a given page. Regular
Expressions are the answer, but I haven't nailed the pattern yet.

Any help???

I've tried the following:
   "\w+:\/\/[^/:]+:\d*?[^# ]*"
   "(\w+)://([^/:]+)(:\d+)?/(.*)[\s|>]"
   "href=\S.*[\S|>]"

They tend to give me more than just the href value, like the next 3 tags and
text between em.
The first two patterns are directly from a microsoft article on this (I'm
working in VB too), with the first being the same as the second but no






RE: Another RegEx question

2002-01-29 Thread Shawn Grover

Thanks, but if I have an anchor tag in this format, it won't return the HREF
alone:
<a href="http://someplace.com/page.cfm" name="SomeName">

Your regular expression will include the name attribute as well.


-Original Message-
From: Steve Oliver [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 29, 2002 11:20 AM
To: CF-Talk
Subject: RE: Another RegEx question


If you just want the value of href, do something like:

Rereplacenocase(string, "]*)>", "\1", "ALL")

__
steve oliver
cresco technologies, inc.
http://www.crescotech.com


-Original Message-
From: Shawn Grover [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, January 29, 2002 1:10 PM
To: CF-Talk
Subject: Another RegEx question


I find myself needing to find all the HREF values on a given page. Regular
Expressions are the answer, but I haven't nailed the pattern yet.

Any help???

I've tried the following:
   "\w+:\/\/[^/:]+:\d*?[^# ]*"
   "(\w+)://([^/:]+)(:\d+)?/(.*)[\s|>]"
   "href=\S.*[\S|>]"

They tend to give me more than just the href value, like the next 3 tags and
text between em.
The first two patterns are directly from a microsoft article on this (I'm
working in VB too), with the first being the same as the second but no





RE: Another RegEx question

2002-01-29 Thread Steve Oliver

If you just want the value of href, do something like:

Rereplacenocase(string, "]*)>", "\1", "ALL")

__
steve oliver
cresco technologies, inc.
http://www.crescotech.com


-Original Message-
From: Shawn Grover [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, January 29, 2002 1:10 PM
To: CF-Talk
Subject: Another RegEx question


I find myself needing to find all the HREF values on a given page. Regular
Expressions are the answer, but I haven't nailed the pattern yet.

Any help???

I've tried the following:
   "\w+:\/\/[^/:]+:\d*?[^# ]*"
   "(\w+)://([^/:]+)(:\d+)?/(.*)[\s|>]"
   "href=\S.*[\S|>]"

They tend to give me more than just the href value, like the next 3 tags and
text between em.
The first two patterns are directly from a microsoft article on this (I'm
working in VB too), with the first being the same as the second but no
