Works a treat.

Great stuff to all!!!!!!!!!!!!!



From: Paul Cowan <[EMAIL PROTECTED]>
Reply-To: "Discussion of advanced .NET topics."
<[email protected]>
To: [email protected]
Subject: Re: [ADVANCED-DOTNET] HTTP help
Date: Thu, 9 Mar 2006 16:32:00 +0000

Thanks, Peter. I'll let you know how I get on.

That is a great help!!

Paul



From: Peter Ritchie <[EMAIL PROTECTED]>
Reply-To: "Discussion of advanced .NET topics."
<[email protected]>
To: [email protected]
Subject: Re: [ADVANCED-DOTNET] HTTP help
Date: Thu, 9 Mar 2006 11:20:38 -0500

Paul,

There are a couple of problems with the way you're doing it now.  First,
it doesn't actually parse the HTML.  For example, if the HTML contained
<!-- <IMG SRC='http://www.google.com/images/about_logo.gif'> --> you'd
get a false positive.  The same goes for "<TEXTAREA><IMG
SRC='http://www.google.com/images/about_logo.gif'></TEXTAREA>".  Second,
as you noted, you can't do anything about images created by JavaScript
(or VBScript, for that matter).
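
To make the first problem concrete, here is a minimal console sketch that
runs the pattern from the original post against a commented-out tag.  The
class name and the shortened 'a.gif' test strings are made up for
illustration.  It also shows a side issue: the pattern is case-sensitive
as written, so an uppercase <IMG> tag is only seen once
RegexOptions.IgnoreCase is added.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only the pattern comes from the original post.
class RegexFalsePositiveDemo
{
    static void Main()
    {
        // Pattern from the original post (case-sensitive by default).
        Regex r = new Regex("<img[^>]+>");

        // Made-up inputs for illustration.
        string commented = "<!-- <img src='a.gif'> -->";
        string upper = "<IMG SRC='a.gif'>";

        // The commented-out tag is still reported: a false positive.
        Console.WriteLine(r.Matches(commented).Count);   // 1

        // The uppercase tag is missed by the case-sensitive pattern.
        Console.WriteLine(r.Matches(upper).Count);       // 0

        // With IgnoreCase the uppercase tag is matched.
        Regex ri = new Regex("<img[^>]+>", RegexOptions.IgnoreCase);
        Console.WriteLine(ri.Matches(upper).Count);      // 1
    }
}
```

Whichever options are used, the regex sees only text, so it cannot tell
a tag inside a comment or a TEXTAREA from a real one.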

Since you're using Visual Studio 2005, I would suggest you use a
WebBrowser control (by adding it to your form) and have it navigate to
the URL:

        private void Form1_Load(object sender, EventArgs e)
        {
            browser.Navigate(new Uri(@"file://C:\test.html"));
        }

Then use the Document property to get an HtmlDocument object, which
reflects the page after the browser has parsed the HTML and run its
scripts.

For example, hook up an event handler for the DocumentCompleted event and
you can iterate over the IMG elements:

        private void browser_DocumentCompleted(object sender,
            WebBrowserDocumentCompletedEventArgs e)
        {
            foreach (HtmlElement element in browser.Document.Images)
            {
                System.Diagnostics.Debug.WriteLine(element.OuterHtml);
            }
        }

If you want to get into the DOM objects you have to add a reference to the
COM Microsoft HTML Object Library (aka "mshtml") to your project.  Once
you do that, you can get at the attributes of the IMG element as follows:

        private void browser_DocumentCompleted(object sender,
            WebBrowserDocumentCompletedEventArgs e)
        {
            foreach (HtmlElement element in browser.Document.Images)
            {
                mshtml.HTMLImgClass img =
                    element.DomElement as mshtml.HTMLImgClass;
                if(img != null)
                {
                    System.Diagnostics.Debug.WriteLine(img.href);
                    // TODO: validate href here
                }
            }
        }

That will also let you handle XHTML (including the <img/> tag; note the
lowercase).
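
If all you need is the src value, the managed HtmlElement.GetAttribute
method may be enough on its own, without the mshtml reference.  The
following is only a sketch under that assumption: the ImageLister class
name and the command-line URL argument are invented for illustration, and
it creates the WebBrowser in code rather than in the designer.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form; not part of the original code in this thread.
class ImageLister : Form
{
    private readonly WebBrowser browser = new WebBrowser();

    ImageLister(string url)
    {
        browser.Dock = DockStyle.Fill;
        browser.DocumentCompleted += OnDocumentCompleted;
        Controls.Add(browser);
        browser.Navigate(url);
    }

    private void OnDocumentCompleted(object sender,
        WebBrowserDocumentCompletedEventArgs e)
    {
        // Images is taken from the parsed document, so tags inside
        // comments are not included.
        foreach (HtmlElement element in browser.Document.Images)
        {
            System.Diagnostics.Debug.WriteLine(element.GetAttribute("src"));
        }
    }

    // The WebBrowser control requires a single-threaded apartment.
    [STAThread]
    static void Main(string[] args)
    {
        string url = args.Length > 0 ? args[0] : "about:blank";
        Application.Run(new ImageLister(url));
    }
}
```

Note that DocumentCompleted can fire more than once on pages with frames,
so real code may want to check e.Url before processing.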

On Thu, 9 Mar 2006 16:00:54 +0000, Paul Cowan <[EMAIL PROTECTED]> wrote:

>I am currently using Visual Studio 2005
>
>>From: Peter Ritchie <[EMAIL PROTECTED]>
>>Reply-To: "Discussion of advanced .NET topics."
>><[email protected]>
>>To: [email protected]
>>Subject: Re: [ADVANCED-DOTNET] HTTP help
>>Date: Thu, 9 Mar 2006 10:33:14 -0500
>>
>>Paul,
>>
>>Are you using Visual Studio 2003 or 2005?
>>
>>
>>On Thu, 9 Mar 2006 14:31:01 +0000, Paul Cowan <[EMAIL PROTECTED]> wrote:
>>
>> >All,
>> >
>> >I have the following C# code (listed at the end of the message) which
>> >lists all the image tags in the response to a given page request.  The
>> >problem is that it does not list image tags that are created on the
>> >client via external .js files, for example if the following was in an
>> >external .js file:
>> >
>> >document.writeln("<img src=\"images/136195-gk-shirt-away.jpg\" />");
>> >
>> >I could load each .js file, but I think I am starting to bark up the
>> >wrong tree with this approach.  I cannot tell whether the images were
>> >written by document.writeln, added via the DOM, or created in one of
>> >the many other ways there are to do this.
>> >
>> >Can anyone think of a better way?
>> >
>> >public void TestWebRequest()
>> >{
>> >    WebRequest req =
>> >        WebRequest.Create("http://localhost:2178/Image/Default.aspx");
>> >
>> >    // Accept any server certificate.
>> >    ServicePointManager.ServerCertificateValidationCallback +=
>> >        delegate(object sender, X509Certificate cert, X509Chain chain,
>> >                 SslPolicyErrors error)
>> >        {
>> >            return true;
>> >        };
>> >
>> >    StreamReader sr =
>> >        new StreamReader(req.GetResponse().GetResponseStream());
>> >    StringBuilder sb = new StringBuilder();
>> >    string line = string.Empty;
>> >
>> >    // Concatenate the non-empty lines of the response.
>> >    while ((line = sr.ReadLine()) != null)
>> >    {
>> >        if (line.Length > 0)
>> >            sb.Append(line);
>> >    }
>> >
>> >    _site = sb.ToString();
>> >
>> >    Regex r = new Regex("<img[^>]+>");
>> >    MatchCollection mcl = r.Matches(_site);
>> >
>> >    foreach (Match m in mcl)
>> >    {
>> >        foreach (Group g in m.Groups)
>> >        {
>> >            Console.WriteLine(g.Value);
>> >        }
>> >    }
>> >}
>> >

===================================
This list is hosted by DevelopMentor®  http://www.develop.com

View archives and manage your subscription(s) at
http://discuss.develop.com
