Thanks, Peter. I'll let you know how I get on. That is a great help!
Paul [EMAIL PROTECTED]
From: Peter Ritchie <[EMAIL PROTECTED]>
Reply-To: "Discussion of advanced .NET topics." <[email protected]>
To: [email protected]
Subject: Re: [ADVANCED-DOTNET] HTTP help
Date: Thu, 9 Mar 2006 11:20:38 -0500

Paul,

There are a couple of problems with the way you're doing it now. First, it
doesn't really process the HTML. For example, if the HTML contained

    <!-- <IMG SRC='http://www.google.com/images/about_logo.gif'> -->

you'd get a false positive. The same goes for

    <TEXTAREA><IMG SRC='http://www.google.com/images/about_logo.gif'></TEXTAREA>

Second, as you noted, you can do nothing with JavaScript (or VBScript, for
that matter).

Since you're using Visual C# 2005, I would suggest you use a WebBrowser
control (by adding it to your form) and have it navigate to the URL:

    private void Form1_Load(object sender, EventArgs e)
    {
        browser.Navigate(new Uri(@"file://C:\test.html"));
    }

Then use the Document property to get at an HtmlDocument object that will
have parsed and validated the HTML and scripts. For example, hook up an event
handler for the DocumentCompleted event and you can iterate the IMG elements:

    private void browser_DocumentCompleted(object sender,
        WebBrowserDocumentCompletedEventArgs e)
    {
        // Images reflects the parsed document, including script-generated tags.
        foreach (HtmlElement element in browser.Document.Images)
        {
            System.Diagnostics.Debug.WriteLine(element.OuterHtml);
        }
    }

If you want to get into the DOM objects, you have to add a reference to the
COM Microsoft HTML Object Library (aka "mshtml") to your project. Once you do
that, you can get at the attributes of the IMG element as follows:

    private void browser_DocumentCompleted(object sender,
        WebBrowserDocumentCompletedEventArgs e)
    {
        foreach (HtmlElement element in browser.Document.Images)
        {
            // DomElement exposes the underlying MSHTML COM object.
            mshtml.HTMLImgClass img = element.DomElement as mshtml.HTMLImgClass;
            if (img != null)
            {
                System.Diagnostics.Debug.WriteLine(img.href);
                // TODO: validate href here
            }
        }
    }

That will also allow you to handle XHTML (and the <img/> tag, note the case).
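To make the first point above concrete, here is a minimal console sketch that runs a regular expression of the same shape over the two snippets quoted in the message. The class name RegexFalsePositiveDemo, the CountImgTags helper and the SampleHtml string are assumptions made up for illustration, not code from the thread. RegexOptions.IgnoreCase is also an addition: without it, the pattern "<img[^>]+>" would not match the upper-case <IMG> tags in the sample at all.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexFalsePositiveDemo
{
    // Hypothetical sample page: its only IMG tags sit inside an HTML comment
    // and inside a TEXTAREA, so a browser would not load any image from it.
    public const string SampleHtml =
        "<html><body>" +
        "<!-- <IMG SRC='http://www.google.com/images/about_logo.gif'> -->" +
        "<TEXTAREA><IMG SRC='http://www.google.com/images/about_logo.gif'></TEXTAREA>" +
        "</body></html>";

    // Counts raw text matches of an IMG-like tag. This is pattern matching
    // on characters, not HTML parsing, which is why it is fooled below.
    public static int CountImgTags(string html)
    {
        Regex r = new Regex("<img[^>]+>", RegexOptions.IgnoreCase);
        return r.Matches(html).Count;
    }

    public static void Main()
    {
        // Both tags are reported even though neither is a real image element.
        Console.WriteLine("Regex matches: " + CountImgTags(SampleHtml));
    }
}
```

Both matches here are false positives, which is the motivation for letting the WebBrowser control parse the page and reading browser.Document.Images instead.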
On Thu, 9 Mar 2006 16:00:54 +0000, Paul Cowan <[EMAIL PROTECTED]> wrote:

>I am currently using Visual Studio 2005
>
>[EMAIL PROTECTED]
>
>>From: Peter Ritchie <[EMAIL PROTECTED]>
>>Reply-To: "Discussion of advanced .NET topics." <[email protected]>
>>To: [email protected]
>>Subject: Re: [ADVANCED-DOTNET] HTTP help
>>Date: Thu, 9 Mar 2006 10:33:14 -0500
>>
>>Paul,
>>
>>Are you using Visual Studio 2003 or 2005?
>>
>>On Thu, 9 Mar 2006 14:31:01 +0000, Paul Cowan <[EMAIL PROTECTED]> wrote:
>>
>> >All,
>> >
>> >I have the following C# code (listed at the end of the message) which
>> >lists all the image tags on a given page request. The problem is that it
>> >does not list image tags that are created on the client via external .js
>> >files, for example if the following was in an external .js file:
>> >
>> >document.writeln("<img src=\"images/136195-gk-shirt-away.jpg\" />");
>> >
>> >I could load each js file but I think I am starting to bark up the wrong
>> >tree with this approach. I cannot guarantee whether the images have been
>> >written by document.writeln, added via the DOM, or created in one of the
>> >many other ways there are to do this.
>> >
>> >Can anyone think of a better way?
>> >
>> >public void TestWebRequest()
>> >{
>> >    WebRequest req =
>> >        WebRequest.Create("http://localhost:2178/Image/Default.aspx");
>> >    ServicePointManager.ServerCertificateValidationCallback +=
>> >        delegate(object sender, X509Certificate cert, X509Chain chain,
>> >                 SslPolicyErrors error)
>> >        {
>> >            return true;
>> >        };
>> >
>> >    StreamReader sr =
>> >        new StreamReader(req.GetResponse().GetResponseStream());
>> >    StringBuilder sb = new StringBuilder();
>> >    string line = string.Empty;
>> >
>> >    while ((line = sr.ReadLine()) != null)
>> >    {
>> >        if (line.Length > 0)
>> >            sb.Append(line);
>> >    }
>> >
>> >    _site = sb.ToString();
>> >
>> >    Regex r = new Regex("<img[^>]+>");
>> >    MatchCollection mcl = r.Matches(_site);
>> >
>> >    foreach (Match m in mcl)
>> >    {
>> >        foreach (Group g in m.Groups)
>> >        {
>> >            Console.WriteLine(g.Value);
>> >        }
>> >    }
>> >}
>> >

===================================
This list is hosted by DevelopMentor®  http://www.develop.com

View archives and manage your subscription(s) at http://discuss.develop.com
