Hello All,

I have just started working on a project involving dynamic web scraping (extracting data from web pages) in situations where there is no defined XML interface. This involves inspecting several complex sites that use JavaScript to generate the data that needs to be scraped. User interaction is also critical, as in some cases JavaScript is executed only when the user clicks a button.
The structure and content of such sites often change, so essentially I need to build a system that allows a user to easily define a web-scraping task without any programming. To do this, however, I must obtain the final DOM of each web page after JavaScript has run and user input has been applied. I was hoping to use Mozilla/Gecko/WebClient to generate the DOM for a specified target URL and then somehow pass it to Java.

Is this possible using Mozilla technologies? Is anyone aware of the status of WebClient, and is this project continuing? Are there any other interesting pointers, experiences or ideas on this topic?

Thanks for all your time,
Tim

_______________________________________________
mozilla-embedding mailing list
[EMAIL PROTECTED]
http://mail.mozilla.org/listinfo/mozilla-embedding
