Robin,

Would something like this be what you're looking for?
http://www.mozilla.org/en-US/collusion/
They track the trackers. Maybe the code there will be helpful? I've not
looked into it other than using the plugin.

Aaron

On Thu, Nov 1, 2012 at 10:35 AM, Robin Wood <[email protected]> wrote:
> On 1 November 2012 14:18, Tim Tomes <[email protected]> wrote:
>> Are you trying to do what ewhois.com does with the analytics and adsense IDs?
>
> Kind of, but in a way where you hold the database and tell it which
> sites to index so you are not limited to just what the online sites
> have pre-crawled.
>
> Robin
>
>> I was trying to script the same thing. However, parsing all of that
>> data was a pain considering a developer can implement it several
>> different ways. Sites like ewhois.com must have access to some sort of
>> API for collecting that data. Let me know how it goes, because if you
>> succeed, I'd like to bring you in on a project I am working on.
>>
>>
>> On Thu, Nov 1, 2012 at 5:36 AM, Robin Wood <[email protected]> wrote:
>>> I'm building a tool to scrape websites and pull out tracking codes so
>>> I can see which sites are related based on who is tracking them.
>>>
>>> Google codes are good for this as they identify the tracker not the
>>> site, Woopra tracking identifies the domain not the tracker so there
>>> is no way back to the person/group tracking the site. What other web
>>> tracking systems are out there which can be used to identify the
>>> tracker rather than the site?
>>>
>>> In case that doesn't make sense, this is Woopra code:
>>>
>>> function woopraReady(tracker){
>>>     tracker.setDomain('yourdomain.com');
>>>     tracker.setIdleTimeout(300000);
>>>     tracker.track();
>>> }
>>>
>>> which identifies "yourdomain.com" but this is google
>>>
>>> try {
>>>     var pageTracker = _gat._getTracker("UA-7503551-1");
>>>     pageTracker._trackPageview();
>>> } catch(err) {}
>>>
>>> which identifies the tracker.
>>>
>>> Robin
>>
>>
>>
>> --
>> Tim Tomes
>> http://lanmaster53.com/
_______________________________________________
Pauldotcom mailing list
[email protected]
http://mail.pauldotcom.com/cgi-bin/mailman/listinfo/pauldotcom
Main Web Site: http://pauldotcom.com
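
For anyone following the thread, here is a rough sketch of the idea Robin describes: pull Google Analytics IDs out of page source and group sites that share one. It is only an illustration, not anyone's actual tool. The function names, the regex, and the example site names are all assumptions, and it works on HTML you have already fetched rather than doing any crawling. It relies on the classic ID layout UA-<account>-<property>, so two properties under the same account number point back to the same tracker.

```python
import re
from collections import defaultdict

# Classic Google Analytics IDs look like UA-<account>-<property>.
# The digit-length bounds are a guess, not taken from any specification.
GA_ID = re.compile(r"\bUA-(\d{4,10})-(\d{1,4})\b")


def extract_ga_accounts(html):
    """Return the set of GA account IDs (e.g. 'UA-7503551') found in html."""
    return {"UA-" + m.group(1) for m in GA_ID.finditer(html)}


def group_sites_by_account(pages):
    """Map each GA account ID to the set of sites whose HTML contains it.

    pages is a dict of site name -> HTML source that was fetched elsewhere.
    """
    sites_by_account = defaultdict(set)
    for site, html in pages.items():
        for account in extract_ga_accounts(html):
            sites_by_account[account].add(site)
    return dict(sites_by_account)


# Hypothetical sites; the snippets mirror the two styles quoted in the thread
# plus the asynchronous _gaq form of the Google snippet.
pages = {
    "site-a.example": 'var pageTracker = _gat._getTracker("UA-7503551-1");',
    "site-b.example": "_gaq.push(['_setAccount', 'UA-7503551-2']);",
    "site-c.example": "tracker.setDomain('yourdomain.com');",
}
related = group_sites_by_account(pages)
```

Here site-a and site-b end up grouped under the same account, while the Woopra-only page contributes nothing, which matches the point made above that the Woopra snippet identifies the domain rather than the tracker. A plain regex like this will miss IDs that are built up dynamically or loaded from external scripts, which is the "several different ways" problem Tim mentions.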
