On 1 November 2012 16:41, Aaron <[email protected]> wrote:
> Robin,
>
> Would something like this be what you're looking for?
> http://www.mozilla.org/en-US/collusion/
>
> They track the trackers. Maybe the code there will be helpful? I've
> not looked into it other than using the plugin.

I'll have a look through it, thanks.

Robin

> Aaron
>
> On Thu, Nov 1, 2012 at 10:35 AM, Robin Wood <[email protected]> wrote:
>> On 1 November 2012 14:18, Tim Tomes <[email protected]> wrote:
>>> Are you trying to do what ewhois.com does with the analytics and adsense
>>> IDs?
>>
>> Kind of, but in a way where you hold the database and tell it which
>> sites to index so you are not limited to just what the online sites
>> have pre-crawled.
>>
>> Robin
>>
>>> I was trying to script the same thing. However, parsing all of that
>>> data was a pain considering a developer can implement it several
>>> different ways. Sites like ewhois.com must have access to some sort of
>>> API for collecting that data. Let me know how it goes, because if you
>>> succeed, I'd like to bring you in on a project I am working on.
>>>
>>>
>>> On Thu, Nov 1, 2012 at 5:36 AM, Robin Wood <[email protected]> wrote:
>>>> I'm building a tool to scrape websites and pull out tracking codes so
>>>> I can see which sites are related based on who is tracking them.
>>>>
>>>> Google codes are good for this as they identify the tracker not the
>>>> site, Woopra tracking identifies the domain not the tracker so there
>>>> is no way back to the person/group tracking the site. What other web
>>>> tracking systems are out there which can be used to identify the
>>>> tracker rather than the site?
>>>>
>>>> In case that doesn't make sense, this is Woopra code:
>>>>
>>>> function woopraReady(tracker){
>>>>     tracker.setDomain('yourdomain.com');
>>>>     tracker.setIdleTimeout(300000);
>>>>     tracker.track();
>>>> }
>>>>
>>>> which identifies "yourdomain.com" but this is google
>>>>
>>>> try {
>>>>     var pageTracker = _gat._getTracker("UA-7503551-1");
>>>>     pageTracker._trackPageview();
>>>> } catch(err) {}
>>>>
>>>> which identifies the tracker.
>>>>
>>>> Robin
>>>> _______________________________________________
>>>> Pauldotcom mailing list
>>>> [email protected]
>>>> http://mail.pauldotcom.com/cgi-bin/mailman/listinfo/pauldotcom
>>>> Main Web Site: http://pauldotcom.com
>>>
>>> --
>>> Tim Tomes
>>> http://lanmaster53.com/
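
For anyone following the thread, here is a rough sketch of the grouping idea Robin describes: pull Google Analytics IDs out of page source and see which sites share one. It is only an illustration, not Robin's tool. It assumes the pages have already been fetched, and it matches the ID with a regular expression instead of parsing the JavaScript, which sidesteps Tim's point that developers embed the snippet in several different ways. The function names, the sample site names and the choice of Python are all assumptions made for this example. The part of the ID before the final number identifies the Analytics account, so that is what the sketch groups on.

```python
import re
from collections import defaultdict

# Google Analytics IDs have the form UA-<account>-<property index>.
# The digit-count limits here are a loose guess, not an official rule.
GA_ID = re.compile(r"\bUA-(\d{4,10})-(\d{1,4})\b")


def extract_ga_accounts(html):
    """Return the set of GA account IDs (UA-<account>) found in the page source."""
    return {"UA-" + match.group(1) for match in GA_ID.finditer(html)}


def group_sites_by_tracker(pages):
    """Map each GA account ID to the set of sites whose source contains it.

    pages: dict of site name -> page source (already downloaded).
    """
    groups = defaultdict(set)
    for site, html in pages.items():
        for account in extract_ga_accounts(html):
            groups[account].add(site)
    return dict(groups)


# Hypothetical sample data; the site names are made up for illustration.
pages = {
    "site-a.example": 'var pageTracker = _gat._getTracker("UA-7503551-1");',
    "site-b.example": "_gaq.push(['_setAccount', 'UA-7503551-2']);",
    "site-c.example": "tracker.setDomain('yourdomain.com');",
}

related = group_sites_by_tracker(pages)
for account, sites in related.items():
    print(account, sorted(sites))
```

With the sample data, the two sites carrying IDs from the same account end up in one group, while the Woopra-only page contributes nothing because its snippet names the domain and not the tracker.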
