Robin,

Would something like this be what you're looking for?
http://www.mozilla.org/en-US/collusion/

They track the trackers. Maybe the code there will be helpful? I've
not looked into it other than using the plugin.

Aaron

On Thu, Nov 1, 2012 at 10:35 AM, Robin Wood <[email protected]> wrote:
> On 1 November 2012 14:18, Tim Tomes <[email protected]> wrote:
>> Are you trying to do what ewhois.com does with the Analytics and AdSense IDs?
>
> Kind of, but in a way where you hold the database and tell it which
> sites to index so you are not limited to just what the online sites
> have pre-crawled.
>
> Robin
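For the "hold the database yourself" approach, the core data structure can be as simple as a map from tracker ID to the set of sites seen carrying it. Below is a rough JavaScript sketch. It assumes an in-memory Map stands in for the database. TrackerIndex, add and relatedSites are made-up names, and the IDs are placeholders, not real trackers:

```javascript
// Hypothetical in-memory index: tracker ID -> set of sites seen using it.
// A real tool would persist this in a database; a Map keeps the sketch short.
class TrackerIndex {
  constructor() {
    this.sitesByTracker = new Map();
  }

  // Record that `site` was found carrying each ID in `trackerIds`.
  add(site, trackerIds) {
    for (const id of trackerIds) {
      if (!this.sitesByTracker.has(id)) {
        this.sitesByTracker.set(id, new Set());
      }
      this.sitesByTracker.get(id).add(site);
    }
  }

  // Every other indexed site that shares at least one tracker ID with `site`.
  relatedSites(site) {
    const related = new Set();
    for (const sites of this.sitesByTracker.values()) {
      if (sites.has(site)) {
        for (const other of sites) {
          if (other !== site) related.add(other);
        }
      }
    }
    return [...related].sort();
  }
}

// Placeholder sites and IDs, purely for illustration.
const index = new TrackerIndex();
index.add("example.com", ["UA-1111111-1"]);
index.add("example.org", ["UA-1111111-1", "pub-0000000000000000"]);
index.add("example.net", ["pub-0000000000000000"]);
console.log(index.relatedSites("example.com")); // [ 'example.org' ]
```

Because you choose which sites go in, the index only knows about what you have crawled. That is the trade-off against the pre-crawled online services.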
>
>> I was trying to script the same thing. However, parsing all of that
>> data was a pain, since a developer can implement it in several
>> different ways. Sites like ewhois.com must have access to some sort of
>> API for collecting that data. Let me know how it goes, because if you
>> succeed, I'd like to bring you in on a project I am working on.
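One way around the many implementation styles might be to skip parsing the snippets and instead match the ID formats directly in the raw HTML. Here is a minimal sketch. It assumes Analytics IDs look like UA-<digits>-<digits> and AdSense publisher IDs look like pub- followed by 16 digits (usually written with a ca- prefix). extractTrackerIds is a hypothetical helper, and the AdSense ID in the sample page is invented:

```javascript
// Match the ID formats themselves rather than each way of embedding them.
// Assumed formats: Google Analytics "UA-<account>-<property>" and AdSense
// "pub-<16 digits>" (often written with a "ca-" prefix, which is dropped here).
const PATTERNS = {
  analytics: /\bUA-\d{4,10}-\d{1,4}\b/g,
  adsense: /\bpub-\d{16}\b/g,
};

function extractTrackerIds(html) {
  const found = {};
  for (const [kind, pattern] of Object.entries(PATTERNS)) {
    // A Set drops duplicates when the same ID appears several times on a page.
    found[kind] = [...new Set(html.match(pattern) || [])].sort();
  }
  return found;
}

// Three embedding styles on one made-up page: the old _getTracker call,
// the async _gaq queue, and the AdSense google_ad_client variable.
const page = `
<script>var pageTracker = _gat._getTracker("UA-7503551-1");</script>
<script>_gaq.push(['_setAccount', 'UA-7503551-2']);</script>
<script>google_ad_client = "ca-pub-1234567890123456";</script>`;

console.log(extractTrackerIds(page));
```

With the sample page, this returns both UA IDs under analytics and pub-1234567890123456 under adsense. A regex over raw HTML will miss IDs that are built at runtime or loaded from an external script, so it is a first pass rather than a complete extractor.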
>>
>>
>> On Thu, Nov 1, 2012 at 5:36 AM, Robin Wood <[email protected]> wrote:
>>> I'm building a tool to scrape websites and pull out tracking codes so
>>> I can see which sites are related based on who is tracking them.
>>>
>>> Google codes are good for this because they identify the tracker, not
>>> the site. Woopra tracking identifies the domain, not the tracker, so
>>> there is no way back to the person/group tracking the site. What other
>>> web tracking systems are out there that can be used to identify the
>>> tracker rather than the site?
>>>
>>> In case that doesn't make sense, this is Woopra code:
>>>
>>> function woopraReady(tracker){
>>>     tracker.setDomain('yourdomain.com');
>>>     tracker.setIdleTimeout(300000);
>>>     tracker.track();
>>> }
>>>
>>> which identifies "yourdomain.com", but this is Google code:
>>>
>>> try {
>>>     var pageTracker = _gat._getTracker("UA-7503551-1");
>>>     pageTracker._trackPageview();
>>> } catch(err) {}
>>>
>>> which identifies the tracker.
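The Google ID also has structure worth using. In the UA-<account>-<property> form, the middle number is generally the Analytics account, and the suffix numbers the properties under it. Grouping on the account number alone may therefore link sites even when their full IDs differ. Below is a sketch under that assumption. parseUaId and groupByAccount are hypothetical names, and the site names are placeholders:

```javascript
// Split a Google Analytics "UA-<account>-<property>" ID. The account number
// is assumed to be shared by every property (site) under one Analytics
// account, so it is the part that points back to the tracker.
function parseUaId(id) {
  const m = /^UA-(\d+)-(\d+)$/.exec(id);
  if (!m) return null;
  return { account: m[1], property: Number(m[2]) };
}

// Group sites by Analytics account, ignoring the per-site property suffix.
function groupByAccount(siteIds) {
  const groups = new Map();
  for (const [site, id] of Object.entries(siteIds)) {
    const parsed = parseUaId(id);
    if (!parsed) continue; // not a UA-style ID, skip it
    if (!groups.has(parsed.account)) groups.set(parsed.account, []);
    groups.get(parsed.account).push(site);
  }
  return groups;
}

const groups = groupByAccount({
  "site-a.example": "UA-7503551-1",
  "site-b.example": "UA-7503551-2",
  "site-c.example": "UA-9999999-1",
});
console.log(groups.get("7503551")); // [ 'site-a.example', 'site-b.example' ]
```

An agency may run many clients' sites under one account. A shared account number therefore suggests a common tracker rather than proving a common owner.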
>>>
>>> Robin
>>> _______________________________________________
>>> Pauldotcom mailing list
>>> [email protected]
>>> http://mail.pauldotcom.com/cgi-bin/mailman/listinfo/pauldotcom
>>> Main Web Site: http://pauldotcom.com
>>
>>
>>
>> --
>> Tim Tomes
>> http://lanmaster53.com/