Hi, I'm launching a startup that I think a lot of you will be interested in. The company, Webscaled, is a marketplace for datasets from ongoing Web crawls. Soon, I'll offer access to a diverse catalog of fresh, regularly updated datasets.
An example dataset would be the link graph. I am selling the link graph in chunks of 1 billion edges. The dataset includes the source and destination URLs, the anchor and title text of the link, any rel or rev values, and so on.

Other datasets include:

- Lists of sites of X genre (forums, blogs, ecommerce, etc.)
- The top 1,000 [HTML editors, CMSs, forum and blog software, etc.]
- Lists of sites using X technology (AdSense, Feedburner chiclet, etc.)
- How many sites are using which advertising platform, widget, etc.
- Frequency of specific Doctypes and other HTML elements
- Analysis of affiliate program usage
- Social graph data
- Bi- and Tri-gram datasets, word frequency, and other NLP-related datasets extracted from sentences that appear in the content portions of Web documents
- Frequency of namespace/URI correlation in XML and RDF/XML documents
- The top 1,000 external JavaScript includes, including the frameworks
- Percentage of sites using Ajax vs. not

There are many other datasets, and they will be available soon. If you want to learn more about what I'm doing, you can join the mailing list at http://webscaled.com/

Thanks for your time,
James Simmons
Founder, Webscaled
http://webscaled.com/
