Re: THE IMPORTANCE OF MAKING THE GOOGLE INDEX DOWNLOADABLE
gen_tricomi wrote:
> THE IMPORTANCE OF MAKING THE GOOGLE INDEX DOWNLOADABLE
>
> I write here to make a request on behalf of all the programmers on
> earth who have been or are intending to use the Google web search API
> for either research purposes or for the development of real world
> applications, that Google make their indexes downloadable.
>

Frankly I doubt whether the average programmer possesses sufficient storage or has access to sufficient bandwidth to make downloading the Google indexes a practical proposition - let alone the cached page contents too.

There's also the tiny factoid that Google might regard their index structure, not to mention its contents, as proprietary.

Finally, how frequently would you propose to update your local copy? Google is adding the results of new spidering to their indices all the time.

A nice idea, perhaps, but surely completely impractical.

regards
 Steve
--
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Love me, love my blog  http://holdenweb.blogspot.com
Recent Ramblings     http://del.icio.us/steve.holden
--
http://mail.python.org/mailman/listinfo/python-list
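The storage and bandwidth objection above can be made concrete with a back-of-envelope calculation. The sketch below is only illustrative: `transfer_time_days` is a hypothetical helper, and the 1 TB size and 1 Mbit/s link speed are placeholder figures chosen for the arithmetic, not actual numbers for Google's index or anyone's connection.

```python
SECONDS_PER_DAY = 86_400


def transfer_time_days(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Return the days needed to move size_bytes over a link of the given speed.

    Ignores protocol overhead, retries, and the fact that the index
    keeps changing while it is being downloaded.
    """
    seconds = size_bytes * 8 / bandwidth_bits_per_s
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Placeholder inputs (assumptions): a 1 TB download over a 1 Mbit/s link.
    days = transfer_time_days(1e12, 1e6)
    print(f"{days:.1f} days")  # prints "92.6 days"
```

Plugging in a larger size or a slower link scales the result linearly, which is the heart of the practicality question raised above.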
Re: THE IMPORTANCE OF MAKING THE GOOGLE INDEX DOWNLOADABLE
gen_tricomi wrote:
> Currently application programmers using the Google web search API are
> limited to 1000 queries a day. This on the one hand is a reasonable
> decision by Google because; limiting the queries will prevent harm on
> the Google system by unnecessary automated queries; but it is also
> limiting us programmers severely. The query limit limits the usefulness
> of whatever applications we decide to craft out and even limits our
> imagination on what is possible with a handful of indexes.
>

If you know which sites you are interested in searching then you can already license hardware from Google which will let you index up to 15 million documents and an API to search the indexed content without restriction.

If you simply want to compete with Google on searching the entire internet then they are unlikely to want to help you.

If you fall somewhere in between what can be done with a Google Search Appliance and competing with Google then you are talking about paying Google sufficient money that they ought to be interested in sitting round a table with you. As it says on their website:

"For larger deployments, contact us and we'll be happy to talk to you about building a custom search solution for your environment."
THE IMPORTANCE OF MAKING THE GOOGLE INDEX DOWNLOADABLE
I write here to make a request on behalf of all the programmers on earth who have been or are intending to use the Google web search API for either research purposes or for the development of real world applications, that Google make their indexes downloadable.

Currently application programmers using the Google web search API are limited to 1000 queries a day. This on the one hand is a reasonable decision by Google, because limiting the queries prevents harm to the Google system from unnecessary automated queries; but it also limits us programmers severely. The query limit limits the usefulness of whatever applications we decide to craft and even limits our imagination on what is possible with a handful of indexes.

Firstly, I will commend the Google Corporation for opening their preciously crawled indexes. This is a great service to humanity and especially to the band of programmers who are interested in epistemology and are using the Google web search API to enable them to achieve their goals.

Google would be doing another great service for us if they would make their indexes downloadable to programmers, with a good interface for programmatically accessing the indexes.

The advantages of the above approach would be:

1. Decentralizing the Google system.

2. Reducing the overhead of queries on Google from programmers.

3. Enabling programmers to craft applications that run on their local systems (only requiring an internet connection when a web page is needed, since the links returned by a query are the most important part of the result set), thus giving them an unlimited number of queries should these applications go public.

4. Giving Google the competitive edge in search engine technology and user satisfaction by gaining programmer loyalty.

5. Encouraging the global adoption and use of the API + INDEXES provided by Google.

6. Another benefit for Google may arise if they create mechanisms in the downloaded INDEXES + API that enable programmers to update the indexes from the web. An agreement can then be made that Google will have unlimited access to the indexes whenever the user's computer is online and IDLE, so that Google updates its own indexes from the ones on various programmers' local machines, thereby building a truly distributed global crawler. This can be achieved using grid technologies, thereby possibly cutting down the 300-year range for crawling the world's crawlable information.

Google may still enforce their terms of service by requiring some kind of authentication for the use of an index already residing on the programmer's local machine. It need not require that the programmer be on the internet every time he/she wishes to access the system, since the programmer may wish to tinker with the API and indexes locally without an internet connection. Online authentication may be required any time the user gets online.

The non-commercial status of the indexes must be emphasized through several schemes.

The Google API can be a tool for epistemological engineers to craft future Infowares (Information Applications).

The most important thing in the indexes is the links to resources that are returned on queries.

Two versions of the API + INDEXES can be made available:

1. The one without cached pages attached, so that on querying the API on the local machine with the locally stored indexes, the results are like those in the regular internet API result set.

2. The one with the cached pages. This one is optional as it will be large in size.

If you people were good enough to release your APIs publicly then you would also consider this request.
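To illustrate what querying a locally stored index for links might look like, here is a minimal sketch of an inverted index in Python. It is a toy under stated assumptions and says nothing about Google's actual index format or API: `build_index`, `search`, and the sample URLs are hypothetical names invented for this example, and a real index would add ranking, stemming, and on-disk storage.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase term to the set of URLs whose text contains it.

    `pages` is a dict of {url: page_text}; both are made-up sample data.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the first term's postings and intersect with the rest.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)


if __name__ == "__main__":
    # Hypothetical pages standing in for crawled content.
    pages = {
        "http://example.com/a": "python search engine tutorial",
        "http://example.com/b": "java search library",
        "http://example.com/c": "python web crawler",
    }
    index = build_index(pages)
    print(search(index, "python search"))
```

Only links come back from `search`, which matches the first of the two versions described above: the page content itself would still be fetched over the network when needed.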
It would be good if the API + INDEX download is accessible by programmers who program in the following languages:

(a) Python
(b) Java
(c) Perl
(d) Ruby

Or some language independent mechanism can be formulated so programmers in various languages can access the API + INDEX download.

PageRank may or may not be included in the package depending on decisions at Google. It may also be closed source / open source / or partial source (part open part closed).

This will be a great service to humanity and to programmers especially.

Thanks,

Ogah Ejini,
Nigeria, West Africa.
Mobile: +234 802 601 5061