Producing a SERP scraper for Google would be very easy, possible in about 10-15 lines of code.  However, it's against the Google terms of service, and if they decide to come after you for breaching them then you'll be in trouble.  It's also the reason you're unlikely to find one that trumpets its existence very much: the site promoting it would probably be taken off the Google index, severely affecting its visitor numbers.

On 2/21/06, Gabriel B. <[EMAIL PROTECTED]> wrote:
the google webservice (aka google API) is not even close to ready for
any kind of real use yet

if you search for the same term 10 times, you get 3 different totals, 2
different result orders, and one or two "502 bad gateway" errors
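
A caller that wants to ride out the intermittent 502s could wrap each request in a small retry helper. This is only a sketch: `fetch_with_retry`, its `retries` and `delay` parameters, and the idea of passing the request in as a callable are assumptions made for illustration, not part of pygoogle or the Google API.

```python
import time
import urllib.error


def fetch_with_retry(fetch, retries=3, delay=1.0):
    """Call fetch() and retry only when it fails with HTTP 502.

    `fetch` is any zero-argument callable that performs the request
    (hypothetical; supply your own). Other errors are re-raised at once.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Give up on non-502 errors, or once the retry budget is spent.
            if err.code != 502 or attempt == retries:
                raise
            time.sleep(delay)
```

Retrying only hides the gateway errors; it does nothing about the inconsistent totals.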

i did an extensive comparison of the API against the regular search
service. the most typical set of results:

results 1-10; total: 373000
results 11-20; total: 151000
results 21-30; total: 151000
results 31-40; total: 373000
results 41-50; total: 373000
results 51-60; total: 373000
results 61-70; total: 151000
( 502 bad gateway. retry)
results 71-80; total: 373000
results 81-90; total: 151000
( 502 bad gateway. retry)
results 91-100; total: 373000

on the regular google search, total:  2,050,000 (for every page, of
course)
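
To make the inconsistency easier to see, the totals listed above can be tallied with a few lines of Python. This is just a sketch over the numbers copied from this message; the variable names are made up for illustration.

```python
from collections import Counter

# Totals the API reported for each page of 10 results (502 retries omitted).
api_totals = [373000, 151000, 151000, 373000, 373000,
              373000, 151000, 373000, 151000, 373000]
web_total = 2050000  # total shown on every page of the regular search

# Count how many pages reported each distinct total.
counts = Counter(api_totals)
for total, pages in counts.most_common():
    share = total / web_total
    print(f"{total}: {pages} pages, {share:.0%} of the regular search total")
```

Six of the ten pages report 373000 and four report 151000, and neither figure comes close to the total from the regular search.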

besides that, the first and third results on the regular google search
do not appear in the 100 results from the API for this query, but this
is not typical, more like 1 chance in 10 :-/

So, no matter how much google insists that this parrot is sleeping,
it's simply dead.


now, what i presume is happening is that they have a dozen machine
pools, and each one has a broken snapshot of the production index
(probably they have some process to import the index, and either it
blows up at some point or they simply kill it after some time). and
they obviously don't run that process very often.

Now... does anyone have an implementation of pygoogle.py that scrapes
the regular html service instead of using SOAP? :)

Gabriel B.
--
http://mail.python.org/mailman/listinfo/python-list
