Re: Makin search on the other site and getting data and writing in xml

2006-10-06 Thread Lawrence D'Oliveiro
In message [EMAIL PROTECTED], Paul
Boddie wrote:

 Various sites forbid wget and friends as a rule, understandably ...

No, that is not understandable.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Makin search on the other site and getting data and writing in xml

2006-09-27 Thread Steve Holden
Lawrence D'Oliveiro wrote:
 In message [EMAIL PROTECTED], Steve
 Holden wrote:

  The fact remains that Google can chop your searching ability off at the
  knees ...

 No they can't. They can only chop off your ability to use Google.
 
[sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: Makin search on the other site and getting data and writing in xml

2006-09-27 Thread Ben Finney
Steve Holden [EMAIL PROTECTED] writes:

 Lawrence D'Oliveiro wrote:
  Steve Holden wrote:
 The fact remains that Google can chop your searching ability off
 at the knees ...
  No they can't. They can only chop off your ability to use Google.
  
 [sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.

Seems like a fairly important distinction. Google has the power to
chop your searching ability off at the knees only to the extent that
you grant them that power.

-- 
 \     "[...] a Microsoft Certified System Engineer is to information |
  `\    technology as a McDonalds Certified Food Specialist is to the |
_o__)   culinary arts."  -- Michael Bacarella                         |
Ben Finney



Re: Makin search on the other site and getting data and writing in xml

2006-09-27 Thread Lawrence D'Oliveiro
In message [EMAIL PROTECTED], Ben Finney
wrote:

 Steve Holden [EMAIL PROTECTED] writes:
 
 Lawrence D'Oliveiro wrote:
  Steve Holden wrote:
 The fact remains that Google can chop your searching ability off
 at the knees ...
  No they can't. They can only chop off your ability to use Google.
  
 [sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.
 
 Seems like a fairly important distinction. Google has the power to
 chop your searching ability off at the knees only to the extent that
 you grant them that power.

Saying "search" when you mean "Google" is like saying "using a PC" when you
mean "using Microsoft Windows".


Re: Makin search on the other site and getting data and writing in xml

2006-09-27 Thread Steve Holden
Lawrence D'Oliveiro wrote:
 In message [EMAIL PROTECTED], Ben Finney
 wrote:

  Steve Holden [EMAIL PROTECTED] writes:

   Lawrence D'Oliveiro wrote:
    Steve Holden wrote:
     The fact remains that Google can chop your searching ability off
     at the knees ...
    No they can't. They can only chop off your ability to use Google.

   [sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.

  Seems like a fairly important distinction. Google has the power to
  chop your searching ability off at the knees only to the extent that
  you grant them that power.

 Saying "search" when you mean "Google" is like saying "using a PC" when you
 mean "using Microsoft Windows".

Well, I thought it was self-evident that since I was referring to Google 
I wasn't talking about Alta Vista searching. If I said "Microsoft have
the ability to terminate your license", presumably you'd chastise me by
pointing out that they wouldn't be able to revoke my *Linux* license.
Whatever.

There's none as thick as them that wants to be.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: Makin search on the other site and getting data and writing in xml

2006-09-27 Thread altemurbugra
OK, I close this discussion.
I understand everybody, no problem.



Re: Makin search on the other site and getting data and writing in xml

2006-09-27 Thread Stefan Behnel
[EMAIL PROTECTED] wrote:
 OK, I close this discussion.

No, you don't.

Stefan


Re: Makin search on the other site and getting data and writing in xml

2006-09-27 Thread Paul Boddie
George Sakkis wrote:
 [EMAIL PROTECTED] wrote:

  I don't mean Google,
  I don't mean onelook.com,

  these are only examples.

  I hope you understand what I mean.

 Apparently, *you* don't understand what they're trying to tell you. It
 roughly boils down to the following:

If we just step back from the brink for a moment and give the
questioner the benefit of the doubt - that the exercise merely involves
automating some kind of interactions that would otherwise require lots
of manual messing around piloting a browser, rather than performing
some kind of bulk suck down of an entire site's information - then it
is obviously possible to use the following techniques:

  * Use a well-known mirroring or archiving tool such as wget.
  * Use various testing tools, some of which are written in Python.
  * Use urllib, urllib2 or httplib plus an HTML or XML parser in your
own program.
  * Automate a Web browser using some off-the-shelf program.
  * Use various automation mechanisms provided by your environment
(eg. COM, DCOP), possibly with Python libraries (eg. PAMIE [1],
KPart Plugins [2]).
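The urllib-plus-parser option can be sketched in modern terms: in Python 3 the fetching functions of urllib and urllib2 live in urllib.request, and html.parser ships with the standard library. This is a minimal sketch, not code from the thread; the names LinkCollector, extract_links and fetch_links are hypothetical, and it assumes the interesting results on the page are links.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_text):
    """Parse an HTML string and return its links as (href, text) tuples."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def fetch_links(url):
    """Fetch a page (the only network access here) and extract its links."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_links(response.read().decode(charset, errors="replace"))
```

Keeping the parsing separate from the fetching means the parser can be tried on a saved page without touching the network at all.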

Various sites forbid wget and friends as a rule, understandably, but
there are sometimes reasons why you might want to use various tools to
automate a procedure involving lots of data which would waste a huge
amount of time if done manually. Perhaps you might have mail residing
in a Webmail system which can't be extracted via any process other than
reading all the messages in a browser, for example, or perhaps your
favourite Internet applications don't provide decent shortcuts to the
information you need, instead believing that it's all about the
experience: surfing around watching all the animated adverts.
Automation and related technologies can legitimately help users regain
control of their Internet-resident data and make better use of the
services around it.
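One machine-readable way a site states which automated clients are welcome is its robots.txt file (wget, for instance, honours it when retrieving recursively). A minimal sketch using the standard library's urllib.robotparser; the helper name `allowed`, the user-agent string and the URLs are hypothetical, and robots.txt is no substitute for reading a site's terms of service.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For a live site, `RobotFileParser(url).read()` would download the file instead of parsing a string.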

Paul

[1] http://pamie.sourceforge.net/
[2] http://www.boddie.org.uk/python/kpartplugins.html



Re: Makin search on the other site and getting data and writing in xml

2006-09-26 Thread Fredrik Lundh
Steven D'Aprano wrote:

 Google don't define "automated query", and I don't think they can.

the phrases they use are well understood in the SE business.  that's 
good enough for everyone involved (including courts; see below).

 (What on earth is meta-searching? If you're going to use terms which
 don't have a commonly understood meaning, define what they mean.)

http://en.wikipedia.org/wiki/Metasearch_engine

 If I want to search for "foo", and I type "foo" into the Firefox search
 box, is that an automated query?

nope.  unless you're a robot.

 What if I type "gg: foo" into Konqueror's address bar, which expands to
 "http://www.google.com/search?q=foo"? Is it okay if I type the URL by hand
 myself?

nope.  unless you're a robot.

 Can I use the browser to save the search page to a local HTML file? If
 Google says no, how can they possibly hope to stop me?

what you do with the search results once you've gotten them is outside 
the scope of that clause.

 What if I type this command into my shell?
 
 elinks --dump "http://www.google.com/search?q=foo" > output.html
 
 What if I type
 
 wget "http://www.google.com/search?q=foo"
 
 into the shell? Surely that's no more automated than typing "foo"
 into Google's search box.

neither is automated, unless you're a robot.

 Where is the line I must not cross?

letting a program generate search requests based on something other than
"human wants to find something and types some keywords into a prompt
somewhere".

 And that, it seems to me, is what the Original Poster wanted.

the OP wanted to read keywords from a text file generated in some 
unknown fashion.  that's bot behaviour, not human behaviour.

 Of course, what I think isn't important. If Google wants to write legal
 contracts that won't stand up in court (speaking as somebody who isn't a
 lawyer and whose legal advice is worthless)

well, "here's some random guy who didn't understand the terms used in
the contract" isn't a valid defense in court; courts are more interested
in whether people with experience from the relevant field can reasonably 
be expected to understand the contract.  but this isn't about court 
cases, of course; it's about getting banned by Google for abusing their 
services.

/F



Re: Makin search on the other site and getting data and writing in xml

2006-09-26 Thread Steve Holden
Steven D'Aprano wrote:
 On Mon, 25 Sep 2006 13:51:55 +0200, Fredrik Lundh wrote:

  http://www.google.com/terms_of_service.html

  "You may not send automated queries of any sort to Google's system
  without express permission in advance from Google."

 I'm not just being a pedantic weasel here, but what's an automated query?
 Google's ToS is a legal document (maybe), and if both parties don't agree
 on the meanings of terms, well, then it is a lousy legal document and a
 recipe for trouble.

 Google don't define "automated query", and I don't think they can. In
 fact, the closest they come to defining it is to list three things they
 want to prevent, NONE of which have anything to do with the distinction
 between automated and non-automated.

The fact remains that Google can chop your searching ability off at the 
knees if *they* determine that you have broken the terms of service, so 
whether you agree or not becomes slightly academic.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: Makin search on the other site and getting data and writing in xml

2006-09-26 Thread altemurbugra
GOOGLE IS NOT OUR SUBJECT ANY MORE.

MY GOAL IS NOT MAKING SEARCH ON GOOGLE:
MY GOAL IS MAKING A SEARCH ON
www.onelook.com, for example



Re: Makin search on the other site and getting data and writing in xml

2006-09-26 Thread Diez B. Roggisch
[EMAIL PROTECTED] wrote:

 GOOGLE IS NOT OUR SUBJECT ANY MORE.
 
 MY GOAL IS NOT MAKING SEARCH ON GOOGLE:
 MY GOAL IS MAKING A SEARCH ON
 www.onelook.com, for example



Can you send me the list of words in the index? May I extract it from your
site?
 No, sorry. If you're thinking about writing a script to systematically copy
OneLook.com's word list, please don't. It's not yours to copy, for one
thing. But also, it wastes tremendous bandwidth and slows things down for
other users. We have software in place to detect the abuse of our service
and we'll alert your ISP if you violate our trust in you. If you're looking
for a decent-sized downloadable word list, try WordNet, which offers that
and much more. If you're working on a project for school or academic
research, let us know and we might be able to help steer you in the right
direction. 


Consider this: if you offered your neighbours the courtesy of an occasional
lemonade, would that mean you'd like them stomping around in your
kitchen?

Nearly all sites that offer a service like this will have policies of
that kind. So - get a grip, stop shouting, and start thinking about whether
what you are trying to do is legal or socially acceptable. If not, and you
don't care - be my guest, but don't ask for help here!

Diez


Re: Makin search on the other site and getting data and writing in xml

2006-09-26 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

 GOOGLE IS NOT OUR SUBJECT ANY MORE.
 
 MY GOAL IS NOT MAKING SEARCH ON GOOGLE:
 MY GOAL IS MAKING A SEARCH ON
 www.onelook.com, for example

this is usenet; you don't own the threads you start.  if there's a 
subthread that you don't find relevant to your original question, just 
ignore it.

/F



Re: Makin search on the other site and getting data and writing in xml

2006-09-26 Thread altemurbugra
I don't mean Google,
I don't mean onelook.com,

these are only examples.

I hope you understand what I mean.



Re: Makin search on the other site and getting data and writing in xml

2006-09-26 Thread George Sakkis
[EMAIL PROTECTED] wrote:

 I don't mean Google,
 I don't mean onelook.com,

 these are only examples.

 I hope you understand what I mean.

Apparently, *you* don't understand what they're trying to tell you. It
roughly boils down to the following:

- All sites (except perhaps the most trivial ones) disallow in their
Terms of Service the unregulated harvesting of their content by
webbots, for both legal and technical reasons. It's not just Google or
Onelook that does this.
- Yes, it is technically possible to attempt to violate their ToS, at
the risk of being caught (with whatever consequences that implies).
- Yes, you *might* be able to get away with it (at least for some time)
by running in stealth mode.
- No, people here are not willing to help you go down this road; you're
on your own.
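Part of the technical objection is load: a script that fires requests back to back can hammer a server. Where a site does permit automated lookups, a common courtesy is to pause between requests. A minimal sketch; `polite_map` and the two-second default are hypothetical choices rather than any site's stated policy, and pacing does not make a crawl that the ToS forbids acceptable.

```python
import time


def polite_map(func, items, delay=2.0, sleep=time.sleep):
    """Call func on each item in turn, pausing `delay` seconds between calls.

    Returns a dict mapping each item to func(item). The `sleep` argument
    exists so the pacing can be observed (or disabled) in tests.
    """
    results = {}
    for index, item in enumerate(items):
        if index:  # no pause before the very first request
            sleep(delay)
        results[item] = func(item)
    return results
```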

Hope this helps,
George



Re: Makin search on the other site and getting data and writing in xml

2006-09-26 Thread Lawrence D'Oliveiro
In message [EMAIL PROTECTED], Steve
Holden wrote:

 The fact remains that Google can chop your searching ability off at the
 knees ...

No they can't. They can only chop off your ability to use Google.



Re: Makin search on the other site and getting data and writing in xml

2006-09-25 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

 is it possible to make a search on, for example, Google without the API,
 with a list of words?
 1- there is a word list
 2- the script will take the words from the list in turn
 3- it will make the search
 4- it will get the results
 5- it will write the results as an XML file.

http://www.google.com/terms_of_service.html

"You may not send automated queries of any sort to Google's system without
express permission in advance from Google."

/F 





Re: Makin search on the other site and getting data and writing in xml

2006-09-25 Thread altemurbugra

I don't mean only Google, but other sites as well.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Makin search on the other site and getting data and writing in xml

2006-09-25 Thread Adam Jones

[EMAIL PROTECTED] wrote:
 I don't mean only Google, but other sites as well.

Google expressly forbids doing any form of automated search outside of
their api. If you want to write a script that will run Google searches,
you have to use the api to do so. As far as I know most of the other
search sites have the same requirement.

Yes, it is possible to query a bunch of search sites and dump the
results into an xml file. It is not even all that hard. In fact, I bet
running a search on the relevant terms will probably produce something
that almost does what you want.
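To make the last step concrete, here is a minimal sketch of dumping results into an XML file with the standard library's xml.etree.ElementTree. The `<results>`/`<result word="...">` layout and the helper names are hypothetical, not a format any site prescribes; ElementTree takes care of escaping `&`, `<` and `>`.

```python
import xml.etree.ElementTree as ET


def results_to_xml(results):
    """Build <results><result word="...">text</result>...</results> from a dict."""
    root = ET.Element("results")
    for word, text in results.items():
        item = ET.SubElement(root, "result", word=word)
        item.text = text
    return ET.ElementTree(root)


def write_results(results, path):
    # An XML declaration with the encoding makes the file self-describing.
    results_to_xml(results).write(path, encoding="utf-8", xml_declaration=True)
```

For example, `write_results({"apple": "a fruit"}, "results.xml")` would write one `<result>` element to a (hypothetical) results.xml.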

-Adam



Re: Makin search on the other site and getting data and writing in xml

2006-09-25 Thread altemurbugra
Thank you very much for your explanations. I don't mean a search engine,
but, for example, a dictionary site for looking up words.



Re: Makin search on the other site and getting data and writing in xml

2006-09-25 Thread altemurbugra
For example, here is how I make a search on one site and get the result:

#!/usr/bin/python
# -*- coding: windows-1254; -*-

import urllib

dictionary = {}  # wow, it's actually a dictionary
words = ['apple', 'banana', 'cheese']
for word in words:
    # fetch the lookup page for each word; urllib.quote escapes
    # spaces and other unsafe characters in the word
    dictionary[word] = urllib.urlopen(
        "http://www.example.com/look.php?w=" + urllib.quote(word)).read()

print dictionary

I don't know how I can get the words from a .txt file to search for them
one at a time.
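Under Python 3, `urllib.urlopen` from the example above no longer exists; its replacement is `urllib.request.urlopen`, and `urllib.parse.urlencode` builds the query string with proper escaping. A minimal sketch reusing the example's placeholder `www.example.com/look.php?w=` endpoint (not a real lookup service); `lookup_url` and `lookup` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoint copied from the example above; not a real service.
BASE_URL = "http://www.example.com/look.php"


def lookup_url(word):
    """Build the query URL for one word; urlencode escapes spaces and non-ASCII text."""
    return BASE_URL + "?" + urlencode({"w": word})


def lookup(word):
    """Fetch the raw response body (bytes) for one word."""
    with urlopen(lookup_url(word)) as response:
        return response.read()
```

For instance, `lookup_url("ice cream")` gives `http://www.example.com/look.php?w=ice+cream`.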



Re: Makin search on the other site and getting data and writing in xml

2006-09-25 Thread altemurbugra

And also how to write the results as an HTML or XML file.
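Writing the results as HTML needs little more than escaping. A minimal Python 3 sketch, assuming the results are held in a dict mapping each word to a text snippet (a hypothetical layout); `results_to_html` is an illustrative name, while `html.escape` is from the standard library.

```python
from html import escape


def results_to_html(results):
    """Render {word: result_text} as a minimal HTML page, one table row per word."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(escape(word), escape(text))
        for word, text in results.items()
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"><title>Results</title></head>\n'
        "<body><table>\n" + rows + "\n</table></body></html>\n"
    )
```

The page can then be saved with `open("results.html", "w", encoding="utf-8").write(page)`, where the file name is only an example.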



Re: Makin search on the other site and getting data and writing in xml

2006-09-25 Thread Steven D'Aprano
On Mon, 25 Sep 2006 13:51:55 +0200, Fredrik Lundh wrote:

 http://www.google.com/terms_of_service.html
 
 You may not send automated queries of any sort to Google's system 
 without express
 permission in advance from Google.

I'm not just being a pedantic weasel here, but what's an automated query?
Google's ToS is a legal document (maybe), and if both parties don't agree
on the meanings of terms, well, then it is a lousy legal document and a
recipe for trouble.

Google don't define "automated query", and I don't think they can. In
fact, the closest they come to defining it is to list three things they
want to prevent, NONE of which have anything to do with the distinction
between automated and non-automated.

(What on earth is meta-searching? If you're going to use terms which
don't have a commonly understood meaning, define what they mean.)

If I want to search for "foo", and I type "foo" into the Firefox search
box, is that an automated query?

What if I type "gg: foo" into Konqueror's address bar, which expands to
"http://www.google.com/search?q=foo"? Is it okay if I type the URL by hand
myself?

Can I use the browser to save the search page to a local HTML file? If
Google says no, how can they possibly hope to stop me?

What if I type this command into my shell?

elinks --dump "http://www.google.com/search?q=foo" > output.html

What if I type

wget "http://www.google.com/search?q=foo"

into the shell? Surely that's no more automated than typing "foo"
into Google's search box. (wget doesn't in fact work, as Google recognises
its user-agent string and blocks it, EVEN in cases where I am using wget
manually. What, can't Google themselves tell the difference between
automatic and non-automatic searching?)

Where is the line I must not cross?

The thing is, Google doesn't want people reselling their services, and I
respect Google's intention. But trying to draw a distinction between
automated and non-automated requests is difficult if not impossible,
as can be seen by the heavy-handed way Google blocks the manual use of
wget. I don't condone the gross abuse of Google's service, but I don't
think an artificial distinction between automated and non-automated is a
useful way to go about it.

Of course, what I think isn't important. If Google wants to write legal
contracts that won't stand up in court (speaking as somebody who isn't a
lawyer and whose legal advice is worthless), they can. But the point is, I
see no ethical or legal reason why a user can't create a script which is
called MANUALLY by the user and does what a browser does, namely send and
receive data from websites (which may or may not include Google). 

And that, it seems to me, is what the Original Poster wanted.



-- 
Steven D'Aprano 



Re: Makin search on the other site and getting data and writing in xml

2006-09-25 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

 I don't know how I can get the words from a .txt file to search for them
 one at a time.

checking the "reading and writing files" section in the tutorial might
be somewhat helpful:

 http://docs.python.org/tut/node9.html#SECTION00920
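Following that pointer, a minimal Python 3 sketch, assuming (hypothetically) a UTF-8 text file with one search word per line. `iter_words` works on any iterable of lines, so it can be tried on a plain list; `read_words` and the file name below are illustrative only.

```python
def iter_words(lines):
    """Yield one stripped search word per line, skipping blank lines."""
    for line in lines:
        word = line.strip()
        if word:
            yield word


def read_words(path):
    """Return the words in a UTF-8 text file, one per line, in file order."""
    with open(path, encoding="utf-8") as f:
        return list(iter_words(f))
```

Usage would then be `for word in read_words("words.txt"): ...`, with each word handed to the search step in turn.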

/F
