Re: Makin search on the other site and getting data and writing in xml
In message [EMAIL PROTECTED], Paul Boddie wrote:

> Various sites forbid wget and friends as a rule, understandably ...

No, that is not understandable.

-- http://mail.python.org/mailman/listinfo/python-list
Re: Makin search on the other site and getting data and writing in xml
Lawrence D'Oliveiro wrote:
> In message [EMAIL PROTECTED], Steve Holden wrote:
>
>> The fact remains that Google can chop your searching ability off at the knees ...
>
> No they can't. They can only chop off your ability to use Google.

[sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.

regards
 Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
Re: Makin search on the other site and getting data and writing in xml
Steve Holden [EMAIL PROTECTED] writes:

> Lawrence D'Oliveiro wrote:
>> Steve Holden wrote:
>>> The fact remains that Google can chop your searching ability off at the knees ...
>>
>> No they can't. They can only chop off your ability to use Google.
>
> [sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.

Seems like a fairly important distinction. Google has the power to chop your searching ability off at the knees only to the extent that you grant them that power.

--
"[...] a Microsoft Certified System Engineer is to information technology as a McDonalds Certified Food Specialist is to the culinary arts." -- Michael Bacarella

Ben Finney
Re: Makin search on the other site and getting data and writing in xml
In message [EMAIL PROTECTED], Ben Finney wrote:

> Steve Holden [EMAIL PROTECTED] writes:
>
>> Lawrence D'Oliveiro wrote:
>>> Steve Holden wrote:
>>>> The fact remains that Google can chop your searching ability off at the knees ...
>>>
>>> No they can't. They can only chop off your ability to use Google.
>>
>> [sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.
>
> Seems like a fairly important distinction. Google has the power to chop your searching ability off at the knees only to the extent that you grant them that power.

Saying "search" when you mean "Google" is like saying "using a PC" when you mean "using Microsoft Windows".
Re: Makin search on the other site and getting data and writing in xml
Lawrence D'Oliveiro wrote:
> In message [EMAIL PROTECTED], Ben Finney wrote:
>
>> Steve Holden [EMAIL PROTECTED] writes:
>>
>>> Lawrence D'Oliveiro wrote:
>>>> Steve Holden wrote:
>>>>> The fact remains that Google can chop your searching ability off at the knees ...
>>>>
>>>> No they can't. They can only chop off your ability to use Google.
>>>
>>> [sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.
>>
>> Seems like a fairly important distinction. Google has the power to chop your searching ability off at the knees only to the extent that you grant them that power.
>
> Saying "search" when you mean "Google" is like saying "using a PC" when you mean "using Microsoft Windows".

Well, I thought it was self-evident that since I was referring to Google I wasn't talking about AltaVista searching. If I said "Microsoft have the ability to terminate your license", presumably you'd chastise me by pointing out that they wouldn't be able to revoke my *Linux* license.

Whatever. There's none as thick as them that wants to be.

regards
 Steve
Re: Makin search on the other site and getting data and writing in xml
OK, I close this discussion. I understand everybody, no problem.
Re: Makin search on the other site and getting data and writing in xml
[EMAIL PROTECTED] wrote:
> OK, I close this discussion.

No, you don't.

Stefan
Re: Makin search on the other site and getting data and writing in xml
George Sakkis wrote:
> [EMAIL PROTECTED] wrote:
>> I don't mean Google, I don't mean onelook.com; these are only examples. I hope you understand what I mean.
>
> Apparently, *you* don't understand what they're trying to tell you. It roughly boils down to the following:

If we just step back from the brink for a moment and give the questioner the benefit of the doubt - that the exercise merely involves automating some kind of interactions that would otherwise require lots of manual messing around piloting a browser, rather than performing some kind of bulk suck down of an entire site's information - then it is obviously possible to use the following techniques:

 * Use a well-known mirroring or archiving tool such as wget.
 * Use various testing tools, some of which are written in Python.
 * Use urllib, urllib2 or httplib plus an HTML or XML parser in your own program.
 * Automate a Web browser using some off-the-shelf program.
 * Use various automation mechanisms provided by your environment (e.g. COM, DCOP), possibly with Python libraries (e.g. PAMIE [1], KPart Plugins [2]).

Various sites forbid wget and friends as a rule, understandably, but there are sometimes reasons why you might want to use various tools to automate a procedure involving lots of data which would waste a huge amount of time if done manually. Perhaps you might have mail residing in a Webmail system which can't be extracted via any process other than reading all the messages in a browser, for example, or perhaps your favourite Internet applications don't provide decent shortcuts to the information you need, instead believing that it's all about the experience: surfing around watching all the animated adverts.

Automation and related technologies can legitimately help users regain control of their Internet-resident data and make better use of the services around it.

Paul

[1] http://pamie.sourceforge.net/
[2] http://www.boddie.org.uk/python/kpartplugins.html
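The third option in Paul's list (urllib, urllib2 or httplib plus a parser) can be sketched in modern Python 3, where urllib2 and httplib were reorganised into urllib.request and http.client. This is a minimal, hedged example: `LinkCollector`, `extract_links` and `fetch` are hypothetical names, and it assumes the data of interest sits in `<a href>` links, which will not hold for every page. Fetching is kept separate from parsing so the parser can be tried on a literal string without touching any site.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, text) pairs for every <a href> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return its (href, text) link pairs in order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, encoding="utf-8"):
    """Download a page as text; only use it where the site's terms permit."""
    with urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")


if __name__ == "__main__":
    sample = '<p><a href="/a">Apple</a> and <a href="/b">Banana</a></p>'
    print(extract_links(sample))
```

Swapping html.parser for an XML parser such as xml.etree.ElementTree only makes sense when the service returns well-formed XML; most ordinary HTML pages are not.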
Re: Makin search on the other site and getting data and writing in xml
Steven D'Aprano wrote:

> Google don't define "automated query", and I don't think they can.

the phrases they use are well understood in the SE business. that's good enough for everyone involved (including courts; see below).

> (What on earth is "meta-searching"? If you're going to use terms which don't have a commonly understood meaning, define what they mean.)

http://en.wikipedia.org/wiki/Metasearch_engine

> If I want to search for "foo", and I type "foo" into the Firefox search box, is that an automated query?

nope. unless you're a robot.

> What if I type "gg: foo" into Konqueror's address bar, which expands to "http://www.google.com/search?q=foo"? Is it okay if I type the URL by hand myself?

nope. unless you're a robot.

> Can I use the browser to save the search page to a local HTML file? If Google says no, how can they possibly hope to stop me?

what you do with the search results once you've gotten them is outside the scope of that clause.

> What if I type this command into my shell?
>
>     elinks --dump "http://www.google.com/search?q=foo" > output.html
>
> What if I type wget "http://www.google.com/search?q=foo" into the shell? Surely that's no more automated than typing "foo" into Google's search box.

neither is automated, unless you're a robot.

> Where is the line I must not cross?

letting a program generate search requests based on something other than "human wants to find something and types some keywords into a prompt somewhere".

> And that, it seems to me, is what the Original Poster wanted.

the OP wanted to read keywords from a text file generated in some unknown fashion. that's bot behaviour, not human behaviour.

> Of course, what I think isn't important. If Google wants to write legal contracts that won't stand up in court (speaking as somebody who isn't a lawyer and whose legal advice is worthless)

well, "here's some random guy who didn't understand the terms used in the contract" isn't a valid defense in court; courts are more interested in whether people with experience from the relevant field can reasonably be expected to understand the contract.

but this isn't about court cases, of course; it's about getting banned by Google for abusing their services.

/F
Re: Makin search on the other site and getting data and writing in xml
Steven D'Aprano wrote:
> On Mon, 25 Sep 2006 13:51:55 +0200, Fredrik Lundh wrote:
>
>> http://www.google.com/terms_of_service.html
>>
>> "You may not send automated queries of any sort to Google's system without express permission in advance from Google."
>
> I'm not just being a pedantic weasel here, but what's an automated query?
>
> Google's ToS is a legal document (maybe), and if both parties don't agree on the meanings of terms, well, then it is a lousy legal document and a recipe for trouble.
>
> Google don't define "automated query", and I don't think they can. In fact, the closest they come to defining it is to list three things they want to prevent, NONE of which have anything to do with the distinction between automated and non-automated.

The fact remains that Google can chop your searching ability off at the knees if *they* determine that you have broken the terms of service, so whether you agree or not becomes slightly academic.

regards
 Steve
Re: Makin search on the other site and getting data and writing in xml
GOOGLE IS NOT OUR SUBJECT ANY MORE. MY GOAL IS NOT MAKING SEARCH ON GOOGLE: MY GOAL IS MAKING A SEARCH ON www.onelook.com, for example
Re: Makin search on the other site and getting data and writing in xml
[EMAIL PROTECTED] wrote:
> GOOGLE IS NOT OUR SUBJECT ANY MORE. MY GOAL IS NOT MAKING SEARCH ON GOOGLE: MY GOAL IS MAKING A SEARCH ON www.onelook.com, for example

    Can you send me the list of words in the index? May I extract it from your site?

    No, sorry. If you're thinking about writing a script to systematically copy OneLook.com's word list, please don't. It's not yours to copy, for one thing. But also, it wastes tremendous bandwidth and slows things down for other users. We have software in place to detect the abuse of our service and we'll alert your ISP if you violate our trust in you. If you're looking for a decent-sized downloadable word list, try WordNet, which offers that and much more. If you're working on a project for school or academic research, let us know and we might be able to help steer you in the right direction.

Consider this: if you offered the courtesy of an occasional lemonade to your neighbours, would that mean you'd like them stomping around in your kitchen?

Nearly all sites that offer a service like this will have policies of that kind. So - get a grip, stop shouting, and start thinking about whether what you are trying to do is legal or social. If not, and you don't care - be my guest, but don't ask for help here!

Diez
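Throttling does not make scraping acceptable where a site forbids it, as the policy quoted above makes clear. Where automated access is permitted, though, spacing requests out is the usual courtesy toward the bandwidth concern it raises. A minimal sketch, assuming a caller-supplied `fetch` function (hypothetical; any callable taking a URL works); the one-second default delay is an arbitrary illustration, not a figure from any site's policy.

```python
import time


def fetch_politely(urls, fetch, delay=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL in order, pausing `delay` seconds between calls.

    `fetch` is any callable that takes a URL and returns the page;
    `sleep` is injectable so the pacing can be checked without waiting.
    """
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # pause before every request except the first
        results[url] = fetch(url)
    return results


if __name__ == "__main__":
    pages = fetch_politely(
        ["http://www.example.com/look.php?w=apple",
         "http://www.example.com/look.php?w=banana"],
        fetch=lambda url: "<html>stub for %s</html>" % url,  # stand-in, no network
        delay=0.1,
    )
    print(len(pages))
```

Injecting `fetch` and `sleep` keeps the pacing logic separate from any particular HTTP library, so the same loop works with urllib.request or a site's official API client.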
Re: Makin search on the other site and getting data and writing in xml
[EMAIL PROTECTED] wrote:
> GOOGLE IS NOT OUR SUBJECT ANY MORE. MY GOAL IS NOT MAKING SEARCH ON GOOGLE: MY GOAL IS MAKING A SEARCH ON www.onelook.com, for example

this is usenet; you don't own the threads you start. if there's a subthread that you don't find relevant to your original question, just ignore it.

/F
Re: Makin search on the other site and getting data and writing in xml
I don't mean Google, I don't mean onelook.com; these are only examples. I hope you understand what I mean.
Re: Makin search on the other site and getting data and writing in xml
[EMAIL PROTECTED] wrote:
> I don't mean Google, I don't mean onelook.com; these are only examples. I hope you understand what I mean.

Apparently, *you* don't understand what they're trying to tell you. It roughly boils down to the following:

- All sites (except perhaps the most trivial, small ones) disallow in their Terms of Service the unregulated harvesting of their content by webbots, for both legal and technical reasons. It's not just Google or OneLook that does this.

- Yes, it is technically possible to attempt to violate their ToS, running the risk of being caught (with whatever consequences this implies).

- Yes, you *might* be able to get away with it (at least for some time) by running in stealth mode.

- No, people here are not willing to help you go down this road; you're on your own.

Hope this helps,
George
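A site's robots.txt file is not a substitute for reading its Terms of Service, but it is the machine-readable part of many sites' policy toward bots, and Python's standard library can check it. A minimal sketch using urllib.robotparser; the robots.txt content, the example.com URLs and the agent name `MyWordBot` are all made up for illustration, `make_parser` is a hypothetical helper, and `crawl_delay()` requires Python 3.6 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. Against a live site you would instead call
# rp.set_url("http://www.example.com/robots.txt") followed by rp.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 5
"""


def make_parser(text):
    """Build a RobotFileParser from robots.txt text already in memory."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    agent = "MyWordBot"  # hypothetical user-agent name
    for url in ("http://www.example.com/search?q=apple",
                "http://www.example.com/about.html"):
        print(url, "allowed" if rp.can_fetch(agent, url) else "disallowed")
    print("crawl delay:", rp.crawl_delay(agent))
```

A result of "allowed" here only means robots.txt does not object; a site's Terms of Service can still forbid the same access, as the replies in this thread point out.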
Re: Makin search on the other site and getting data and writing in xml
In message [EMAIL PROTECTED], Steve Holden wrote:

> The fact remains that Google can chop your searching ability off at the knees ...

No they can't. They can only chop off your ability to use Google.
Re: Makin search on the other site and getting data and writing in xml
[EMAIL PROTECTED] wrote:
> Is it possible to search, for example, Google without the API, using a list of words?
> 1. there is a word list
> 2. the script will take the words from the list in turn
> 3. it will make the search
> 4. it will get the results
> 5. it will write the results as an XML file.

http://www.google.com/terms_of_service.html

"You may not send automated queries of any sort to Google's system without express permission in advance from Google."

/F
Re: Makin search on the other site and getting data and writing in xml
I don't mean only Google, but other sites as well.
Re: Makin search on the other site and getting data and writing in xml
[EMAIL PROTECTED] wrote:
> I don't mean only Google, but other sites as well.

Google expressly forbids doing any form of automated search outside of their API. If you want to write a script that will run Google searches, you have to use the API to do so. As far as I know, most of the other search sites have the same requirement.

Yes, it is possible to query a bunch of search sites and dump the results into an XML file. It is not even all that hard. In fact, I bet running a search on the relevant terms will probably produce something that almost does what you want.

-Adam
Re: Makin search on the other site and getting data and writing in xml
Thank you very much for your explanations. I don't mean a search engine; I mean, for example, a dictionary site for looking up words.
Re: Makin search on the other site and getting data and writing in xml
For example, here is how I make a search on one such site and get the result:

    #!/usr/bin/python
    # -*- coding: windows-1254; -*-

    import urllib

    dictionary = {}  # wow, it's actually a dictionary
    words = ['apple', 'banana', 'cheese']
    for word in words:
        # quote_plus escapes spaces and special characters in the word
        url = "http://www.example.com/look.php?w=" + urllib.quote_plus(word)
        dictionary[word] = urllib.urlopen(url).read()

    print dictionary

I don't know how I can get the words from a txt file to search for them in turn.
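In Python 3, the `urllib.urlopen` call used in that Python 2 script lives in urllib.request, and query escaping lives in urllib.parse. Building the query with `urllib.parse.urlencode` percent-encodes each word, so spaces, `&` or non-ASCII letters cannot break the URL. A minimal sketch: the endpoint `http://www.example.com/look.php` and its `w` parameter are only the placeholders from the post, and `build_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.example.com/look.php"  # placeholder endpoint from the post


def build_query_url(word, base=BASE_URL):
    """Return base?w=<word> with the word percent-encoded as UTF-8."""
    return base + "?" + urlencode({"w": word})


if __name__ == "__main__":
    for word in ["apple", "ice cream", "çay"]:
        print(build_query_url(word))
    # Fetching would then be (only where the site permits it):
    #   from urllib.request import urlopen
    #   html = urlopen(build_query_url(word)).read()
```

`urlencode` uses plus signs for spaces (the `quote_plus` convention), which is what HTML forms submit and what most query-string handlers accept.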
Re: Makin search on the other site and getting data and writing in xml
And also writing the results as an HTML or XML file.
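For the XML output, the standard library's xml.etree.ElementTree is enough. A minimal sketch, assuming the results are a dict mapping each word to the text retrieved for it (like the `dictionary` built in the script earlier in the thread); the element names `results` and `entry`, the `word` attribute, and the helpers `results_to_xml` and `write_results` are an invented, hypothetical schema, not anything a particular site defines.

```python
import xml.etree.ElementTree as ET


def results_to_xml(results):
    """Build an ElementTree from a {word: text} mapping (hypothetical schema)."""
    root = ET.Element("results")
    for word, text in results.items():
        entry = ET.SubElement(root, "entry", word=word)
        entry.text = text  # special characters such as & and < are escaped on output
    return ET.ElementTree(root)


def write_results(results, path):
    """Write the results to `path` as UTF-8 XML with an XML declaration."""
    results_to_xml(results).write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    tree = results_to_xml({"apple": "a fruit", "cheese": "curds & whey"})
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

If the stored values are whole HTML pages, they end up escaped inside each element; extracting just the relevant part first (for example with an HTML parser) usually gives a more useful file.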
Re: Makin search on the other site and getting data and writing in xml
On Mon, 25 Sep 2006 13:51:55 +0200, Fredrik Lundh wrote:

> http://www.google.com/terms_of_service.html
>
> "You may not send automated queries of any sort to Google's system without express permission in advance from Google."

I'm not just being a pedantic weasel here, but what's an automated query?

Google's ToS is a legal document (maybe), and if both parties don't agree on the meanings of terms, well, then it is a lousy legal document and a recipe for trouble.

Google don't define "automated query", and I don't think they can. In fact, the closest they come to defining it is to list three things they want to prevent, NONE of which have anything to do with the distinction between automated and non-automated. (What on earth is "meta-searching"? If you're going to use terms which don't have a commonly understood meaning, define what they mean.)

If I want to search for "foo", and I type "foo" into the Firefox search box, is that an automated query?

What if I type "gg: foo" into Konqueror's address bar, which expands to "http://www.google.com/search?q=foo"? Is it okay if I type the URL by hand myself?

Can I use the browser to save the search page to a local HTML file? If Google says no, how can they possibly hope to stop me?

What if I type this command into my shell?

    elinks --dump "http://www.google.com/search?q=foo" > output.html

What if I type wget "http://www.google.com/search?q=foo" into the shell? Surely that's no more automated than typing "foo" into Google's search box.

(wget doesn't in fact work, as Google recognises its user-agent string and blocks it, EVEN in cases where I am using wget manually. What, can't Google themselves tell the difference between automatic and non-automatic searching?)

Where is the line I must not cross?

The thing is, Google doesn't want people reselling their services, and I respect Google's intention. But trying to draw a distinction between automated and non-automated requests is difficult if not impossible, as can be seen by the heavy-handed way Google blocks the manual use of wget. I don't condone the gross abuse of Google's service, but I don't think an artificial distinction between automated and non-automated is a useful way to go about it.

Of course, what I think isn't important. If Google wants to write legal contracts that won't stand up in court (speaking as somebody who isn't a lawyer and whose legal advice is worthless), they can. But the point is, I see no ethical nor legal reason why a user can't create a script which is called MANUALLY by the user and does what a browser does, namely send and receive data from websites (which may or may not include Google). And that, it seems to me, is what the Original Poster wanted.

--
Steven D'Aprano
Re: Makin search on the other site and getting data and writing in xml
[EMAIL PROTECTED] wrote:
> I don't know how I can get the words from a txt file to search for them in turn.

checking the "reading and writing files" section in the tutorial might be somewhat helpful:

http://docs.python.org/tut/node9.html#SECTION00920

/F
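Following that tutorial section, reading the word list takes only a few lines. A minimal sketch in Python 3, assuming a UTF-8 text file with one word per line; `clean_words` and `read_words` are hypothetical helper names. Blank lines and surrounding whitespace are dropped, and the cleaning step is split out so it can be checked on a plain list of strings.

```python
import os
import tempfile


def clean_words(lines):
    """Strip whitespace from each line and drop the empty ones, keeping order."""
    return [line.strip() for line in lines if line.strip()]


def read_words(path, encoding="utf-8"):
    """Read one word per line from a text file."""
    with open(path, encoding=encoding) as f:
        return clean_words(f)


if __name__ == "__main__":
    # Demo: write a small word list to a temporary file, then read it back.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", encoding="utf-8",
                                     delete=False) as tmp:
        tmp.write("apple\n\n  banana \ncheese\n")
    try:
        for word in read_words(tmp.name):
            print(word)  # each word could now be passed to the search step
    finally:
        os.remove(tmp.name)
```

Iterating over the file object yields one line at a time, so even a long word list is never loaded into memory all at once before cleaning.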