Re: [Tutor] urllib ... lost novice's question
On 10/05/17 17:06, Rafael Knuth wrote:
>>> Then, there is another package, along with a dozen other
>>> urllib-related packages (such as aiourllib).
>>
>> Again, where are you finding these? They are not in
>> the standard library. Have you been installing other
>> packages that may have their own versions maybe?
>
> they are all available via PyCharm EDU

It looks like PyCharm may be adding extra packages to the standard
library. That's OK; both ActiveState and Anaconda (and others) do the
same, but it does mean you need to check on python.org to see what is
and what isn't "approved".

If it's not official content then you need to ask on a PyCharm forum
about the preferred choices. The fact they are included suggests that
somebody has tested them and found them useful in some way, but you
would need to ask them why they chose those packages and when they
would be more suitable than the standard versions.

These bonus packages are often seen as a valuable extra, but they do
carry a burden of responsibility for the user to identify which is
best for them, and that's not always easy to assess, especially for a
beginner.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] urllib ... lost novice's question
>> Then, there is another package, along with a dozen other
>> urllib-related packages (such as aiourllib).
>
> Again, where are you finding these? They are not in
> the standard library. Have you been installing other
> packages that may have their own versions maybe?

they are all available via PyCharm EDU
Re: [Tutor] urllib ... lost novice's question
this is one of those things where if what you want is simple, they're
all usable, and easy. if not, some are frankly horrid.

requests is the current hot module. go ahead and try it.
(urllib.request is not from requests, it's from urllib)

On May 8, 2017 9:23:15 AM MDT, Rafael Knuth wrote:
>Which package should I use to fetch and open an URL?
>I am using Python 3.5 and there are presently 4 versions:
>
>urllib2
>urllib3
>urllib4
>urllib5
>
>Common sense is telling me to use the latest version.
>Not sure if my common sense is fooling me here though ;-)
>
>Then, there is another package, along with a dozen other
>urllib-related packages (such as aiourllib). I thought this one is
>doing what I need:
>
>urllib.request
>
>The latter I found on http://docs.python-requests.org along with these
>encouraging words:
>
>"Warning: Recreational use of the Python standard library for HTTP may
>result in dangerous side-effects, including: security vulnerabilities,
>verbose code, reinventing the wheel, constantly reading documentation,
>depression, headaches, or even death."
>
>How do I know where to find the right package - on python.org or
>elsewhere?
>I found some code samples that show how to use urllib.request, now I
>am trying to understand why I should use urllib.request.
>Would it also be doable to do requests using urllib5 or any other
>version? Like 2 or 3? Just trying to understand.
>
>I am lost here. Feedback appreciated. Thank you!
>
>BTW, here's some (working) exemplary code I have been using for
>educational purposes:
>
>import urllib.request
>from bs4 import BeautifulSoup
>
>theurl = "https://twitter.com/rafaelknuth"
>thepage = urllib.request.urlopen(theurl)
>soup = BeautifulSoup(thepage, "html.parser")
>
>print(soup.title.text)
>
>i = 1
>for tweets in soup.findAll("div", {"class": "content"}):
>    print(i)
>    print(tweets.find("p").text)
>    i = i + 1
>
>I am assuming there are different solutions for fetching and opening
>URLs? Or is the above the only viable solution?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [Tutor] urllib ... lost novice's question
As a side note, find a tutorial that covers both urllib and requests
and try them at the same time, under Python 3.x (3.4 or 3.6). Look at
the data type each combination returns, when you should use .read(),
and when you need UTF-8 / unicode conversion with .decode("utf8").

Play around and mess with it freely now, so that when you do serious
work you won't need to experiment to know what should be done or what
breaks. In summary: learn it well from the beginning.

Finding the right package: either your beginner learning path teaches
you the popular third-party modules, or you look at how people around
the net did what you are doing, see how they did it and what modules
they used, or google "module <name>", or browse PyPI. Long term, never
stop reading about Python, so you'll constantly discover new things
and reduce the probability of not knowing how to do something.

Hope it helps,

Abdur-Rahmaan Janhangeer
Vacoas, Mauritius
https://abdurrahmaanjanhangeer.wordpress.com/
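To make the point about data types concrete: in Python 3, reading an HTTP response gives bytes, which must be decoded before you can treat them as text. A minimal offline sketch (the byte literal below is just a stand-in for what `urlopen(...).read()` would return):

```python
# In Python 3, resp.read() on an HTTP response returns bytes, not str.
# This literal stands in for such a payload; \xc3\xa9 is UTF-8 for 'é'.
raw = b"and the next nothing is caf\xc3\xa9"

# bytes -> str, assuming the content is UTF-8 encoded
text = raw.decode("utf8")

print(type(raw).__name__)   # bytes
print(type(text).__name__)  # str
print(text)                 # and the next nothing is café
```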
Re: [Tutor] urllib ... lost novice's question
On 08/05/17 16:23, Rafael Knuth wrote:
> Which package should I use to fetch and open an URL?
> I am using Python 3.5 and there are presently 4 versions:
>
> urllib2
> urllib3
> urllib4
> urllib5

I don't know where you are getting those from, but the standard
install of Python v3.6 only has urllib. This is a package with various
modules inside. ISTR there was a urllib2 in Python 2 for a while, but
I've never heard of any 3, 4, or 5.

> Then, there is another package, along with a dozen other
> urllib-related packages (such as aiourllib).

Again, where are you finding these? They are not in the standard
library. Have you been installing other packages that may have their
own versions maybe?

> urllib.request
>
> The latter I found on http://docs.python-requests.org along with these
> encouraging words:
>
> "Warning: Recreational use of the Python standard library for HTTP may
> result in dangerous side-effects, including: security vulnerabilities,
> verbose code, reinventing the wheel, constantly reading documentation,
> depression, headaches, or even death."

That's true of almost any package used badly. Remember that this is
"marketing" propaganda from an alternative package maintainer. And
while most folks (including me) seem to agree that Requests is easier
to use than the standard library, the standard library version works
just fine if you take sensible care.

> How do I know where to find the right package

There is no right package, just the one you find most effective. Most
folks would say that Requests is easier to use than the standard
library; if you are doing anything non-trivial I'd second that
opinion.

> I found some code samples that show how to use urllib.request, now I
> am trying to understand why I should use urllib.request.
Because, as part of the standard library, you can be sure it will be
there, whereas Requests is a third party module that needs to be
downloaded/installed and therefore may not be present (or even allowed
by the server admins).

Or maybe because you found some old code written before Requests
became popular and you need to integrate with it or reuse it.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
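One quick way to check what is and isn't in your own standard library install is to ask the import machinery directly. A sketch using only the standard library (run under Python 3; it assumes no third-party urllib2/urllib5 clones are installed):

```python
import importlib.util

# find_spec returns a ModuleSpec when a module can be found,
# and None when it cannot.
candidates = ("urllib", "urllib.request", "urllib2", "urllib5")
available = {name: importlib.util.find_spec(name) is not None
             for name in candidates}

for name, found in available.items():
    print(name, "is" if found else "is NOT", "importable")
```

On a stock Python 3 this reports that only urllib and urllib.request are importable, which matches Alan's point that urllib2 (let alone 3, 4 or 5) is not part of Python 3.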
[Tutor] urllib ... lost novice's question
Which package should I use to fetch and open an URL?
I am using Python 3.5 and there are presently 4 versions:

urllib2
urllib3
urllib4
urllib5

Common sense is telling me to use the latest version. Not sure if my
common sense is fooling me here though ;-)

Then, there is another package, along with a dozen other
urllib-related packages (such as aiourllib). I thought this one is
doing what I need:

urllib.request

The latter I found on http://docs.python-requests.org along with these
encouraging words:

"Warning: Recreational use of the Python standard library for HTTP may
result in dangerous side-effects, including: security vulnerabilities,
verbose code, reinventing the wheel, constantly reading documentation,
depression, headaches, or even death."

How do I know where to find the right package - on python.org or
elsewhere? I found some code samples that show how to use
urllib.request, now I am trying to understand why I should use
urllib.request. Would it also be doable to do requests using urllib5
or any other version? Like 2 or 3? Just trying to understand.

I am lost here. Feedback appreciated. Thank you!

BTW, here's some (working) exemplary code I have been using for
educational purposes:

import urllib.request
from bs4 import BeautifulSoup

theurl = "https://twitter.com/rafaelknuth"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")

print(soup.title.text)

i = 1
for tweets in soup.findAll("div", {"class": "content"}):
    print(i)
    print(tweets.find("p").text)
    i = i + 1

I am assuming there are different solutions for fetching and opening
URLs? Or is the above the only viable solution?
Re: [Tutor] urllib confusion
On 21Nov2014 15:57, Clayton Kirkwood c...@godblessthe.us wrote:
>>> Got a general problem with url work. I've struggled through a lot
>>> of code which uses urllib.[parse,request]* and urllib2. First q: I
>>> read someplace in urllib documentation which makes it sound like
>>> either urllib or urllib2 modules are being deprecated in 3.5. Don't
>>> know if it's only part or whole.
>>
>> The names of the modules changed I believe in v3.x.
>
> I don't think so because I've seen both lib and lib2 in both new and
> old code, and current 4.3 documentation talks only of urllib.

You mean 3.4, I would hope. It is clear from this:

  https://docs.python.org/3/py-modindex.html#cap-u

that there is no urllib2 in Python 3, just urllib.

I recommend you read this:

  https://docs.python.org/3/whatsnew/3.0.html

which is a very useful overview of the main changes which came with
Python 3, and covers almost all the structural changes (such as module
renames); the 3.0 release was the Big Change.

>> But you can save yourself a lot of trouble by using the excellent
>> 3rd party package called requests:
>> http://docs.python-requests.org/en/latest/
>
> I've seen nothing of this.

You have now. It is very popular and widely liked.

Cheers,
Cameron Simpson c...@zip.com.au

'Supposing a tree fell down, Pooh, when we were underneath it?'
'Supposing it didn't,' said Pooh after careful thought.
Re: [Tutor] urllib confusion
On Fri, Nov 21, 2014 at 01:37:45PM -0800, Clayton Kirkwood wrote:
> Got a general problem with url work. I've struggled through a lot of
> code which uses urllib.[parse,request]* and urllib2. First q: I read
> someplace in urllib documentation which makes it sound like either
> urllib or urllib2 modules are being deprecated in 3.5. Don't know if
> it's only part or whole.

Can you point us to this place? I would be shocked and rather dismayed
to hear that urllib(2) was being deprecated, but it is possible that
one small component is being renamed/moved/deprecated.

> I've read through a lot that says that urllib..urlopen needs
> urlencode, and/or encode('utf-8') for byte conversion, but I've seen
> plenty of examples where nothing is being encoded either way. I also
> have a sneaking suspicion that urllib2 code does all of the encoding.
> I've read that if things aren't encoded that I will get TypeError,
> yet I've seen plenty of examples where there is no error and no
> encoding.

It's hard to comment on things you've read when we don't know what
they are or precisely what they say. "I read that..." is the
equivalent of "a man down the pub told me".

If the examples are all ASCII, then no charset encoding is needed,
although urlencode will still perform percent-encoding:

py> from urllib.parse import urlencode
py> urlencode({'key': '<value>'})
'key=%3Cvalue%3E'

The characters '<' and '>' are not legal inside URLs, so they have to
be encoded as '%3C' and '%3E'. Because all the characters are ASCII,
the result otherwise remains untouched.

Non-ASCII characters, on the other hand, are encoded into UTF-8 by
default, although you can pick another encoding and/or error handler:

py> urlencode({'key': '© 2014'})
'key=%C2%A9+2014'

The copyright symbol © encoded into UTF-8 is the two bytes \xC2\xA9,
which are then percent-encoded into %C2%A9.

> Why do so many examples seem to not encode? And not get TypeError?
> And yes, for those of you who are about to suggest it, I have tried
> a lot of things and read for many hours.
One actual example is worth about a thousand vague descriptions. But
in general, I would expect that the urllib functions default to using
UTF-8 as the encoding, so you don't have to manually specify an
encoding; it just works.

-- 
Steven
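Steven's interactive session can be reproduced as a small offline script (standard library only; the dictionary contents are purely illustrative):

```python
from urllib.parse import urlencode

# Percent-encoding of ASCII characters that are not legal in URLs:
# '<' becomes %3C and '>' becomes %3E.
ascii_example = urlencode({"key": "<value>"})
print(ascii_example)   # key=%3Cvalue%3E

# Non-ASCII text is encoded to UTF-8 first, then percent-encoded;
# the copyright sign becomes %C2%A9 and the space becomes '+'.
utf8_example = urlencode({"key": "\u00a9 2014"})
print(utf8_example)    # key=%C2%A9+2014
```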
[Tutor] urllib confusion
Hi all. Got a general problem with url work. I've struggled through a
lot of code which uses urllib.[parse,request]* and urllib2.

First q: I read someplace in urllib documentation which makes it sound
like either urllib or urllib2 modules are being deprecated in 3.5.
Don't know if it's only part or whole.

I've read through a lot that says that urllib..urlopen needs
urlencode, and/or encode('utf-8') for byte conversion, but I've seen
plenty of examples where nothing is being encoded either way. I also
have a sneaking suspicion that urllib2 code does all of the encoding.
I've read that if things aren't encoded that I will get TypeError, yet
I've seen plenty of examples where there is no error and no encoding.

Why do so many examples seem to not encode? And not get TypeError?
And yes, for those of you who are about to suggest it, I have tried a
lot of things and read for many hours.

Thanks,
Clayton

You can tell the caliber of a man by his gun--c. kirkwood
Re: [Tutor] urllib confusion
On Fri, Nov 21, 2014 at 4:37 PM, Clayton Kirkwood c...@godblessthe.us wrote:
> Hi all. Got a general problem with url work. I've struggled through
> a lot of code which uses urllib.[parse,request]* and urllib2. First
> q: I read someplace in urllib documentation which makes it sound
> like either urllib or urllib2 modules are being deprecated in 3.5.
> Don't know if it's only part or whole.

The names of the modules changed I believe in v3.x. But you can save
yourself a lot of trouble by using the excellent 3rd party package
called requests: http://docs.python-requests.org/en/latest/

Also, please use plaintext for your questions. That way everyone can
read them, and the indentation won't get mangled.

> I've read through a lot that says that urllib..urlopen needs
> urlencode, and/or encode('utf-8') for byte conversion, but I've seen
> plenty of examples where nothing is being encoded either way. I also
> have a sneaking suspicion that urllib2 code does all of the
> encoding. I've read that if things aren't encoded that I will get
> TypeError, yet I've seen plenty of examples where there is no error
> and no encoding. Why do so many examples seem to not encode? And not
> get TypeError? And yes, for those of you who are about to suggest
> it, I have tried a lot of things and read for many hours.
>
> Thanks,
> Clayton
>
> You can tell the caliber of a man by his gun--c. kirkwood

-- 
Joel Goldstick
http://joelgoldstick.com
Re: [Tutor] urllib confusion
On 21/11/14 21:37, Clayton Kirkwood wrote:
> urllib or urllib2 modules are being deprecated in 3.5. Don't know if
> it's only part or whole.

urllib2 doesn't exist in Python3; there is only the urllib package. As
to urllib being deprecated, that's the first I've heard of it, but it
may be the case - I don't follow the new releases closely since I'm
usually at least 2 releases behind. I only upgraded to 3.4 because I
was writing the new book and needed it to be as current as possible.

But the What's New document for the 3.5 alpha says:

  A new urllib.request.HTTPBasicPriorAuthHandler allows HTTP Basic
  Authentication credentials to be sent unconditionally with the
  first HTTP request, rather than waiting for a HTTP 401 Unauthorized
  response from the server. (Contributed by Matej Cepl in issue 19494.)

And the NEWS file adds:

  urllib.request.urlopen will accept a context object (SSLContext) as
  an argument which will then be used for HTTPS connection. Patch by
  Alex Gaynor.

Which suggests urllib is alive and kicking...

> I've read through a lot that says that urllib..urlopen needs
> urlencode, and/or encode('utf-8') for byte conversion, but I've seen
> plenty of examples where nothing is being encoded either way.

Might those be v2 examples? Encoding got a whole lot more specific in
Python v3. But I'm not sure what you mean by the double dot.
urllib.urlopen is discontinued in Python3. You should be using
urllib.request.urlopen instead. (But maybe that's what you meant by
the ..?)

> Why do so many examples seem to not encode? And not get TypeError?

Without specific examples it's hard to know.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
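The Python 2 to Python 3 rename Alan describes can be verified offline (a sketch, assuming a stock Python 3 interpreter):

```python
import urllib
import urllib.request

# Python 2 spelled it urllib.urlopen; that attribute is gone in
# Python 3, where urllib is just a package of modules.
py2_spelling_exists = hasattr(urllib, "urlopen")

# In Python 3 the function lives in the urllib.request module.
py3_spelling_exists = hasattr(urllib.request, "urlopen")

print(py2_spelling_exists)  # False
print(py3_spelling_exists)  # True
```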
Re: [Tutor] urllib confusion
> -----Original Message-----
> From: Joel Goldstick [mailto:joel.goldst...@gmail.com]
> Sent: Friday, November 21, 2014 2:39 PM
> To: Clayton Kirkwood
> Cc: tutor@python.org
> Subject: Re: [Tutor] urllib confusion
>
> On Fri, Nov 21, 2014 at 4:37 PM, Clayton Kirkwood
> c...@godblessthe.us wrote:
>> Hi all. Got a general problem with url work. I've struggled through
>> a lot of code which uses urllib.[parse,request]* and urllib2. First
>> q: I read someplace in urllib documentation which makes it sound
>> like either urllib or urllib2 modules are being deprecated in 3.5.
>> Don't know if it's only part or whole.
>
> The names of the modules changed I believe in v3.x.

I don't think so, because I've seen both lib and lib2 in both new and
old code, and current 4.3 documentation talks only of urllib.

> But you can save yourself a lot of trouble by using the excellent
> 3rd party package called requests:
> http://docs.python-requests.org/en/latest/

I've seen nothing of this.

> Also, please use plaintext for your questions. That way everyone can
> read them, and the indentation won't get mangled.
>
>> I've read through a lot that says that urllib..urlopen needs
>> urlencode, and/or encode('utf-8') for byte conversion, but I've
>> seen plenty of examples where nothing is being encoded either way.
>> I also have a sneaking suspicion that urllib2 code does all of the
>> encoding. I've read that if things aren't encoded that I will get
>> TypeError, yet I've seen plenty of examples where there is no error
>> and no encoding. Why do so many examples seem to not encode? And
>> not get TypeError? And yes, for those of you who are about to
>> suggest it, I have tried a lot of things and read for many hours.
>>
>> Thanks,
>> Clayton
>>
>> You can tell the caliber of a man by his gun--c. kirkwood
>
> --
> Joel Goldstick
> http://joelgoldstick.com
[Tutor] Urllib Problem
I am trying to make a simple program with Python 3, that tries to
open different pages from a wordlist and prints which are alive. Here
is the code:

from urllib import request
fob = open('c:/passwords/pass.txt','r')
x = fob.readlines()
for i in x:
    urllib.request.openurl('www.google.gr/' + i)

But it doesn't work. What's the problem?
Re: [Tutor] Urllib Problem
On 07/29/2011 11:52 AM, George Anonymous wrote:
> I am trying to make a simple program with Python 3, that tries to
> open different pages from a wordlist and prints which are alive.
> Here is the code:
>
> from urllib import request
> fob = open('c:/passwords/pass.txt','r')
> x = fob.readlines()
> for i in x:
>     urllib.request.openurl('www.google.gr/' + i)
>
> But it doesn't work. What's the problem?

Please give the exception error you get?! And you should have, in the
HTTP header, the HTTP status code which gives you the failure answer
from the server.

Cheers
Karim
Re: [Tutor] Urllib Problem
On Fri, Jul 29, 2011 at 5:58 AM, Karim karim.liat...@free.fr wrote:
> On 07/29/2011 11:52 AM, George Anonymous wrote:
>> I am trying to make a simple program with Python 3, that tries to
>> open different pages from a wordlist and prints which are alive.
>> Here is the code:
>>
>> from urllib import request
>> fob = open('c:/passwords/pass.txt','r')
>> x = fob.readlines()
>> for i in x:
>>     urllib.request.openurl('www.google.gr/' + i)
>>
>> But it doesn't work. What's the problem?
>
> Please give the exception error you get?! And you should have, in
> the HTTP header, the HTTP status code which gives you the failure
> answer from the server.
>
> Cheers
> Karim

As Karim noted, you'll want to mention any exceptions you are getting.
I'm not sure what it is you are trying to do with your code. If you'd
like to try to open each line, do something if it works, and handle an
exception otherwise, the code may read something similar to:

fob = open('C:/passwords/pass.txt','r')
fob_rlines = fob.readlines()
for line in fob_rlines:
    try:
        pass  # whatever it is you would like to do with each line
    except Exception:
        # the code didn't work and an exception occurred;
        pass  # whatever you would like to do when that Exception occurs

Hope that helps,
Alexander
Re: [Tutor] Urllib Problem
George Anonymous wrote:
> I am trying to make a simple program with Python 3, that tries to
> open different pages from a wordlist and prints which are alive.
> Here is the code:
>
> from urllib import request
> fob = open('c:/passwords/pass.txt','r')
> x = fob.readlines()
> for i in x:
>     urllib.request.openurl('www.google.gr/' + i)
>
> But it doesn't work. What's the problem?

A guessing game! I LOVE guessing games!!! :)

Let's see... let me guess what you mean by "doesn't work":

- the computer locks up and sits there until you hit the restart switch
- the computer gives a Blue Screen Of Death
- Python raises an exception
- Python downloads the Yahoo website instead of Google
- something else

My guess is... you're getting a NameError exception, like this one:

>>> from urllib import request
>>> x = urllib.request.openurl('www.google.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'urllib' is not defined

Am I close?

You need to use request.urlopen, not urllib.request.openurl. That's
your *first* problem. There are more. Come back if you need help with
the others, and next time, don't make us play guessing games. Show us
the code you use -- copy and paste it, don't retype it from memory --
what you expect should happen, and what actually happens instead.

-- 
Steven
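A short offline sketch of two of the problems Steven hints at (assuming Python 3; no network access is needed, because both failures happen before any connection is attempted, and the word "admin" is just a hypothetical wordlist entry):

```python
from urllib import request

# Problem 1: "from urllib import request" binds only the name
# "request", so referring to urllib.request raises NameError.
try:
    urllib  # this name was never bound
    name_bound = True
except NameError:
    name_bound = False

# Problem 2: even request.urlopen (the correct spelling) rejects a
# URL with no scheme, as in George's 'www.google.gr/' + i; it needs
# the leading 'http://'.
try:
    request.urlopen("www.google.gr/admin")
    scheme_ok = True
except ValueError:
    scheme_ok = False

print(name_bound)  # False
print(scheme_ok)   # False
```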
[Tutor] urllib problem
Hoi,

I have this program:

import urllib
import re

f = urllib.urlopen("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
inhoud = f.read()
f.close()
nummer = re.search('[0-9]', inhoud)
volgende = int(nummer.group())
teller = 1
while teller <= 3:
    url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + str(volgende)
    f = urllib.urlopen(url)
    inhoud = f.read()
    f.close()
    nummer = re.search('[0-9]', inhoud)
    print "nummer is", nummer.group()
    volgende = int(nummer.group())
    print volgende
    teller = teller + 1

but now the url changes but volgende does not. What have I done wrong?

Roelof
Re: [Tutor] urllib problem
> I have this program:
>
> import urllib
> import re
>
> f = urllib.urlopen("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
> inhoud = f.read()
> f.close()
> nummer = re.search('[0-9]', inhoud)
> volgende = int(nummer.group())
> teller = 1
> while teller <= 3:
>     url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + str(volgende)
>     f = urllib.urlopen(url)
>     inhoud = f.read()
>     f.close()
>     nummer = re.search('[0-9]', inhoud)
>     print "nummer is", nummer.group()
>     volgende = int(nummer.group())
>     print volgende
>     teller = teller + 1
>
> but now the url changes but volgende not.

I think nummer will change, *unless* you happen to retrieve the same
number every time, even when you access a different url. What is the
result when you run this program, i.e., the output of your print
statements (then, also, print url)? And how can url change but
volgende not, since url depends on volgende?

Btw, it may be better to use parentheses in your regular expression to
explicitly group whatever you want to match, though the above will
work (since it groups the whole match). But Python has this "explicit
is better than implicit" thing.

Cheers,
Evert
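Evert's aside about explicit grouping can be shown offline; the sample string below imitates the kind of line the challenge page returns:

```python
import re

text = "and the next nothing is 87599"

# group() with no argument returns the whole match; an explicit pair
# of parentheses creates group 1 covering the same span, which makes
# the intent visible in the pattern itself.
implicit = re.search('[0-9]', text).group()
explicit = re.search('([0-9])', text).group(1)

print(implicit, explicit)  # 8 8
```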
Re: [Tutor] urllib problem
On Tue, 12 Oct 2010 11:40:17 pm Roelof Wobben wrote:
> Hoi,
>
> I have this program:
>
> import urllib
> import re
>
> f = urllib.urlopen("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
> inhoud = f.read()
> f.close()
> nummer = re.search('[0-9]', inhoud)
> volgende = int(nummer.group())
> teller = 1
> while teller <= 3:
>     url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + str(volgende)
>     f = urllib.urlopen(url)
>     inhoud = f.read()
>     f.close()
>     nummer = re.search('[0-9]', inhoud)
>     print "nummer is", nummer.group()
>     volgende = int(nummer.group())
>     print volgende
>     teller = teller + 1
>
> but now the url changes but volgende not. What have I done wrong?

Each time through the loop, you set volgende to the same result:

    nummer = re.search('[0-9]', inhoud)
    volgende = int(nummer.group())

Since inhoud never changes, and the search never changes, the search
result never changes, and volgende never changes.

-- 
Steven D'Aprano
Re: [Tutor] urllib problem
On Tue, 12 Oct 2010 11:58:03 pm Steven D'Aprano wrote:
> On Tue, 12 Oct 2010 11:40:17 pm Roelof Wobben wrote:
>> but now the url changes but volgende not. What have I done wrong?
>
> Each time through the loop, you set volgende to the same result:
>
>     nummer = re.search('[0-9]', inhoud)
>     volgende = int(nummer.group())
>
> Since inhoud never changes, and the search never changes, the search
> result never changes, and volgende never changes.

Wait, sorry, inhoud should change... I missed the line

    inhoud = f.read()

My mistake, sorry about that. However, I can now see what is going
wrong. Your regular expression only looks for a single digit:

    re.search('[0-9]', inhoud)

If you want any number of digits, you need '[0-9]+' instead.

Starting from the first URL:

>>> f = urllib.urlopen(
...     "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
>>> inhoud = f.read()
>>> f.close()
>>> print inhoud
and the next nothing is 87599

but:

>>> nummer = re.search('[0-9]', inhoud)
>>> nummer.group()
'8'

See, you only get the first digit. Then looking up the page with
nothing=8 gives a page whose first digit is 5, and then you get stuck
on 5 forever:

>>> urllib.urlopen(
...     "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=8").read()
'and the next nothing is 59212'
>>> urllib.urlopen(
...     "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=5").read()
'and the next nothing is 51716'

You need to add a + to the regular expression, which means "one or
more digits" instead of a single digit.

-- 
Steven D'Aprano
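The fix can be checked offline; the sample string below is the page text Steven shows, so no network is needed:

```python
import re

inhoud = "and the next nothing is 87599"

# '[0-9]' matches exactly one digit...
first_digit = re.search('[0-9]', inhoud).group()
print(first_digit)   # 8

# ...while '[0-9]+' matches a whole run of digits, which is what the
# loop needs to follow the chain of "nothing" values correctly.
whole_number = re.search('[0-9]+', inhoud).group()
print(whole_number)  # 87599
```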
Re: [Tutor] urllib problem
Steven D'Aprano wrote:
> Each time through the loop, you set volgende to the same result:
>
>     nummer = re.search('[0-9]', inhoud)
>     volgende = int(nummer.group())
>
> Since inhoud never changes, and the search never changes, the search
> result never changes, and volgende never changes.

Hello,

Here is the output when I print every step in the beginning:

inhoud : and the next nothing is 87599
nummer is 8
volgende is 8

and here is the output in the loop:

url is: http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=8
inhoud is and the next nothing is 59212
nummer is 5

2nd run:
url is: http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=5
inhoud is and the next nothing is 51716
nummer is 5

3rd run:
url is: http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=5
inhoud is and the next nothing is 51716
nummer is 5

I see the problem. It only takes the first number of the nothing.
So I have to look how to solve that.

Roelof
Re: [Tutor] urllib problem
From: st...@pearwood.info To: tutor@python.org Date: Wed, 13 Oct 2010 01:51:16 +1100 Subject: Re: [Tutor] urllib problem

On Tue, 12 Oct 2010 11:58:03 pm Steven D'Aprano wrote:

> On Tue, 12 Oct 2010 11:40:17 pm Roelof Wobben wrote:
>> Hoi, I have this program:
>> [program snipped]
>> but now the url changes but volgende does not. What have I done wrong?
>
> Each time through the loop, you set volgende to the same result:
>
>     nummer = re.search('[0-9]', inhoud)
>     volgende = int(nummer.group())
>
> Since inhoud never changes, and the search never changes, the search
> result never changes, and volgende never changes.

Wait, sorry, inhoud should change... I missed the line

    inhoud = f.read()

My mistake, sorry about that. However, I can now see what is going wrong. Your regular expression only looks for a single digit:

    re.search('[0-9]', inhoud)

If you want any number of digits, you need '[0-9]+' instead. Starting from the first URL:

    >>> f = urllib.urlopen(
    ...     "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
    >>> inhoud = f.read()
    >>> f.close()
    >>> print inhoud
    and the next nothing is 87599

but:

    >>> nummer = re.search('[0-9]', inhoud)
    >>> nummer.group()
    '8'

See, you only get the first digit. Then looking up the page with nothing=8 gives a first digit starting with 5, and then you get stuck on 5 forever:

    >>> urllib.urlopen(
    ...     "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=8").read()
    'and the next nothing is 59212'
    >>> urllib.urlopen(
    ...     "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=5").read()
    'and the next nothing is 51716'

You need to add a + to the regular expression, which means "one or more digits" instead of a single digit.

-- Steven D'Aprano

Hoi Steven, Finally solved this puzzle. Now the next one of the 33 puzzles.

Roelof
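Steven's point about '[0-9]' versus '[0-9]+' is easy to check in a throwaway session (Python 3 shown, but the regex behaviour is the same in Python 2):

```python
import re

inhoud = "and the next nothing is 87599"

# '[0-9]' matches exactly one digit -- only the first digit is returned
print(re.search('[0-9]', inhoud).group())   # 8

# '[0-9]+' matches one or more digits -- the whole number is returned
print(re.search('[0-9]+', inhoud).group())  # 87599
```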
Re: [Tutor] urllib problem
Roelof Wobben rwob...@hotmail.com wrote:

> Finally solved this puzzle. Now the next one of the 33 puzzles.

Don't be surprised if you get stuck. Python Challenge is quite tricky and is deliberately designed to make you explore parts of the standard library you might not otherwise find. Expect to do a lot of reading in the documentation. It's really targeted at intermediate rather than novice programmers IMHO.

-- Alan Gauld Author of the Learn to Program web site http://www.alan-g.me.uk/
Re: [Tutor] urllib
thanks, Senthil

On Mon, Dec 7, 2009 at 11:10 AM, Senthil Kumaran orsent...@gmail.com wrote:

On Mon, Dec 07, 2009 at 08:38:24AM +0100, Jojo Mwebaze wrote:
> I need help on something very small... i am using urllib to write a query and what i want returned is 'FHI=128%2C128&FLO=1%2C1'

The way to use urllib.urlencode is like this:

    >>> urllib.urlencode({'key': 'value'})
    'key=value'
    >>> urllib.urlencode({'key': 'value', 'key2': 'value2'})
    'key2=value2&key=value'

For your purposes, you need to construct the dict this way:

    >>> urllib.urlencode({'FHI': '128,128', 'FHO': '1,1'})
    'FHO=1%2C1&FHI=128%2C128'

And if you are to use variables, one way to do it would be:

    >>> x1, y1, x2, y2 = 1, 1, 128, 128
    >>> fhi = str(x2) + ',' + str(y2)
    >>> fho = str(x1) + ',' + str(y1)
    >>> urllib.urlencode({'FHI': fhi, 'FHO': fho})
    'FHO=1%2C1&FHI=128%2C128'

-- Senthil
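For readers on Python 3, where urlencode moved into urllib.parse, the same call looks like this (a minimal sketch; the FHI/FLO keys are just the ones from the question, and modern dicts keep insertion order so the pairs come out in the order written):

```python
from urllib.parse import urlencode

# urlencode percent-escapes each value, so ',' becomes '%2C'
query = urlencode({'FHI': '128,128', 'FLO': '1,1'})
print(query)  # FHI=128%2C128&FLO=1%2C1
```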
Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!
On Mon, Jul 6, 2009 at 5:54 PM, David Kim davidki...@gmail.com wrote:

> Hello all, I have two questions I'm hoping someone will have the patience to answer as an act of mercy.
>
> I. How to get past a Terms of Service page?
>
> I've just started learning python (have never done any programming prior) and am trying to figure out how to open or download a website to scrape data. The only problem is, whenever I try to open the link (via urllib2, for example) I'm after, I end up getting the HTML of a Terms of Service page (where one has to click an "I Agree" button) rather than the actual target page. I've seen examples on the web on providing data for forms (typically by finding the name of the form and providing some sort of dictionary to fill in the form fields), but this simple act of getting past "I Agree" is stumping me. Can anyone save my sanity? As a workaround, I've been using os.popen('curl ' + url + ' > ' + filename) to save the html in a txt file for later processing. I have no idea why curl works and urllib2, for example, doesn't (I use OS X).

curl works because it ignores the redirect to the ToS page, and the site is (astoundingly) dumb enough to serve the content with the redirect. You could make urllib2 behave the same way by defining a 302 handler that does nothing.

> I even tried to use Yahoo Pipes to try and sidestep coding anything altogether, but ended up looking at the same Terms of Service page anyway. Here's the code (tho it's probably not that illuminating since it's basically just opening a url):
>
>     import urllib2
>     url = 'http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1' #the first of 23 tables
>     html = urllib2.urlopen(url).read()

Generally you have to post to the same url as the form, giving the same data the form does. You can inspect the source of the form to figure this out.
In this case the form is

    <form method="post" action="/products/consent.php">
      <input type="hidden" value="tiwd/products/derivserv/data_table_i.php" name="urltarget"/>
      <input type="hidden" value="1" name="check_one"/>
      <input type="hidden" value="tiwdata" name="tag"/>
      <input type="submit" value="I Agree" name="acknowledgement"/>
      <input type="submit" value="Decline" name="acknowledgement"/>
    </form>

You generally need to enable cookie support in urllib2 as well, because the site will use a cookie to flag that you saw the consent form. This tutorial shows how to enable cookies and submit form data: http://personalpages.tds.net/~kent37/kk/00010.html

Kent
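A rough Python 3 translation of Kent's "a 302 handler that does nothing" idea, combined with the cookie support he mentions. This is a sketch, not something tested against the DTCC site; urllib2 became urllib.request in Python 3:

```python
import http.cookiejar
import urllib.request

class IgnoreRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Swallow 301/302 responses instead of following them, so the
    body served alongside the redirect is what urlopen returns."""
    def http_error_301(self, req, fp, code, msg, headers):
        return None
    def http_error_302(self, req, fp, code, msg, headers):
        return None

# An opener with the no-op redirect handler plus a cookie jar,
# so a consent cookie (if any) is remembered across requests.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    IgnoreRedirectHandler(),
    urllib.request.HTTPCookieProcessor(jar))

# Usage (network call, shown only for illustration):
# response = opener.open('http://example.com/some/page')
```

Note that, as Sander points out later in the thread, you must actually call opener.open() (or install_opener) for the handlers to take effect.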
Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!
2009/7/7 David Kim davidki...@gmail.com:

>     opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
>     urllib2.install_opener(opener)
>     response = urllib2.urlopen("http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1")
>     print response.read()
>
> I suspect I am not understanding something basic about how urllib2 deals with this redirect issue since it seems everything I try gives me the same ToS page.

Indeed, you create the opener but then you do not use it. Try the below and it should work.

    response = opener.open("http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1")
    data = response.read()

Greets Sander
Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!
On Tue, Jul 7, 2009 at 1:20 PM, David Kim davidki...@gmail.com wrote:

> On Tue, Jul 7, 2009 at 7:26 AM, Kent Johnson ken...@tds.net wrote:
>> curl works because it ignores the redirect to the ToS page, and the site is (astoundingly) dumb enough to serve the content with the redirect. You could make urllib2 behave the same way by defining a 302 handler that does nothing.
>
> Many thanks for the redirect pointer! I also found http://diveintopython.org/http_web_services/redirects.html. Is the handler class on this page what you mean by a handler that does nothing? (It looks like it exposes the error code but still follows the redirect).

No, all of those examples are handling the redirect. The SmartRedirectHandler just captures additional status. I think you need something like this:

    class IgnoreRedirectHandler(urllib2.HTTPRedirectHandler):
        def http_error_301(self, req, fp, code, msg, headers):
            return None
        def http_error_302(self, req, fp, code, msg, headers):
            return None

> I guess i'm still a little confused since, if the handler does nothing, won't I still go to the ToS page?

No, it is the action of the handler, responding to the redirect request, that causes the ToS page to be fetched.

> For example, I ran the following code (found at http://stackoverflow.com/questions/554446/how-do-i-prevent-pythons-urllib2-from-following-a-redirect)

That is pretty similar to the DiP code...

> I suspect I am not understanding something basic about how urllib2 deals with this redirect issue since it seems everything I try gives me the same ToS page.

Maybe you don't understand how redirect works in general...

>> Generally you have to post to the same url as the form, giving the same data the form does. You can inspect the source of the form to figure this out. In this case the form is
>>
>>     <form method="post" action="/products/consent.php">
>>       <input type="hidden" value="tiwd/products/derivserv/data_table_i.php" name="urltarget"/>
>>       <input type="hidden" value="1" name="check_one"/>
>>       <input type="hidden" value="tiwdata" name="tag"/>
>>       <input type="submit" value="I Agree" name="acknowledgement"/>
>>       <input type="submit" value="Decline" name="acknowledgement"/>
>>     </form>
>>
>> You generally need to enable cookie support in urllib2 as well, because the site will use a cookie to flag that you saw the consent form. This tutorial shows how to enable cookies and submit form data: http://personalpages.tds.net/~kent37/kk/00010.html
>
> I have seen the login examples where one provides values for the fields username and password (thanks Kent). Given the form above, however, it's unclear to me how one POSTs the form data when you aren't actually passing any parameters. Perhaps this is less of a Python question and more an http question (which unfortunately I know nothing about either).

Yes, the parameters are listed in the form. If you don't have at least a basic understanding of HTTP and HTML you are going to have trouble with this project...

Kent
Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!
Thanks Kent, perhaps I'll cool the Python jets and move on to HTTP and HTML. I was hoping it would be something I could just pick up along the way, looks like I was wrong.

dk

On Tue, Jul 7, 2009 at 1:56 PM, Kent Johnson ken...@tds.net wrote:

> [Kent's reply quoted in full; snipped]

-- morenotestoself.wordpress.com
[Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!
Hello all, I have two questions I'm hoping someone will have the patience to answer as an act of mercy.

I. How to get past a Terms of Service page?

I've just started learning python (have never done any programming prior) and am trying to figure out how to open or download a website to scrape data. The only problem is, whenever I try to open the link (via urllib2, for example) I'm after, I end up getting the HTML of a Terms of Service page (where one has to click an "I Agree" button) rather than the actual target page. I've seen examples on the web on providing data for forms (typically by finding the name of the form and providing some sort of dictionary to fill in the form fields), but this simple act of getting past "I Agree" is stumping me. Can anyone save my sanity? As a workaround, I've been using os.popen('curl ' + url + ' > ' + filename) to save the html in a txt file for later processing. I have no idea why curl works and urllib2, for example, doesn't (I use OS X). I even tried to use Yahoo Pipes to try and sidestep coding anything altogether, but ended up looking at the same Terms of Service page anyway. Here's the code (tho it's probably not that illuminating since it's basically just opening a url):

    import urllib2
    url = 'http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1' #the first of 23 tables
    html = urllib2.urlopen(url).read()

II. How to parse html tables with lxml, beautifulsoup? (for dummies)

Assuming i get past the Terms of Service, I'm a bit overwhelmed by the need to know XPath, CSS, XML, DOM, etc. to scrape data from the web. I've tried looking at the documentation included with different python libraries, but just got more confused. The basic tutorials show something like the following:

    from lxml import html
    doc = html.parse("/path/to/test.txt")  #the file i downloaded via curl
    root = doc.getroot()  #what is this root business?
    tables = root.cssselect('table')

I understand that selecting all the table tags will somehow target however many tables on the page. The problem is the table has multiple headers, empty cells, etc. Most of the examples on the web have to do with scraping the web for search results or something that don't really depend on the table format for anything other than layout. Are there any resources out there that are appropriate for web/python illiterati like myself that deal with structured data as in the url above? FYI, the data in the url above goes up in smoke every week, so I'm trying to capture it automatically on a weekly basis. Getting all of it into a CSV or database would be a personal cause for celebration as it would be the first really useful thing I've done with python since starting to learn it a few months ago.

For anyone who is interested, here is the code that uses curl to pull the webpages. It basically just builds the url string for the different table-pages and saves down the file with a timestamped filename:

    import os
    from time import strftime

    BASE_URL = 'http://www.dtcc.com/products/derivserv/data_table_'
    SECTIONS = {'section1': {'select': 'i.php?id=table', 'id': range(1,9)},
                'section2': {'select': 'ii.php?id=table', 'id': range(9,17)},
                'section3': {'select': 'iii.php?id=table', 'id': range(17,24)}
                }

    def get_pages():
        filenames = []
        path = '~/Dev/Data/DTCC_DerivServ/'
        #os.popen('cd ' + path)
        for section in SECTIONS:
            for id in SECTIONS[section]['id']:
                #urlList.append(BASE_URL + SECTIONS[section]['select'] + str(id))
                url = BASE_URL + SECTIONS[section]['select'] + str(id)
                timestamp = strftime('%Y%m%d_')
                #sectionName = BASE_URL.split('/')[-1]
                sectionNumber = SECTIONS[section]['select'].split('.')[0]
                tableNumber = str(id) + '_'
                filename = timestamp + tableNumber + sectionNumber + '.txt'
                os.popen('curl ' + url + ' > ' + path + filename)
                filenames.append(filename)
        return filenames

    if (__name__ == '__main__'):
        get_pages()

-- morenotestoself.wordpress.com
Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!
Hi,

David Kim wrote:

> I have two questions I'm hoping someone will have the patience to answer as an act of mercy.
>
> I. How to get past a Terms of Service page?
>
> I've just started learning python (have never done any programming prior) and am trying to figure out how to open or download a website to scrape data. The only problem is, whenever I try to open the link (via urllib2, for example) I'm after, I end up getting the HTML of a Terms of Service page (where one has to click an "I Agree" button) rather than the actual target page.

One comment to make here is that you should first read that page and check if the provider of the service actually allows you to automatically download content, or to use the service in the way you want. This is totally up to them, and if their terms of service state that you must not do that, well, then you must not do that.

Once you know that it's permitted, you can read the ToS page and search for the form that the "Agree" button triggers. The URL given there is the one you have to read next, but augmented with the parameter (?xyz=...) that the button sends.

> I've seen examples on the web on providing data for forms (typically by finding the name of the form and providing some sort of dictionary to fill in the form fields), but this simple act of getting past "I Agree" is stumping me. Can anyone save my sanity? As a workaround, I've been using os.popen('curl ' + url + ' > ' + filename) to save the html in a txt file for later processing. I have no idea why curl works and urllib2, for example, doesn't (I use OS X).

There may be different reasons for that. One is that web servers often present different content based on the client identifier. So if you see one page with one client, and another page with a different client, that may be the reason.
> Here's the code (tho it's probably not that illuminating since it's basically just opening a url):
>
>     import urllib2
>     url = 'http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1' #the first of 23 tables
>     html = urllib2.urlopen(url).read()

Hmmm, if what you want is to read a stock ticker or something like that, you should *really* read their ToS first and make sure they do not disallow automated access. Because it's actually quite likely that they do.

> II. How to parse html tables with lxml, beautifulsoup? (for dummies)
>
> Assuming i get past the Terms of Service, I'm a bit overwhelmed by the need to know XPath, CSS, XML, DOM, etc. to scrape data from the web.

Using CSS selectors (lxml.cssselect) is not at all hard. You basically express the page structure in a *very* short and straight forward way. Searching the web for a CSS selectors tutorial should give you a few hits.

> The basic tutorials show something like the following:
>
>     from lxml import html
>     doc = html.parse("/path/to/test.txt")  #the file i downloaded via curl

... or read from the standard output pipe of curl. Note that there is a stdlib module called subprocess, which may make running curl easier. Once you've determined the final URL to parse, you can also push it right into lxml's parse() function, instead of going through urllib2 or an external tool. Example:

    url = "http://pypi.python.org/pypi?%3Aaction=search&term=lxml"
    doc = html.parse(url)

>     root = doc.getroot()  #what is this root business?

The root (or top-most) node of the document you just parsed. Usually an <html> tag in HTML pages.

>     tables = root.cssselect('table')

Simple, isn't it? :)

BTW, did you look at this? http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/

> I understand that selecting all the table tags will somehow target however many tables on the page. The problem is the table has multiple headers, empty cells, etc.
> Most of the examples on the web have to do with scraping the web for search results or something that don't really depend on the table format for anything other than layout.

That's because in cases like yours, you have to do most of the work yourself anyway. No page is like the other, so you have to find your way through the structure and figure out fixed points that allow you to get to the data.

Stefan
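When lxml is not available, the cell-collecting part of this can be approximated with the standard library's html.parser. It is a much cruder stand-in for cssselect, sketched here against a tiny hand-written table rather than the real DTCC pages:

```python
from html.parser import HTMLParser

class TableTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row."""
    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self._row = []        # cells of the row being parsed
        self._in_cell = False
    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th'):
            self._in_cell = True
    def handle_endtag(self, tag):
        if tag == 'tr' and self._row:
            self.rows.append(self._row)
        elif tag in ('td', 'th'):
            self._in_cell = False
    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

sample = ("<table><tr><th>name</th><th>value</th></tr>"
          "<tr><td>a</td><td>1</td></tr></table>")
p = TableTextParser()
p.feed(sample)
print(p.rows)  # [['name', 'value'], ['a', '1']]
```

Real pages with nested tables, multiple header rows, or empty cells need more bookkeeping, which is exactly the per-page work Stefan describes.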
Re: [Tutor] urllib unquote
On Tue, Feb 17, 2009 at 08:54, Norman Khine nor...@khine.net wrote:

> Thank you, but is it possible to get the original string from this?

You mean something like this?

    >>> urllib.quote('hL/FGNS40fjoTnp2zIqq73reK60=\n')
    'hL/FGNS40fjoTnp2zIqq73reK60%3D%0A'

Greets Sander
Re: [Tutor] urllib unquote
On Tue, Feb 17, 2009 at 1:24 PM, Norman Khine nor...@khine.net wrote:

> Thank you, but is it possible to get the original string from this?

What do you mean by "the original string", Norman? Look at these definitions:

Quoted string: In the different parts of the URL, there are sets of characters, e.g. the space character in a path, that must be quoted, which means converted to a different form so that the url is understood by the program. So ' ' is quoted to %20.

Unquoted string: When %20 comes in the URL, humans need it unquoted so that we can understand it.

What do you mean by original string? Why are you doing base64 encoding? And what are you trying to achieve? Perhaps these can help us to help you better?

-- Senthil
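Senthil's definitions are easy to verify with the quoting functions themselves. The sketch below uses Python 3's urllib.parse; in Python 2, as in this thread, the same functions live directly in urllib:

```python
from urllib.parse import quote, unquote

# Quoting converts reserved characters to %XX escapes...
print(quote(' '))    # %20
print(quote(','))    # %2C

# ...and unquoting reverses it exactly, so the round trip is lossless.
s = 'hL/FGNS40fjoTnp2zIqq73reK60=\n'
assert unquote(quote(s, safe='')) == s
```

This round trip is why Sander can recover the base64 text from the quoted form; what cannot be reversed is the hash that produced the base64 data in the first place, as Norman concludes later.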
Re: [Tutor] urllib unquote
On Mon, Feb 16, 2009 at 8:12 AM, Norman Khine nor...@khine.net wrote:

> Hello, Can someone point me in the right direction. I would like to return the string for the following:
>
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> import base64, urllib
>     >>> data = 'hL/FGNS40fjoTnp2zIqq73reK60%3D%0A'
>     >>> data = urllib.unquote(data)
>     >>> print base64.decodestring(data)
>     ???Ը???Nzv̊??z?+?
>
> What am I missing?

How is data created? Since it doesn't decode as you expect, either it isn't base64 or there is some other processing needed. Do you have an example of a data string where you know the desired decoded value?

Kent
Re: [Tutor] urllib unquote
it is my error, the data is a sha string and it is not possible to get the string back, unless you use rainbow tables or something of the sort.

Kent Johnson wrote:

> [earlier reply quoted in full; snipped]
[Tutor] urllib unquote
Hello, Can someone point me in the right direction. I would like to return the string for the following:

    Type "help", "copyright", "credits" or "license" for more information.
    >>> import base64, urllib
    >>> data = 'hL/FGNS40fjoTnp2zIqq73reK60%3D%0A'
    >>> data = urllib.unquote(data)
    >>> print base64.decodestring(data)
    ???Ը???Nzv̊??z?+?

What am I missing?

Cheers Norman
Re: [Tutor] urllib unquote
On Mon, Feb 16, 2009 at 14:12, Norman Khine nor...@khine.net wrote:

>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> import base64, urllib
>     >>> data = 'hL/FGNS40fjoTnp2zIqq73reK60%3D%0A'
>     >>> data = urllib.unquote(data)
>     >>> print base64.decodestring(data)
>     ???Ը???Nzv̊??z?+?
>
> What am I missing?

Not an expert here but I think you can skip the last step...

    >>> urllib.unquote('hL/FGNS40fjoTnp2zIqq73reK60%3D%0A')
    'hL/FGNS40fjoTnp2zIqq73reK60=\n'

Greets Sander
Re: [Tutor] urllib unquote
Thank you, but is it possible to get the original string from this?

Sander Sweers wrote:

> [earlier reply quoted in full; snipped]
[Tutor] URLLIB / GLOB
Hello, I would like to write a program which looks in a web directory for, say, *.gif files, then processes those files in some manner. What I need is something like glob which will return a directory listing of all the files matching the search pattern (or simply a certain extension). Is there a way to do this with urllib? Any other suggestions? Thanks!
Re: [Tutor] URLLIB / GLOB
John wrote:

> Hello, I would like to write a program which looks in a web directory for, say *.gif files. Then processes those files in some manner. What I need is something like glob which will return a directory listing of all the files matching the search pattern (or just simply a certain extension). Is there a way to do this with urllib? Any other suggestions?

If the directory is only available as a web page you will have to fetch the web directory listing itself with urllib or urllib2 and parse the HTML returned to get the list of files. You might want to use BeautifulSoup to parse the HTML. http://www.crummy.com/software/BeautifulSoup/

Kent
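Kent's suggestion, fetch the listing page and parse out the links, can be sketched with the standard library alone (html.parser instead of BeautifulSoup; the sample listing below is made up for the sake of a runnable example):

```python
from html.parser import HTMLParser

class GifLinkParser(HTMLParser):
    """Collect href targets ending in .gif from a directory listing page."""
    def __init__(self):
        super().__init__()
        self.gifs = []
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value and value.lower().endswith('.gif'):
                    self.gifs.append(value)

# A made-up directory listing standing in for the fetched page;
# in practice the HTML would come from urllib.request.urlopen(url).read().
listing = ('<html><body><a href="a.gif">a</a>'
           '<a href="b.txt">b</a><a href="c.GIF">c</a></body></html>')
p = GifLinkParser()
p.feed(listing)
print(p.gifs)  # ['a.gif', 'c.GIF']
```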
Re: [Tutor] urllib
Hi again, I was able to use urllib2_file, which is a wrapper to urllib2.urlopen(). It seems to work fine, and I'm able to retrieve the contents of the file using:

    afile = req.form.list[1].file.read()

Now I have to store this text file (which is about 500k) and an id number into a mysql database on a web server. I have a table that has two columns: user id (int) and mediumblob. The problem I have now is I don't know how to store them into the database. I've been looking for examples without any luck. I tried using "load data infile", but it seems that I would need to have this client-side file stored on the server. I used "load data local infile", and got some errors. I also thought about storing them like this:

    afile = req.form.list[1].file.read()
    cursor.execute("insert into p_report (sales_order, file_cont) values (%s, %s)", (1, afile))

I really don't know which is the best way to do it. Which is the right approach? I'm really hoping someone can give me an idea how to do it because I'm finding this frustrating.

Thanks, Patricia
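Patricia's second idea, a parameterised insert that hands the blob to the driver, is the usual approach. Here is a sketch using the stdlib's sqlite3 as a stand-in for the MySQL driver (note sqlite3 uses ? placeholders where MySQLdb uses %s; the p_report table is recreated from the column names in the question):

```python
import gzip
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table p_report (sales_order integer, file_cont blob)')

# gzip the text first, as in the thread, then let the driver handle
# quoting/escaping of the binary data via the placeholder.
afile = gzip.compress(b'report contents here')
conn.execute('insert into p_report (sales_order, file_cont) values (?, ?)',
             (1, afile))

row = conn.execute('select sales_order, file_cont from p_report').fetchone()
print(row[0], gzip.decompress(row[1]))  # 1 b'report contents here'
```

The key point carries over to MySQL: no "load data infile" is needed, because the blob travels inside the INSERT statement's parameters.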
Re: [Tutor] urllib
Patricia wrote:

> Hi, I have used urllib and urllib2 to post data like the following:
>
>     dict = {}
>     dict['data'] = info
>     dict['system'] = aname
>     data = urllib.urlencode(dict)
>     req = urllib2.Request(url)
>
> And to get the data, I emulated a web page with a submit button:
>
>     s = "<html><body>"
>     s += "<form action='a_method' method='POST'>"
>     s += "<textarea cols='80' rows='200' name='data'></textarea>"
>     s += "<input type='text' name='system'>"
>     s += "<input type='submit' value='Submit'>"
>     s += "</form></body></html>"
>
> I would like to know how to send a file. It's a text file that will be gzipped before being posted. I'm using python version 2.2.3.

There are some old examples here: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/146306

I think the modern way uses email.MIMEMultipart but I don't have an example handy.

Kent
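In the same spirit as the ActiveState recipe Kent links to, the body a browser sends for a file upload can be hand-rolled. This Python 3 sketch only builds the multipart/form-data payload; the field and file names are just the ones from Patricia's form, and posting it is left to whatever HTTP client is in use:

```python
import uuid

def encode_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body for plain fields plus one file.
    Returns the Content-Type header value and the body bytes."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += ['--' + boundary,
                  'Content-Disposition: form-data; name="%s"' % name,
                  '', value]
    lines += ['--' + boundary,
              'Content-Disposition: form-data; name="%s"; filename="%s"'
              % (file_field, filename),
              'Content-Type: application/octet-stream', '']
    body = '\r\n'.join(lines).encode() + b'\r\n' + file_bytes
    body += ('\r\n--%s--\r\n' % boundary).encode()
    return 'multipart/form-data; boundary=' + boundary, body

# Hypothetical usage: gzipped report posted as the 'data' field.
ctype, body = encode_multipart({'system': 'aname'}, 'data',
                               'report.gz', b'\x1f\x8b\x00')
```

The returned ctype string goes into the request's Content-Type header and body becomes the POST payload.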
[Tutor] urllib
Hi, I have used urllib and urllib2 to post data like the following:

    dict = {}
    dict['data'] = info
    dict['system'] = aname
    data = urllib.urlencode(dict)
    req = urllib2.Request(url)

And to get the data, I emulated a web page with a submit button:

    s = "<html><body>"
    s += "<form action='a_method' method='POST'>"
    s += "<textarea cols='80' rows='200' name='data'></textarea>"
    s += "<input type='text' name='system'>"
    s += "<input type='submit' value='Submit'>"
    s += "</form></body></html>"

I would like to know how to send a file. It's a text file that will be gzipped before being posted. I'm using python version 2.2.3.

Thanks, Patricia
[Tutor] URLLIB
Hello list, I am on challenge 5. I think I need to somehow download a file. I have been trying like so:

    X = urllib.URLopener(name, proxies={'http': 'URL'}).distutils.copy_file('SomeFileName')

but with no luck.

Servando Garcia
John 3:16 For GOD so loved the world..
Re: [Tutor] URLLIB
Servando Garcia wrote:

> Hello list, I am on challenge 5. I think I need to somehow download a file. I have been trying like so:
>
>     X = urllib.URLopener(name, proxies={'http': 'URL'}).distutils.copy_file('SomeFileName')

urlopen() returns a file-like object - something that behaves like an open file. Try

    x = urllib.urlopen(name)
    data = x.read()

Kent