Re: [Tutor] urllib ... lost novice's question

2017-05-10 Thread Alan Gauld via Tutor
On 10/05/17 17:06, Rafael Knuth wrote:
>>> Then, there is another package, along with a dozen other
>>> urllib-related packages (such as aiourllib).
>>
>> Again, where are you finding these? They are not in
>> the standard library. Have you been installing other
>> packages that may have their own versions maybe?
> 
> they are all available via PyCharm EDU

It looks like PyCharm may be adding extra packages to
the standard library. That's OK; both ActiveState and
Anaconda (and others) do the same, but it does mean
you need to check on python.org to see what is and
what isn't "approved".

If it's not official content then you need to ask on
a PyCharm forum about the preferred choices. The fact
they are included suggests that somebody has tested
them and found them useful in some way, but you would
need to ask them why they chose those packages and
when they would be more suitable than the standard
versions.

These bonus packages are often seen as a valuable
extra, but they do carry a burden of responsibility
for the user to identify which is best for them,
and that's not always easy to assess, especially
for a beginner.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib ... lost novice's question

2017-05-10 Thread Rafael Knuth
>> Then, there is another package, along with a dozen other
>> urllib-related packages (such as aiourllib).
>
> Again, where are you finding these? They are not in
> the standard library. Have you been installing other
> packages that may have their own versions maybe?

they are all available via PyCharm EDU
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib ... lost novice's question

2017-05-09 Thread Mats Wichmann
This is one of those things where, if what you want is simple, they're all
usable and easy. If not, some are frankly horrid.

requests is the current hot module. Go ahead and try it. (urllib.request is not
from requests, it's from urllib.)
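
To make the distinction concrete, here is a minimal sketch fetching the same
page both ways (assuming requests is installed; example.com is a stand-in URL):

import urllib.request   # standard library
import requests         # third-party: pip install requests

url = "http://www.example.com/"

# stdlib: urlopen returns a response whose read() gives bytes
with urllib.request.urlopen(url) as resp:
    stdlib_text = resp.read().decode("utf-8")

# requests: .text decodes to str for you
requests_text = requests.get(url).text

print(stdlib_text[:60])
print(requests_text[:60])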

On May 8, 2017 9:23:15 AM MDT, Rafael Knuth  wrote:
>Which package should I use to fetch and open a URL?
>I am using Python 3.5 and there are presently 4 versions:
>
>urllib2
>urllib3
>urllib4
>urllib5
>
>Common sense is telling me to use the latest version.
>Not sure if my common sense is fooling me here though ;-)
>
>Then, there is another package, along with a dozen other
>urllib-related packages (such as aiourllib). I thought this one is
>doing what I need:
>
>urllib.request
>
>The latter I found on http://docs.python-requests.org along with these
>encouraging words:
>
>"Warning: Recreational use of the Python standard library for HTTP may
>result in dangerous side-effects, including: security vulnerabilities,
>verbose code, reinventing the wheel, constantly reading documentation,
>depression, headaches, or even death."
>
>How do I know where to find the right package - on python.org or
>elsewhere?
>I found some code samples that show how to use urllib.request, now I
>am trying to understand why I should use urllib.request.
>Would it also be doable to do requests using urllib5 or any other
>version? Like 2 or 3? Just trying to understand.
>
>I am lost here. Feedback appreciated. Thank you!
>
>BTW, here's some (working) exemplary code I have been using for
>educational purposes:
>
>import urllib.request
>from bs4 import BeautifulSoup
>
>theurl = "https://twitter.com/rafaelknuth"
>thepage = urllib.request.urlopen(theurl)
>soup = BeautifulSoup(thepage, "html.parser")
>
>print(soup.title.text)
>
>i = 1
>for tweets in soup.findAll("div",{"class":"content"}):
>    print(i)
>    print(tweets.find("p").text)
>    i = i + 1
>
>I am assuming there are different solutions for fetching and opening URLs?
>Or is the above the only viable solution?
>___
>Tutor maillist  -  Tutor@python.org
>To unsubscribe or change subscription options:
>https://mail.python.org/mailman/listinfo/tutor

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib ... lost novice's question

2017-05-09 Thread Abdur-Rahmaan Janhangeer
As a side note, see a tutorial on urllib and requests and try them at the
same time.

Make sure it covers Python 3.x (3.4 or 3.6).

Also see the data type received by the different combinations, and when you
should use .read() etc.

Also use UTF-8 or Unicode, as in .decode("utf8").
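
A tiny sketch of that point (urlopen gives back bytes, which you decode
yourself; the URL is just a stand-in):

import urllib.request

resp = urllib.request.urlopen("http://www.example.com/")
raw = resp.read()            # bytes
text = raw.decode("utf-8")   # str
print(type(raw), type(text))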

Play around and mess with it freely, so that when you do serious
stuff you won't need to test to know what should be done or not, what
breaks it or not.

Summary: learn it well from the beginning.

Finding the right package:

Hmm, either in your beginner learning path you learn popular third-party
modules,

or

you find how the people round the net did what you are doing, see how they
did it and what modules they used,

or

google "module <name>",

or

browse PyPI,

or _long term_

never stop reading about Python, so you'll constantly discover new things
and reduce the probability of not knowing how to do something.

Hope it helps,

Abdur-Rahmaan Janhangeer
Vacoas,
Mauritius
https://abdurrahmaanjanhangeer.wordpress.com/
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib ... lost novice's question

2017-05-08 Thread Alan Gauld via Tutor
On 08/05/17 16:23, Rafael Knuth wrote:
> Which package should I use to fetch and open a URL?
> I am using Python 3.5 and there are presently 4 versions:
> 
> urllib2
> urllib3
> urllib4
> urllib5

I don't know where you are getting those from but the
standard install of Python v3.6 only has urllib. This
is a package with various modules inside.

ISTR there was a urllib2 in Python 2 for a while, but
I've never heard of any 3, 4, or 5.

> Then, there is another package, along with a dozen other
> urllib-related packages (such as aiourllib). 

Again, where are you finding these? They are not in
the standard library. Have you been installing other
packages that may have their own versions maybe?

> urllib.request
> 
> The latter I found on http://docs.python-requests.org along with these
> encouraging words:
> 
> "Warning: Recreational use of the Python standard library for HTTP may
> result in dangerous side-effects, including: security vulnerabilities,
> verbose code, reinventing the wheel, constantly reading documentation,
> depression, headaches, or even death."

That's true of almost any package used badly.

Remember that this is "marketing" propaganda from an
alternative package maintainer. And while most folks
(including me) seem to agree that Requests is easier
to use than the standard library, the standard library
version works just fine if you take sensible care.

> How do I know where to find the right package

There is no right package, just the one you find most effective.
Most folks would say that Requests is easier to use than the
standard library; if you are doing anything non-trivial I'd
second that opinion.
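
For what "easier" looks like in practice, a small hedged sketch (assumes
requests is installed; the URL and parameters are stand-ins):

import requests

# query parameters and headers are one-liners in requests
resp = requests.get("http://www.example.com/api",
                    params={"q": "python"},
                    headers={"User-Agent": "tutor-example"})
print(resp.status_code)
print(resp.text[:80])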

> I found some code samples that show how to use urllib.request, now I
> am trying to understand why I should use urllib.request.

Because as part of the standard library you can be sure
it will be there, whereas Requests is a third party module
that needs to be downloaded/installed and therefore may
not be present (or even allowed by the server admins).

Or maybe because you found some old code written before
Requests became popular and you need to integrate with
it or reuse it.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] urllib ... lost novice's question

2017-05-08 Thread Rafael Knuth
Which package should I use to fetch and open a URL?
I am using Python 3.5 and there are presently 4 versions:

urllib2
urllib3
urllib4
urllib5

Common sense is telling me to use the latest version.
Not sure if my common sense is fooling me here though ;-)

Then, there is another package, along with a dozen other
urllib-related packages (such as aiourllib). I thought this one is
doing what I need:

urllib.request

The latter I found on http://docs.python-requests.org along with these
encouraging words:

"Warning: Recreational use of the Python standard library for HTTP may
result in dangerous side-effects, including: security vulnerabilities,
verbose code, reinventing the wheel, constantly reading documentation,
depression, headaches, or even death."

How do I know where to find the right package - on python.org or elsewhere?
I found some code samples that show how to use urllib.request, now I
am trying to understand why I should use urllib.request.
Would it also be doable to do requests using urllib5 or any other
version? Like 2 or 3? Just trying to understand.

I am lost here. Feedback appreciated. Thank you!

BTW, here's some (working) exemplary code I have been using for
educational purposes:

import urllib.request
from bs4 import BeautifulSoup

theurl = "https://twitter.com/rafaelknuth"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")

print(soup.title.text)

i = 1
for tweets in soup.findAll("div",{"class":"content"}):
    print(i)
    print(tweets.find("p").text)
    i = i + 1

I am assuming there are different solutions for fetching and opening URLs?
Or is the above the only viable solution?
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib confusion

2014-11-23 Thread Cameron Simpson

On 21Nov2014 15:57, Clayton Kirkwood  wrote:

Got a general problem with url work. I've struggled through a lot of
code which uses urllib.[parse,request]* and urllib2. First q: I read
someplace in urllib documentation which makes it sound like either
urllib or urllib2 modules are being deprecated in 3.5. Don't know if
it's only part or whole.

The names of the modules changed I believe in v3.x.


I don't think so because I've seen both lib and lib2 in both new and old code, 
and current 4.3 documentation talks only of urllib.


You mean "3.4" I would hope.

It is clear from this:

 https://docs.python.org/3/py-modindex.html#cap-u

that there is no "urllib2" in Python 3, just "urllib".

I recommend you read this:

 https://docs.python.org/3/whatsnew/3.0.html

which is a very useful overview of the main changes which came with Python 3, 
and covers almost all the "structural" changes (such as module renames); the 
3.0 release was the Big Change.



But you can save yourself a lot of trouble by using the excellent 3rd
party package called requests:
http://docs.python-requests.org/en/latest/


I've seen nothing of this.


You have now. It is very popular and widely liked.

Cheers,
Cameron Simpson 

'Supposing a tree fell down, Pooh, when we were underneath it?'
'Supposing it didn't,' said Pooh after careful thought.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib confusion

2014-11-22 Thread Steven D'Aprano
On Fri, Nov 21, 2014 at 01:37:45PM -0800, Clayton Kirkwood wrote:

> Got a general problem with url work. I've struggled through a lot of code
> which uses urllib.[parse,request]* and urllib2. First q: I read someplace in
> urllib documentation which makes it sound like either urllib or urllib2
> modules are being deprecated in 3.5. Don't know if it's only part or whole.

Can you point us to this place? I would be shocked and rather dismayed 
to hear that urllib(2) was being deprecated, but it is possible that one 
small component is being renamed/moved/deprecated.

> I've read through a lot that says that urllib..urlopen needs urlencode,
> and/or encode('utf-8') for byte conversion, but I've seen plenty of examples
> where nothing is being encoded either way. I also have a sneeking suspicious
> that urllib2 code does all of the encoding. I've read that if things aren't
> encoded that I will get TypeError, yet I've seen plenty of examples where
> there is no error and no encoding.

It's hard to comment on things you've read when we don't know what they 
are or precisely what they say. "I read that..." is the equivalent of "a 
man down the pub told me...".

If the examples are all ASCII, then no charset encoding is 
needed, although urlencode will still perform percent-encoding:

py> from urllib.parse import urlencode
py> urlencode({"key": "<value>"})
'key=%3Cvalue%3E'

The characters '<' and '>' are not legal inside URLs, so they have to be 
encoded as '%3C' and '%3E'. Because all the characters are ASCII, the 
result remains untouched.

Non-ASCII characters, on the other hand, are encoded into UTF-8 by 
default, although you can pick another encoding and/or error handler:

py> urlencode({"key": "© 2014"})
'key=%C2%A9+2014'

The copyright symbol © encoded into UTF-8 is the two bytes 
\xC2\xA9 which are then percent encoded into %C2%A9.


> Why do so many examples seem to not encode? And not get TypeError? And yes,
> for those of you who are about to suggest it, I have tried a lot of things
> and read for many hours.

One actual example is worth about a thousand vague descriptions.

But in general, I would expect that the urllib functions default to 
using UTF-8 as the encoding, so you don't have to manually specify an 
encoding; it just works.


-- 
Steven
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib confusion

2014-11-21 Thread Clayton Kirkwood


>-Original Message-
>From: Joel Goldstick [mailto:joel.goldst...@gmail.com]
>Sent: Friday, November 21, 2014 2:39 PM
>To: Clayton Kirkwood
>Cc: tutor@python.org
>Subject: Re: [Tutor] urllib confusion
>
>On Fri, Nov 21, 2014 at 4:37 PM, Clayton Kirkwood 
>wrote:
>> Hi all.
>>
>>
>>
>> Got a general problem with url work. I’ve struggled through a lot of
>> code which uses urllib.[parse,request]* and urllib2. First q: I read
>> someplace in urllib documentation which makes it sound like either
>> urllib or urllib2 modules are being deprecated in 3.5. Don’t know if
>it’s only part or whole.
>
>The names of the modules changed I believe in v3.x.

I don't think so because I've seen both lib and lib2 in both new and old code, 
and current 4.3 documentation talks only of urllib.

>
>But you can save yourself a lot of trouble by using the excellent 3rd
>party package called requests:
>http://docs.python-requests.org/en/latest/

I've seen nothing of this.

>
>Also, please use plaintext for your questions.  That way everyone can
>read them, and the indentation won't get mangled.
>>
>> I’ve read through a lot that says that urllib..urlopen needs
>> urlencode, and/or encode(‘utf-8’) for byte conversion, but I’ve seen
>> plenty of examples where nothing is being encoded either way. I also
>> have a sneaking suspicion that urllib2 code does all of the encoding.
>> I’ve read that if things aren’t encoded that I will get TypeError, yet
>> I’ve seen plenty of examples where there is no error and no encoding.
>>
>>
>>
>> Why do so many examples seem to not encode? And not get TypeError? And
>> yes, for those of you who are about to suggest it, I have tried a lot
>> of things and read for many hours.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Clayton
>>
>>
>>
>>
>>
>>
>>
>> You can tell the caliber of a man by his gun--c. kirkwood
>>
>>
>>
>>
>> ___
>> Tutor maillist  -  Tutor@python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor
>>
>
>
>
>--
>Joel Goldstick
>http://joelgoldstick.com



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib confusion

2014-11-21 Thread Alan Gauld

On 21/11/14 21:37, Clayton Kirkwood wrote:


urllib or urllib2 modules are being deprecated in 3.5. Don’t know if
it’s only part or whole.


urllib2 doesn't exist in Python 3; there is only the urllib package.

As to urllib being deprecated, that's the first I've heard of
it but it may be the case - I don't follow the new releases closely 
since I'm usually at least 2 releases behind. I only upgraded to 3.4 
because I was writing the new book and needed it to be as current as 
possible.


But the "What's New" document for the 3.5 alpha says:

"A new urllib.request.HTTPBasicPriorAuthHandler allows HTTP Basic 
Authentication credentials to be sent unconditionally with the first 
HTTP request, rather than waiting for a HTTP 401 Unauthorized response 
from the server. (Contributed by Matej Cepl in issue 19494.)"


And the NEWS file adds:

"urllib.request.urlopen will accept a context object
 (SSLContext) as an argument which will then used be
for HTTPS connection.  Patch by Alex Gaynor."

Which suggests urllib is alive and kicking...
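
A minimal sketch of that context argument, assuming a Python version where
urlopen() accepts context= (the URL is a stand-in):

import ssl
import urllib.request

context = ssl.create_default_context()   # default certificate verification
resp = urllib.request.urlopen("https://www.python.org/", context=context)
print(resp.status)
resp.close()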


I’ve read through a lot that says that urllib..urlopen needs urlencode,
and/or encode(‘utf-8’) for byte conversion, but I’ve seen plenty of
examples where nothing is being encoded either way.


Might those be v2 examples?
Encoding got a whole lot more specific in Python v3.

But I'm not sure what you mean by the double dot.
urllib.urlopen is discontinued in Python3. You
should be using urllib.request.urlopen instead.
(But maybe that's what you meant by the ..?)
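
For reference, a minimal sketch of the Python 3 spelling (no error handling;
the URL is a stand-in):

import urllib.request

resp = urllib.request.urlopen("http://www.example.com/")
html = resp.read().decode("utf-8")   # read() returns bytes in Python 3
print(html[:60])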


Why do so many examples seem to not encode? And not get TypeError?


Without specific examples it's hard to know.


--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib confusion

2014-11-21 Thread Joel Goldstick
On Fri, Nov 21, 2014 at 4:37 PM, Clayton Kirkwood  wrote:
> Hi all.
>
>
>
> Got a general problem with url work. I’ve struggled through a lot of code
> which uses urllib.[parse,request]* and urllib2. First q: I read someplace in
> urllib documentation which makes it sound like either urllib or urllib2
> modules are being deprecated in 3.5. Don’t know if it’s only part or whole.

The names of the modules changed I believe in v3.x.

But you can save yourself a lot of trouble by using the excellent 3rd
party package called requests:
http://docs.python-requests.org/en/latest/

Also, please use plaintext for your questions.  That way everyone can
read them, and the indentation won't get mangled.
>
> I’ve read through a lot that says that urllib..urlopen needs urlencode,
> and/or encode(‘utf-8’) for byte conversion, but I’ve seen plenty of examples
> where nothing is being encoded either way. I also have a sneaking suspicion
> that urllib2 code does all of the encoding. I’ve read that if things aren’t
> encoded that I will get TypeError, yet I’ve seen plenty of examples where
> there is no error and no encoding.
>
>
>
> Why do so many examples seem to not encode? And not get TypeError? And yes,
> for those of you who are about to suggest it, I have tried a lot of things
> and read for many hours.
>
>
>
> Thanks,
>
>
>
> Clayton
>
>
>
>
>
>
>
> You can tell the caliber of a man by his gun--c. kirkwood
>
>
>
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>



-- 
Joel Goldstick
http://joelgoldstick.com
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] urllib confusion

2014-11-21 Thread Clayton Kirkwood
Hi all.

 

Got a general problem with url work. I've struggled through a lot of code
which uses urllib.[parse,request]* and urllib2. First q: I read someplace in
urllib documentation which makes it sound like either urllib or urllib2
modules are being deprecated in 3.5. Don't know if it's only part or whole.

I've read through a lot that says that urllib..urlopen needs urlencode,
and/or encode('utf-8') for byte conversion, but I've seen plenty of examples
where nothing is being encoded either way. I also have a sneaking suspicion
that urllib2 code does all of the encoding. I've read that if things aren't
encoded that I will get TypeError, yet I've seen plenty of examples where
there is no error and no encoding.

 

Why do so many examples seem to not encode? And not get TypeError? And yes,
for those of you who are about to suggest it, I have tried a lot of things
and read for many hours.

 

Thanks,

 

Clayton

 

 

 

You can tell the caliber of a man by his gun--c. kirkwood

 

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Urllib Problem

2011-07-29 Thread Steven D'Aprano

George Anonymous wrote:

I am trying to make a simple program with Python 3 that tries to open
different pages from a wordlist and prints which are alive. Here is the code:
from urllib import request
fob=open('c:/passwords/pass.txt','r')
x = fob.readlines()
for i in x:
    urllib.request.openurl('www.google.gr/' + i)

But it doesn't work. What's the problem?



A guessing game! I LOVE guessing games!!! :)

Let's see... let me guess what you mean by "doesn't work":

- the computer locks up and sits there until you hit the restart switch
- the computer gives a Blue Screen Of Death
- Python raises an exception
- Python downloads the Yahoo website instead of Google
- something else


My guess is... you're getting a NameError exception, like this one:


>>> from urllib import request
>>> x = urllib.request.openurl('www.google.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'urllib' is not defined


Am I close?


You need to use request.urlopen, not urllib.request.openurl.

That's your *first* problem. There are more. Come back if you need help 
with the others, and next time, don't make us play guessing games. Show 
us the code you use -- copy and paste it, don't retype it from memory -- 
what you expect should happen, and what actually happens instead.
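
For the record, a hedged sketch of how the loop might look once that first
problem (plus the missing URL scheme, which urlopen also needs) is fixed:

from urllib import request

fob = open('c:/passwords/pass.txt', 'r')
for line in fob.readlines():
    url = 'http://www.google.gr/' + line.strip()   # strip the newline, add a scheme
    try:
        request.urlopen(url)   # urlopen, not openurl; request, not urllib.request
        print(url, 'is alive')
    except Exception as exc:   # dead pages raise URLError/HTTPError
        print(url, 'failed:', exc)
fob.close()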





--
Steven

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Urllib Problem

2011-07-29 Thread Alexander
On Fri, Jul 29, 2011 at 5:58 AM, Karim  wrote:

> On 07/29/2011 11:52 AM, George Anonymous wrote:
>
> I am trying to make a simple program with Python 3 that tries to open
> different pages from a wordlist and prints which are alive. Here is the code:
> from urllib import request
> fob=open('c:/passwords/pass.txt','r')
> x = fob.readlines()
> for i in x:
>     urllib.request.openurl('www.google.gr/' + i)
>
> But it doesn't work. What's the problem?
>
>
> Please give the exception error you get?!
> Also, the HTTP response header should carry
> the status code, which gives you
> the failure answer from the server.
>
> Cheers
> Karim
>
As Karim noted, you'll want to mention any exceptions you are getting. I'm
not sure what it is you are trying to do with your code. If you'd like to
open each line, try something if it works, and handle the exception
otherwise, the code may read something similar to:

fob = open('C:/passwords/pass.txt','r')
fob_rlines = fob.readlines()
for line in fob_rlines:
    try:
        pass  # whatever it is you would like to do with each line
    except Exception:  # where the code didn't work and an exception occurred
        pass  # whatever you would like to do when a particular Exception occurs
Hope that helps,
Alexander

>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription 
> options:http://mail.python.org/mailman/listinfo/tutor
>
>
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Urllib Problem

2011-07-29 Thread Karim

On 07/29/2011 11:52 AM, George Anonymous wrote:
I am trying to make a simple program with Python 3 that tries to open 
different pages from a wordlist and prints which are alive. Here is the 
code:

from urllib import request
fob=open('c:/passwords/pass.txt','r')
x = fob.readlines()
for i in x:
    urllib.request.openurl('www.google.gr/' + i)

But it doesn't work. What's the problem?



Please give the exception error you get?!
Also, the HTTP response header should carry
the status code, which gives you
the failure answer from the server.

Cheers
Karim



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Urllib Problem

2011-07-29 Thread George Anonymous
I am trying to make a simple program with Python 3 that tries to open
different pages from a wordlist and prints which are alive. Here is the code:
from urllib import request
fob=open('c:/passwords/pass.txt','r')
x = fob.readlines()
for i in x:
    urllib.request.openurl('www.google.gr/' + i)

But it doesn't work. What's the problem?
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib problem

2010-10-12 Thread Alan Gauld


"Roelof Wobben"  wrote


Finally solved this puzzle.
Now the next one of the 33 puzzles.


Don't be surprised if you get stuck. Python Challenge is quite tricky
and is deliberately designed to make you explore parts of the
standard library you might not otherwise find. Expect to do a lot
of reading in the documentation.

It's really targeted at intermediate rather than novice
programmers IMHO.

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib problem

2010-10-12 Thread Roelof Wobben




> From: st...@pearwood.info
> To: tutor@python.org
> Date: Wed, 13 Oct 2010 01:51:16 +1100
> Subject: Re: [Tutor] urllib problem
>
> On Tue, 12 Oct 2010 11:58:03 pm Steven D'Aprano wrote:
> > On Tue, 12 Oct 2010 11:40:17 pm Roelof Wobben wrote:
> > > Hoi,
> > >
> > > I have this programm :
> > >
> > > import urllib
> > > import re
> > > f = urllib.urlopen("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
> > > inhoud = f.read()
> > > f.close()
> > > nummer = re.search('[0-9]', inhoud)
> > > volgende = int(nummer.group())
> > > teller = 1
> > > while teller <= 3 :
> > >     url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + str(volgende)
> > >     f = urllib.urlopen(url)
> > >     inhoud = f.read()
> > >     f.close()
> > >     nummer = re.search('[0-9]', inhoud)
> > >     print "nummer is", nummer.group()
> > >     volgende = int(nummer.group())
> > >     print volgende
> > >     teller = teller + 1
> > >
> > > but now the url changes but volgende not.
> > > What do I have done wrong ?
> >
> > Each time through the loop, you set volgende to the same result:
> >
> > nummer = re.search('[0-9]', inhoud)
> > volgende = int(nummer.group())
> >
> > Since inhoud never changes, and the search never changes, the search
> > result never changes, and volgende never changes.
>
> Wait, sorry, inhoud should change... I missed the line inhoud = f.read()
>
> My mistake, sorry about that. However, I can now see what is going
> wrong. Your regular expression only looks for a single digit:
>
> re.search('[0-9]', inhoud)
>
> If you want any number of digits, you need '[0-9]+' instead.
>
>
> Starting from the first URL:
>
> >>> f = urllib.urlopen(
> ... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6";)
> >>> inhoud = f.read()
> >>> f.close()
> >>> print inhoud
> and the next nothing is 87599
>
>
> but:
>
> >>> nummer = re.search('[0-9]', inhoud)
> >>> nummer.group()
> '8'
>
> See, you only get the first digit. Then looking up the page with
> nothing=8 gives a first digit starting with 5, and then you get stuck
> on 5 forever:
>
> >>> urllib.urlopen(
> ... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=8";).read()
> 'and the next nothing is 59212'
> >>> urllib.urlopen(
> ... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=5";).read()
> 'and the next nothing is 51716'
>
>
> You need to add a + to the regular expression, which means "one or more
> digits" instead of "a single digit".
>
>
>
> --
> Steven D'Aprano
> ___
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor

 
Hi Steven, 
 
Finally solved this puzzle.
Now the next one of the 33 puzzles.
 
Roelof
  
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib problem

2010-10-12 Thread Roelof Wobben




> From: st...@pearwood.info
> To: tutor@python.org
> Date: Tue, 12 Oct 2010 23:58:03 +1100
> Subject: Re: [Tutor] urllib problem
>
> On Tue, 12 Oct 2010 11:40:17 pm Roelof Wobben wrote:
>> Hoi,
>>
>> I have this programm :
>>
>> import urllib
>> import re
>> f = urllib.urlopen("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
>> inhoud = f.read()
>> f.close()
>> nummer = re.search('[0-9]', inhoud)
>> volgende = int(nummer.group())
>> teller = 1
>> while teller <= 3 :
>>     url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + str(volgende)
>>     f = urllib.urlopen(url)
>>     inhoud = f.read()
>>     f.close()
>>     nummer = re.search('[0-9]', inhoud)
>>     print "nummer is", nummer.group()
>>     volgende = int(nummer.group())
>>     print volgende
>>     teller = teller + 1
>>
>> but now the url changes but volgende not.
>> What do I have done wrong ?
>
> Each time through the loop, you set volgende to the same result:
>
> nummer = re.search('[0-9]', inhoud)
> volgende = int(nummer.group())
>
> Since inhoud never changes, and the search never changes, the search
> result never changes, and volgende never changes.
>
>
>
> --
> Steven D'Aprano
> ___
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
 
 
Hello, 
 
Here is the output when I print every step in the beginning :
 
inhoud : and the next nothing is 87599
nummer is 8
volgende is  8
 
and here is the output in the loop :
 
 
url is: http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=8
inhoud is and the next nothing is 59212
nummer is 5

 
2nd run:
url is: http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=5
inhoud is and the next nothing is 51716
nummer is 5

3rd run:
url is: http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=5
inhoud is and the next nothing is 51716
nummer is 5

4th run:

I see the problem. It only takes the first digit of the nothing.
So I have to look at how to solve that.
 
Roelof

  
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib problem

2010-10-12 Thread Steven D'Aprano
On Tue, 12 Oct 2010 11:58:03 pm Steven D'Aprano wrote:
> On Tue, 12 Oct 2010 11:40:17 pm Roelof Wobben wrote:
> > Hoi,
> >
> > I have this programm :
> >
> > import urllib
> > import re
> > f = urllib.urlopen("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
> > inhoud = f.read()
> > f.close()
> > nummer = re.search('[0-9]', inhoud)
> > volgende = int(nummer.group())
> > teller = 1
> > while teller <= 3 :
> >   url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + str(volgende)
> >   f = urllib.urlopen(url)
> >   inhoud = f.read()
> >   f.close()
> >   nummer = re.search('[0-9]', inhoud)
> >   print "nummer is", nummer.group()
> >   volgende = int(nummer.group())
> >   print volgende
> >   teller = teller + 1
> >
> > but now the url changes but volgende not.
> > What do I have done wrong ?
>
> Each time through the loop, you set volgende to the same result:
>
> nummer = re.search('[0-9]', inhoud)
> volgende = int(nummer.group())
>
> Since inhoud never changes, and the search never changes, the search
> result never changes, and volgende never changes.

Wait, sorry, inhoud should change... I missed the line inhoud = f.read()

My mistake, sorry about that. However, I can now see what is going 
wrong. Your regular expression only looks for a single digit:

re.search('[0-9]', inhoud)

If you want any number of digits, you need '[0-9]+' instead.


Starting from the first URL:

>>> f = urllib.urlopen(
... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
>>> inhoud = f.read()
>>> f.close()
>>> print inhoud
and the next nothing is 87599


but:

>>> nummer = re.search('[0-9]', inhoud)
>>> nummer.group()
'8'

See, you only get the first digit. Then looking up the page with 
nothing=8 gives a first digit starting with 5, and then you get stuck 
on 5 forever:

>>> urllib.urlopen(
... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=8").read()
'and the next nothing is 59212'
>>> urllib.urlopen(
... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=5").read()
'and the next nothing is 51716'


You need to add a + to the regular expression, which means "one or more 
digits" instead of "a single digit".
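
A hedged sketch of the fix in isolation (Python 2, matching the thread; the
sample string is the output shown above):

import re

inhoud = 'and the next nothing is 87599'
nummer = re.search('[0-9]+', inhoud)   # + matches one or more digits
volgende = int(nummer.group())
print volgende                         # 87599, not just 8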



-- 
Steven D'Aprano
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib problem

2010-10-12 Thread Steven D'Aprano
On Tue, 12 Oct 2010 11:40:17 pm Roelof Wobben wrote:
> Hoi,
>
> I have this programm :
>
> import urllib
> import re
> f = urllib.urlopen("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
> inhoud = f.read()
> f.close()
> nummer = re.search('[0-9]', inhoud)
> volgende = int(nummer.group())
> teller = 1
> while teller <= 3 :
>   url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + str(volgende)
>   f = urllib.urlopen(url)
>   inhoud = f.read()
>   f.close()
>   nummer = re.search('[0-9]', inhoud)
>   print "nummer is", nummer.group()
>   volgende = int(nummer.group())
>   print volgende
>   teller = teller + 1
>
> but now the url changes but volgende not.
> What do I have done wrong ?

Each time through the loop, you set volgende to the same result:

nummer = re.search('[0-9]', inhoud)
volgende = int(nummer.group())

Since inhoud never changes, and the search never changes, the search 
result never changes, and volgende never changes.



-- 
Steven D'Aprano
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib problem

2010-10-12 Thread Evert Rol
> I have this program :
> 
> import urllib
> import re
> f = urllib.urlopen("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
> inhoud = f.read()
> f.close()
> nummer = re.search('[0-9]', inhoud)
> volgende = int(nummer.group())
> teller = 1 
> while teller <= 3 :
>  url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + str(volgende)
>  f = urllib.urlopen(url)
>  inhoud = f.read()
>  f.close()
>  nummer = re.search('[0-9]', inhoud)
>  print "nummer is", nummer.group()
>  volgende = int(nummer.group())
>  print volgende
>  teller = teller + 1
> 
> but now the url changes but volgende not.

I think number will change; *unless* you happen to retrieve the same number 
every time, even when you access a different url.
What is the result when you run this program, ie, the output of your print 
statements (then, also, print url)?
And, how can url change, but volgende not? Since url depends on volgende.

Btw, it may be better to use parentheses in your regular expression to 
explicitly group whatever you want to match, though the above will work (since 
it groups the whole match). But Python has this "Explicit is better than 
implicit" thing.
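
A tiny illustration of explicit grouping (the sample string is borrowed from
the output above):

import re

m = re.search('nothing is ([0-9]+)', 'and the next nothing is 87599')
print m.group(1)   # '87599' - the explicitly grouped digits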

Cheers,

  Evert

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] urllib problem

2010-10-12 Thread Roelof Wobben


Hi, 
 
I have this program:
 
import urllib
import re
f = urllib.urlopen("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
inhoud = f.read()
f.close()
nummer = re.search('[0-9]', inhoud)
volgende = int(nummer.group())
teller = 1 
while teller <= 3 :
  url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + str(volgende)
  f = urllib.urlopen(url)
  inhoud = f.read()
  f.close()
  nummer = re.search('[0-9]', inhoud)
  print "nummer is", nummer.group()
  volgende = int(nummer.group())
  print volgende
  teller = teller + 1
 
but now the url changes but volgende does not.
 
What have I done wrong?
 
Roelof 
  
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib

2009-12-07 Thread Jojo Mwebaze
thanks, Senthil

On Mon, Dec 7, 2009 at 11:10 AM, Senthil Kumaran wrote:

> On Mon, Dec 07, 2009 at 08:38:24AM +0100, Jojo Mwebaze wrote:
> > I need help on something very small...
> >
> > I am using urllib to write a query and what I want returned is
> 'FHI=128%2C128&
> > FLO=1%2C1'
> >
>
> The way to use urllib.urlencode is like this:
>
> >>> urllib.urlencode({"key":"value"})
> 'key=value'
> >>> urllib.urlencode({"key":"value","key2":"value2"})
> 'key2=value2&key=value'
>
> For your purposes, you need to construct the dict this way:
>
> >>> urllib.urlencode({"FHI":'128,128',"FHO":'1,1'})
> 'FHO=1%2C1&FHI=128%2C128'
> >>>
>
>
> And if you are to use variables, one way to do it would be:
>
> >>> x1,y1,x2,y2 = 1,1,128,128
> >>> fhi = str(x2) + ',' + str(y2)
> >>> fho = str(x1) + ',' + str(y1)
> >>> urllib.urlencode({"FHI":fhi,"FHO":fho})
> 'FHO=1%2C1&FHI=128%2C128'
>
> --
> Senthil
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib

2009-12-07 Thread Senthil Kumaran
On Mon, Dec 07, 2009 at 08:38:24AM +0100, Jojo Mwebaze wrote:
> I need help on something very small...
> 
> I am using urllib to write a query and what I want returned is 'FHI=128%2C128&
> FLO=1%2C1'
> 

The way to use urllib.urlencode is like this:

>>> urllib.urlencode({"key":"value"})
'key=value'
>>> urllib.urlencode({"key":"value","key2":"value2"})
'key2=value2&key=value'

For your purposes, you need to construct the dict this way:

>>> urllib.urlencode({"FHI":'128,128',"FHO":'1,1'})
'FHO=1%2C1&FHI=128%2C128'
>>> 


And if you are to use variables, one way to do it would be:

>>> x1,y1,x2,y2 = 1,1,128,128
>>> fhi = str(x2) + ',' + str(y2)
>>> fho = str(x1) + ',' + str(y1)
>>> urllib.urlencode({"FHI":fhi,"FHO":fho})
'FHO=1%2C1&FHI=128%2C128'

-- 
Senthil
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] urllib

2009-12-06 Thread Jojo Mwebaze
hello Tutor,

I need help on something very small...

I am using urllib to write a query and what I want returned is
'FHI=128%2C128&FLO=1%2C1'

I have tried the statement below and I have failed to get the above:

x1,y1,x2,y2 = 1,1,128,128

query = urllib.urlencode({'FHI':'x2,y2,', 'FLO':'x1,y1'})

When that failed, I tried to use

query ='FHI=%(x2)d%2C%(y2)d&FLO=%(x1)d%2C%(y1)d' % vars()

returned an error "TypeError: not enough arguments for format string"

I also tried

query ='FHI=%d\%2C%d&FLO=%d\%2C%d' %(x1,x2,y1,y2)

I got the error ValueError: unsupported format character 'C' (0x43) at index
8

Where could I be going wrong?

Johnson
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!

2009-07-07 Thread David Kim
Thanks Kent, perhaps I'll cool the Python jets and move on to HTTP and
HTML. I was hoping it would be something I could just pick up along
the way; looks like I was wrong.

dk

On Tue, Jul 7, 2009 at 1:56 PM, Kent Johnson wrote:
> On Tue, Jul 7, 2009 at 1:20 PM, David Kim wrote:
>> On Tue, Jul 7, 2009 at 7:26 AM, Kent Johnson wrote:
>>>
>>> curl works because it ignores the redirect to the ToS page, and the
>>> site is (astoundingly) dumb enough to serve the content with the
>>> redirect. You could make urllib2 behave the same way by defining a 302
>>> handler that does nothing.
>>
>> Many thanks for the redirect pointer! I also found
>> http://diveintopython.org/http_web_services/redirects.html. Is the
>> handler class on this page what you mean by a handler that does
>> nothing? (It looks like it exposes the error code but still follows
>> the redirect).
>
> No, all of those examples are handling the redirect. The
> SmartRedirectHandler just captures additional status. I think you need
> something like this:
> class IgnoreRedirectHandler(urllib2.HTTPRedirectHandler):
>    def http_error_301(self, req, fp, code, msg, headers):
>        return None
>
>    def http_error_302(self, req, fp, code, msg, headers):
>        return None
>
>> I guess i'm still a little confused since, if the
>> handler does nothing, won't I still go to the ToS page?
>
> No, it is the action of the handler, responding to the redirect
> request, that causes the ToS page to be fetched.
>
>> For example, I ran the following code (found at
>> http://stackoverflow.com/questions/554446/how-do-i-prevent-pythons-urllib2-from-following-a-redirect)
>
> That is pretty similar to the DiP code...
>
>> I suspect I am not understanding something basic about how urllib2
>> deals with this redirect issue since it seems everything I try gives
>> me the same ToS page.
>
> Maybe you don't understand how redirect works in general...
>
>>> Generally you have to post to the same url as the form, giving the
>>> same data the form does. You can inspect the source of the form to
>>> figure this out. In this case the form is
>>>
>>> <form ...>   [form markup stripped by the archive; only a hidden
>>> input with name="urltarget" survives]
>>> You generally need to enable cookie support in urllib2 as well,
>>> because the site will use a cookie to flag that you saw the consent
>>> form. This tutorial shows how to enable cookies and submit form data:
>>> http://personalpages.tds.net/~kent37/kk/00010.html
>>
>> I have seen the login examples where one provides values for the
>> fields username and password (thanks Kent). Given the form above,
>> however, it's unclear to me how one POSTs the form data when you
>> aren't actually passing any parameters. Perhaps this is less of a
>> Python question and more an http question (which unfortunately I know
>> nothing about either).
>
> Yes, the parameters are listed in the form.
>
> If you don't have at least a basic understanding of HTTP and HTML you
> are going to have trouble with this project...
>
> Kent
>



-- 
morenotestoself.wordpress.com
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!

2009-07-07 Thread Kent Johnson
On Tue, Jul 7, 2009 at 1:20 PM, David Kim wrote:
> On Tue, Jul 7, 2009 at 7:26 AM, Kent Johnson wrote:
>>
>> curl works because it ignores the redirect to the ToS page, and the
>> site is (astoundingly) dumb enough to serve the content with the
>> redirect. You could make urllib2 behave the same way by defining a 302
>> handler that does nothing.
>
> Many thanks for the redirect pointer! I also found
> http://diveintopython.org/http_web_services/redirects.html. Is the
> handler class on this page what you mean by a handler that does
> nothing? (It looks like it exposes the error code but still follows
> the redirect).

No, all of those examples are handling the redirect. The
SmartRedirectHandler just captures additional status. I think you need
something like this:
class IgnoreRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, fp, code, msg, headers):
        return None

    def http_error_302(self, req, fp, code, msg, headers):
        return None
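
A hedged usage sketch (assuming the class above; the URL is a stand-in; note
that with the redirect left unhandled, urllib2 surfaces the 302 as an
HTTPError, which is itself file-like):

import urllib2

opener = urllib2.build_opener(IgnoreRedirectHandler())
try:
    response = opener.open("http://www.example.com/page")
except urllib2.HTTPError as err:
    response = err   # the 302 response; its body may still be readable
print response.read()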

> I guess i'm still a little confused since, if the
> handler does nothing, won't I still go to the ToS page?

No, it is the action of the handler, responding to the redirect
request, that causes the ToS page to be fetched.

> For example, I ran the following code (found at
> http://stackoverflow.com/questions/554446/how-do-i-prevent-pythons-urllib2-from-following-a-redirect)

That is pretty similar to the DiP code...

> I suspect I am not understanding something basic about how urllib2
> deals with this redirect issue since it seems everything I try gives
> me the same ToS page.

Maybe you don't understand how redirect works in general...

>> Generally you have to post to the same url as the form, giving the
>> same data the form does. You can inspect the source of the form to
>> figure this out. In this case the form is
>>
>> <form ...>   [form markup stripped by the archive; only a hidden
>> input with name="urltarget" survives]
>>
>> You generally need to enable cookie support in urllib2 as well,
>> because the site will use a cookie to flag that you saw the consent
>> form. This tutorial shows how to enable cookies and submit form data:
>> http://personalpages.tds.net/~kent37/kk/00010.html
>
> I have seen the login examples where one provides values for the
> fields username and password (thanks Kent). Given the form above,
> however, it's unclear to me how one POSTs the form data when you
> aren't actually passing any parameters. Perhaps this is less of a
> Python question and more an http question (which unfortunately I know
> nothing about either).

Yes, the parameters are listed in the form.

If you don't have at least a basic understanding of HTTP and HTML you
are going to have trouble with this project...

Kent
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!

2009-07-07 Thread Sander Sweers
2009/7/7 David Kim :
> opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
> urllib2.install_opener(opener)
>
> response = urllib2.urlopen("http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1")
> print response.read()
> 
>
> I suspect I am not understanding something basic about how urllib2
> deals with this redirect issue since it seems everything I try gives
> me the same ToS page.

Indeed, you create the opener but then you do not use it. Try the
below and it should work.
  response = opener.open("http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1")
  data = response.read()

Greets
Sander
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!

2009-07-07 Thread David Kim
On Tue, Jul 7, 2009 at 7:26 AM, Kent Johnson wrote:
>
> curl works because it ignores the redirect to the ToS page, and the
> site is (astoundingly) dumb enough to serve the content with the
> redirect. You could make urllib2 behave the same way by defining a 302
> handler that does nothing.

Many thanks for the redirect pointer! I also found
http://diveintopython.org/http_web_services/redirects.html. Is the
handler class on this page what you mean by a handler that does
nothing? (It looks like it exposes the error code but still follows
the redirect). I guess i'm still a little confused since, if the
handler does nothing, won't I still go to the ToS page?

For example, I ran the following code (found at
http://stackoverflow.com/questions/554446/how-do-i-prevent-pythons-urllib2-from-following-a-redirect)
and ended-up pulling the same ToS page anyway.


import urllib2

redirect_handler = urllib2.HTTPRedirectHandler()

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        return urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

cookieprocessor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1")
print response.read()


I suspect I am not understanding something basic about how urllib2
deals with this redirect issue since it seems everything I try gives
me the same ToS page.

> Generally you have to post to the same url as the form, giving the
> same data the form does. You can inspect the source of the form to
> figure this out. In this case the form is
>
> <form ...>   [form markup stripped by the archive; only a hidden
> input with name="urltarget" survives]
>
> You generally need to enable cookie support in urllib2 as well,
> because the site will use a cookie to flag that you saw the consent
> form. This tutorial shows how to enable cookies and submit form data:
> http://personalpages.tds.net/~kent37/kk/00010.html

I have seen the login examples where one provides values for the
fields username and password (thanks Kent). Given the form above,
however, it's unclear to me how one POSTs the form data when you
aren't actually passing any parameters. Perhaps this is less of a
Python question and more an http question (which unfortunately I know
nothing about either).

Thanks so much again for the help!

DK



--
morenotestoself.wordpress.com
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!

2009-07-07 Thread Kent Johnson
On Mon, Jul 6, 2009 at 5:54 PM, David Kim wrote:
> Hello all,
>
> I have two questions I'm hoping someone will have the patience to
> answer as an act of mercy.
>
> I. How to get past a Terms of Service page?
>
> I've just started learning python (have never done any programming
> prior) and am trying to figure out how to open or download a website
> to scrape data. The only problem is, whenever I try to open the link
> (via urllib2, for example) I'm after, I end up getting the HTML to a
> Terms of Service Page (where one has to click an "I Agree" button)
> rather than the actual target page.
>
> I've seen examples on the web on providing data for forms (typically
> by finding the name of the form and providing some sort of dictionary
> to fill in the form fields), but this simple act of getting past "I
> Agree" is stumping me. Can anyone save my sanity? As a workaround,
> I've been using os.popen('curl ' + url ' >' filename) to save the html
> in a txt file for later processing. I have no idea why curl works and
> urllib2, for example, doesn't (I use OS X).

curl works because it ignores the redirect to the ToS page, and the
site is (astoundingly) dumb enough to serve the content with the
redirect. You could make urllib2 behave the same way by defining a 302
handler that does nothing.

> I even tried to use Yahoo
> Pipes to try and sidestep coding anything altogether, but ended up
> looking at the same Terms of Service page anyway.
>
> Here's the code (tho it's probably not that illuminating since it's
> basically just opening a url):
>
> import urllib2
> url = 'http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1'
> #the first of 23 tables
> html = urllib2.urlopen(url).read()

Generally you have to post to the same url as the form, giving the
same data the form does. You can inspect the source of the form to
figure this out. In this case the form is

<form ...>   [form markup stripped by the archive; only a hidden input
with name="urltarget" survives]

You generally need to enable cookie support in urllib2 as well,
because the site will use a cookie to flag that you saw the consent
form. This tutorial shows how to enable cookies and submit form data:
http://personalpages.tds.net/~kent37/kk/00010.html
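
A rough sketch of those two steps together (urllib2 / Python 2; the form's
action URL and the exact field list are hypothetical here, since the form
markup was stripped above):

import cookielib
import urllib
import urllib2

# enable cookie support so the consent flag the site sets is kept
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# hypothetical action URL - take the real one from the form's action attribute
form_action = "http://www.dtcc.com/agree.php"
data = urllib.urlencode({
    "urltarget": "http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1",
})
response = opener.open(form_action, data)   # supplying data makes this a POST
print response.read()[:200]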

Kent
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!

2009-07-06 Thread Stefan Behnel
Hi,

David Kim wrote:
> I have two questions I'm hoping someone will have the patience to
> answer as an act of mercy.
> 
> I. How to get past a Terms of Service page?
> 
> I've just started learning python (have never done any programming
> prior) and am trying to figure out how to open or download a website
> to scrape data. The only problem is, whenever I try to open the link
> (via urllib2, for example) I'm after, I end up getting the HTML to a
> Terms of Service Page (where one has to click an "I Agree" button)
> rather than the actual target page.

One comment to make here is that you should first read that page and check
if the provider of the service actually allows you to automatically
download content, or to use the service in the way you want. This is
totally up to them, and if their terms of service state that you must not
do that, well, then you must not do that.

Once you know that it's permitted, you can read the ToS page and search for
the form that the "Agree" button triggers. The URL given there is the one
you have to read next, but augmented with the parameter ("?xyz=...") that
the button sends.


> I've seen examples on the web on providing data for forms (typically
> by finding the name of the form and providing some sort of dictionary
> to fill in the form fields), but this simple act of getting past "I
> Agree" is stumping me. Can anyone save my sanity? As a workaround,
> I've been using os.popen('curl ' + url ' >' filename) to save the html
> in a txt file for later processing. I have no idea why curl works and
> urllib2, for example, doesn't (I use OS X).

There may be different reasons for that. One is that web servers often
present different content based on the client identifier. So if you see one
page with one client, and another page with a different client, that may be
the reason.


> Here's the code (tho it's probably not that illuminating since it's
> basically just opening a url):
> 
> import urllib2
> url = 'http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1'
> #the first of 23 tables
> html = urllib2.urlopen(url).read()

Hmmm, if what you want is to read a stock ticker or something like that,
you should *really* read their ToS first and make sure they do not disallow
automated access. Because it's actually quite likely that they do.


> II. How to parse html tables with lxml, beautifulsoup? (for dummies)
> 
> Assuming i get past the Terms of Service, I'm a bit overwhelmed by the
> need to know XPath, CSS, XML, DOM, etc. to scrape data from the web.

Using CSS selectors (lxml.cssselect) is not at all hard. You basically
express the page structure in a *very* short and straight forward way.

Searching the web for a CSS selectors tutorial should give you a few hits.


> The basic tutorials show something like the following:
> 
> from lxml import html
> doc = html.parse("/path/to/test.txt") #the file i downloaded via curl

... or read from the standard output pipe of curl. Note that there is a
stdlib module called "subprocess", which may make running curl easier.

Once you've determined the final URL to parse, you can also push it right
into lxml's parse() function, instead of going through urllib2 or an
external tool. Example:

url = "http://pypi.python.org/pypi?%3Aaction=search&term=lxml"
doc = html.parse(url)


> root = doc.getroot() #what is this root business?

The root (or top-most) node of the document you just parsed. Usually an
"html" tag in HTML pages.


> tables = root.cssselect('table')

Simple, isn't it? :)
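
To make it concrete, a small hedged sketch pulling the rows out of every
table on that page (assumes lxml with CSS selector support is installed):

from lxml import html

doc = html.parse("http://pypi.python.org/pypi?%3Aaction=search&term=lxml")
root = doc.getroot()    # top-most element of the parsed page
for table in root.cssselect('table'):
    for row in table.cssselect('tr'):
        cells = [c.text_content().strip() for c in row.cssselect('td, th')]
        print cells     # one list per row, headers included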

BTW, did you look at this?

http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/


> I understand that selecting all the table tags will somehow target
> however many tables on the page. The problem is the table has multiple
> headers, empty cells, etc. Most of the examples on the web have to do
> with scraping the web for search results or something that don't
> really depend on the table format for anything other than layout.

That's because in cases like yours, you have to do most of the work
yourself anyway. No two pages are alike, so you have to find your way
through the structure and figure out fixed points that allow you to get to
the data.

Stefan

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!

2009-07-06 Thread David Kim
Hello all,

I have two questions I'm hoping someone will have the patience to
answer as an act of mercy.

I. How to get past a Terms of Service page?

I've just started learning python (have never done any programming
prior) and am trying to figure out how to open or download a website
to scrape data. The only problem is, whenever I try to open the link
(via urllib2, for example) I'm after, I end up getting the HTML to a
Terms of Service Page (where one has to click an "I Agree" button)
rather than the actual target page.

I've seen examples on the web on providing data for forms (typically
by finding the name of the form and providing some sort of dictionary
to fill in the form fields), but this simple act of getting past "I
Agree" is stumping me. Can anyone save my sanity? As a workaround,
I've been using os.popen('curl ' + url + ' > ' + filename) to save the html
in a txt file for later processing. I have no idea why curl works and
urllib2, for example, doesn't (I use OS X). I even tried to use Yahoo
Pipes to try and sidestep coding anything altogether, but ended up
looking at the same Terms of Service page anyway.

Here's the code (tho it's probably not that illuminating since it's
basically just opening a url):

import urllib2
url = 'http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1'
#the first of 23 tables
html = urllib2.urlopen(url).read()

II. How to parse html tables with lxml, beautifulsoup? (for dummies)

Assuming i get past the Terms of Service, I'm a bit overwhelmed by the
need to know XPath, CSS, XML, DOM, etc. to scrape data from the web.
I've tried looking at the documentation included with different python
libraries, but just got more confused.

The basic tutorials show something like the following:

from lxml import html
doc = html.parse("/path/to/test.txt") #the file i downloaded via curl
root = doc.getroot() #what is this root business?
tables = root.cssselect('table')

I understand that selecting all the table tags will somehow target
however many tables on the page. The problem is the table has multiple
headers, empty cells, etc. Most of the examples on the web have to do
with scraping the web for search results or something that don't
really depend on the table format for anything other than layout. Are
there any resources out there that are appropriate for web/python
illiterati like myself that deal with structured data as in the url
above?

FYI, the data in the url above goes up in smoke every week, so I'm
trying to capture it automatically on a weekly basis. Getting all of
it into a CSV or database would be a personal cause for celebration as
it would be the first really useful thing I've done with python since
starting to learn it a few months ago.

For anyone who is interested, here is the code that uses "curl" to
pull the webpages. It basically just builds the url string for the
different table pages and saves the file with a timestamped
filename:

import os
from time import strftime

BASE_URL = 'http://www.dtcc.com/products/derivserv/data_table_'
SECTIONS = {'section1': {'select': 'i.php?id=table', 'id': range(1, 9)},
            'section2': {'select': 'ii.php?id=table', 'id': range(9, 17)},
            'section3': {'select': 'iii.php?id=table', 'id': range(17, 24)}}

def get_pages():

    filenames = []
    path = '~/Dev/Data/DTCC_DerivServ/'
    #os.popen('cd ' + path)

    for section in SECTIONS:
        for id in SECTIONS[section]['id']:
            #urlList.append(BASE_URL + SECTIONS[section]['select'] + str(id))
            url = BASE_URL + SECTIONS[section]['select'] + str(id)
            timestamp = strftime('%Y%m%d_')
            #sectionName = BASE_URL.split('/')[-1]
            sectionNumber = SECTIONS[section]['select'].split('.')[0]
            tableNumber = str(id) + '_'
            filename = timestamp + tableNumber + sectionNumber + '.txt'
            os.popen('curl ' + url + ' > ' + path + filename)
            filenames.append(filename)

    return filenames

if (__name__ == '__main__'):
    get_pages()


--
morenotestoself.wordpress.com
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib unquote

2009-02-17 Thread Norman Khine
It is my error: the data is a SHA hash, and it is not possible to get the
original string back, unless you use rainbow tables or something of the sort.
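
To illustrate with a made-up input (the sha module is the Python 2
spelling; later versions use hashlib):

import sha, base64, urllib

digest = sha.new('some input').digest()   # 20 raw hash bytes, not text
token = urllib.quote(base64.encodestring(digest))
print token   # same shape as 'hL/FGNS40fjoTnp2zIqq73reK60%3D%0A'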


Kent Johnson wrote:

On Mon, Feb 16, 2009 at 8:12 AM, Norman Khine  wrote:

Hello,
Can someone point me in the right direction. I would like to return the
string for the following:

Type "help", "copyright", "credits" or "license" for more information.

import base64, urllib
data = 'hL/FGNS40fjoTnp2zIqq73reK60%3D%0A'
data = urllib.unquote(data)
print base64.decodestring(data)

???Ը???Nzv̊??z?+?
What am I missing?


How is data created? Since it doesn't decode as you expect, either it
isn't base64 or there is some other processing needed. Do you have an
example of a data string where you know the desired decoded value?

Kent


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib unquote

2009-02-17 Thread Kent Johnson
On Mon, Feb 16, 2009 at 8:12 AM, Norman Khine  wrote:
> Hello,
> Can someone point me in the right direction. I would like to return the
> string for the following:
>
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import base64, urllib
> >>> data = 'hL/FGNS40fjoTnp2zIqq73reK60%3D%0A'
> >>> data = urllib.unquote(data)
> >>> print base64.decodestring(data)
> ???Ը???Nzv̊??z?+?

>
> What am I missing?

How is data created? Since it doesn't decode as you expect, either it
isn't base64 or there is some other processing needed. Do you have an
example of a data string where you know the desired decoded value?

Kent
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib unquote

2009-02-17 Thread Senthil Kumaran
On Tue, Feb 17, 2009 at 1:24 PM, Norman Khine  wrote:
> Thank you, but is it possible to get the original string from this?

What do you mean by the original string Norman?
Look at these definitions:

Quoted String:

In the different parts of a URL there are sets of characters, e.g. the
space character in a path, that must be quoted, which means converted to a
different form so that the URL is understood by the program.
So ' ' is quoted to %20.

Unquoted String:

When %20 appears in a URL, humans need it unquoted so that we can understand it.
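
A two-line illustration with Python 2's urllib:

>>> import urllib
>>> urllib.quote('a path/with spaces')
'a%20path/with%20spaces'
>>> urllib.unquote('a%20path/with%20spaces')
'a path/with spaces'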


What do you mean by original string?
Why are you doing base64 encoding?
And what are you trying to achieve?

Perhaps these can help us to help you better?



-- 
Senthil
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib unquote

2009-02-17 Thread Sander Sweers
On Tue, Feb 17, 2009 at 08:54, Norman Khine  wrote:
> Thank you, but is it possible to get the original string from this?

You mean something like this?

>>> urllib.quote('hL/FGNS40fjoTnp2zIqq73reK60=\n')
'hL/FGNS40fjoTnp2zIqq73reK60%3D%0A'

Greets
Sander
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib unquote

2009-02-16 Thread Norman Khine

Thank you, but is it possible to get the original string from this?

Sander Sweers wrote:

On Mon, Feb 16, 2009 at 14:12, Norman Khine  wrote:

Type "help", "copyright", "credits" or "license" for more information.

>>> import base64, urllib
>>> data = 'hL/FGNS40fjoTnp2zIqq73reK60%3D%0A'
>>> data = urllib.unquote(data)
>>> print base64.decodestring(data)

???Ը???Nzv̊??z?+?
What am I missing?


Not an expert here but I think you can skip the last step...


>>> urllib.unquote('hL/FGNS40fjoTnp2zIqq73reK60%3D%0A')
'hL/FGNS40fjoTnp2zIqq73reK60=\n'


Greets
Sander


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib unquote

2009-02-16 Thread Sander Sweers
On Mon, Feb 16, 2009 at 14:12, Norman Khine  wrote:
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import base64, urllib
> >>> data = 'hL/FGNS40fjoTnp2zIqq73reK60%3D%0A'
> >>> data = urllib.unquote(data)
> >>> print base64.decodestring(data)
> ???Ը???Nzv̊??z?+?

>
> What am I missing?

Not an expert here but I think you can skip the last step...

>>> urllib.unquote('hL/FGNS40fjoTnp2zIqq73reK60%3D%0A')
'hL/FGNS40fjoTnp2zIqq73reK60=\n'


Greets
Sander
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] urllib unquote

2009-02-16 Thread Norman Khine

Hello,
Can someone point me in the right direction. I would like to return the 
string for the following:


Type "help", "copyright", "credits" or "license" for more information.
>>> import base64, urllib
>>> data = 'hL/FGNS40fjoTnp2zIqq73reK60%3D%0A'
>>> data = urllib.unquote(data)
>>> print base64.decodestring(data)
???Ը???Nzv̊??z?+?
>>>

What am I missing?

Cheers
Norman


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] URLLIB / GLOB

2007-10-22 Thread Kent Johnson
John wrote:
> Hello,
>  
> I would like to write a program which looks in a web directory for, say 
> *.gif files. Then processes those files in some manner. What I need is 
> something like glob which will return a directory listing of all the 
> files matching the search pattern (or simply a certain extension).
>  
> Is there a way to do this with urllib? Any other suggestions?

If the directory is only available as a web page you will have to fetch 
the web directory listing itself with urllib or urllib2 and parse the 
HTML returned to get the list of files. You might want to use 
BeautifulSoup to parse the HTML.
http://www.crummy.com/software/BeautifulSoup/
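
A rough sketch, assuming a plain Apache-style index page and the
BeautifulSoup 3 API of that era (the URL is a placeholder):

import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://example.com/images/'   # placeholder directory URL
soup = BeautifulSoup(urllib2.urlopen(url).read())

# Collect the hrefs of links that end in .gif, a poor man's glob('*.gif').
gifs = [a['href'] for a in soup.findAll('a', href=True)
        if a['href'].endswith('.gif')]
print gifs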

Kent
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] URLLIB / GLOB

2007-10-22 Thread John
Hello,

I would like to write a program which looks in a web directory for, say
*.gif files. Then processes those files in some manner. What I need is
something like glob which will return a directory listing of all the files
matching the search pattern (or simply a certain extension).

Is there a way to do this with urllib? Any other suggestions?

Thanks!
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib

2006-09-17 Thread Patricia
Hi again,

I was able to use urllib2_file, which is a wrapper to urllib2.urlopen(). It
seems to work fine, and I'm able to retrieve the contents of the file using:
 
afile = req.form.list[1].file.read()

Now I have to store this text file (which is about 500k) and an id number in a
MySQL database on a web server. I have a table that has two columns: user id
(int) and a mediumblob. The problem I have now is I don't know how to store
them in the database. I've been looking for examples without any luck. I tried
using LOAD DATA INFILE, but it seems that I would need to have this client-side
file stored on the server. I used LOAD DATA LOCAL INFILE, and got some errors.
I also thought about storing them like this:

afile = req.form.list[1].file.read()
cursor.execute("""insert into p_report (sales_order, file_cont )
values (%s, %s)""", (1, afile))
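
In case it helps, here is the fuller version of that second approach that
I have been sketching (untested; the connection details are placeholders):

import MySQLdb

conn = MySQLdb.connect(host='localhost', user='me',
                       passwd='secret', db='mydb')   # placeholders
cursor = conn.cursor()
# Parameterized insert: the driver escapes the blob contents itself.
cursor.execute("insert into p_report (sales_order, file_cont) values (%s, %s)",
               (1, afile))
conn.commit()
cursor.close()
conn.close()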

I really don't know which is the best way to do it. Which is the right approach?
I'm really hoping someone can give me an idea how to do it, because I'm finding
this frustrating.

Thanks,
Patricia




___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib

2006-09-12 Thread N
Hi,

You can try this:

import httplib, urllib

params = urllib.urlencode({'ID': '1', 'Name': 'name', 'Eid': 'we[at]you.com'})

# Assumed URL: test.com/cgi-bin/myform
h = httplib.HTTP("test.com")
h.putrequest("POST", "/cgi-bin/myform")
h.putheader("Content-length", "%d" % len(params))
h.putheader('Accept', 'text/plain')
h.putheader('Host', 'test.com')
h.endheaders()
h.send(params)

reply, msg, hdrs = h.getreply()
print reply   # should be 200 on success

data = h.getfile().read()   # get the raw HTML (getfile() call reconstructed)
f = open('test.html', 'w')  # put the response in the html file
f.write(data)
f.close()

Hope it will solve your problem.

Regards,
Nav
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] urllib

2006-09-12 Thread Kent Johnson
Patricia wrote:
> Hi,
> 
> I have used urllib and urllib2 to post data like the following:
> 
> dict = {}
> dict['data'] = info
> dict['system'] = aname
> 
> data = urllib.urlencode(dict)
> req = urllib2.Request(url)
> 
> And to get the data, I emulated a web page with a submit button:   
> s = ""
> s += ""
> s += ""
> s += ""
> s += ""
> s += ""
> 
> 
> I would like to know how to send a file. It's a text file that will be 
> gzipped before being posted. I'm using python version 2.2.3.

There are some old examples here:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/146306

I think the modern way uses email.MIMEMultipart but I don't have an 
example handy.
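
A rough, untested sketch of the multipart/form-data wire format those
recipes build by hand (host, path and field names are invented):

import httplib

def post_file(host, selector, field, filename, body):
    boundary = '----------boundary_1234'
    parts = ['--' + boundary,
             'Content-Disposition: form-data; name="%s"; filename="%s"'
                 % (field, filename),
             'Content-Type: application/octet-stream',
             '',
             body,
             '--' + boundary + '--',
             '']
    payload = '\r\n'.join(parts)
    h = httplib.HTTPConnection(host)
    h.putrequest('POST', selector)
    h.putheader('Content-Type',
                'multipart/form-data; boundary=%s' % boundary)
    h.putheader('Content-Length', str(len(payload)))
    h.endheaders()
    h.send(payload)
    response = h.getresponse()
    return response.status, response.read()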

Kent

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] urllib

2006-09-11 Thread Patricia
Hi,

I have used urllib and urllib2 to post data like the following:

dict = {}
dict['data'] = info
dict['system'] = aname

data = urllib.urlencode(dict)
req = urllib2.Request(url)

And to get the data, I emulated a web page with a submit button:   
s = ""
s += ""
s += ""
s += ""
s += ""
s += ""


I would like to know how to send a file. It's a text file that will be 
gzipped before being posted. I'm using python version 2.2.3.


Thanks,
Patricia


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] URLLIB

2005-05-16 Thread Kent Johnson
Please post the code that gave you the error.

Kent

Servando Garcia wrote:
> I tried that and here is the error I am currently getting:
> 
> assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
> 
> I was trying this:
> 
>> X=urllib.URLopener(name,proxies={'http':'URL').distutils.copy_file('SomeFileName')
> 
> 
> in hopes to solve the above error
> 
> 
> 
> On May 13, 2005, at 6:52 AM, Kent Johnson wrote:
> 
>> Servando Garcia wrote:
>>
>>> Hello list
>>> I am on challenge 5. I think I need to somehow download a file. I
>>> have been trying like so
>>> X=urllib.URLopener(name,proxies={'http':'URL').distutils.copy_file('SomeFileName')
>>
>>
>> urlopen() returns a file-like object - something that behaves like
>> an open file. Try
>> x = urllib.urlopen(name)
>> data = x.read()
>>
>> Kent
>>
>>
> 
> 

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] URLLIB

2005-05-13 Thread Kent Johnson
Servando Garcia wrote:
> Hello list
> I am on challenge 5. I think I need to somehow download a file. I have
> been trying like so
> 
> X=urllib.URLopener(name,proxies={'http':'URL').distutils.copy_file('SomeFileName')
>  

urlopen() returns a file-like object - something that behaves like an open
file. Try
x = urllib.urlopen(name)
data = x.read()
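
And if the goal is just "download a file to disk", urllib.urlretrieve
wraps that whole loop (the URL below is a placeholder):

import urllib
urllib.urlretrieve('http://example.com/somefile.zip', 'SomeFileName')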

Kent

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] URLLIB

2005-05-13 Thread Servando Garcia
Hello list
I am on challenge 5. I think I need to somehow download a file. I have been trying like so

X=urllib.URLopener(name,proxies={'http':'URL').distutils.copy_file('SomeFileName')

but with no luck.
Servando Garcia
John 3:16
For GOD so loved the world...
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor